Recommendation Engine using Convolutional Neural Networks, Collaborative Filtering & NLP


OTT players like Netflix, Spotify, Hulu & Amazon Prime have given consumers freedom of choice. Users can watch movies and listen to songs of their choice at any time and on any device, such as smartphones, laptops, smart TVs, and tablets.

Along with content, a big part of an OTT player’s success depends on its discovery tools. In simple terms, how efficiently and quickly can users find movies or songs that they are likely to enjoy? In other words, how precisely can the Recommendation Engine predict a user’s tastes and show content accordingly?

A good Recommendation Engine has a few key characteristics. It should recommend content that similar users are consuming. It should also surface new content that matches a user’s taste, even when few other users have discovered that content yet and a traditional approach therefore cannot capture it. This will become clear below.

The following combination of approaches achieves this:

Collaborative filtering models

These analyze both your behavior and others’ behavior, and have been prominently used by services such as Spotify and Netflix. Unlike Netflix, Spotify doesn’t have a star-based rating system.

Instead, it uses implicit feedback such as how many times a track is played, songs saved to playlists, or visits to an artist’s page after listening to a track. With these signals in place, it compares the tracks you like with the tracks other users like. Once it is established that two users have similar tastes, each is recommended songs from the other’s interests.
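The comparison described above can be sketched in plain Python. This is only an illustration: the event names, their weights, and the toy data are assumptions, not Spotify's actual signals or scoring.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical weights for implicit feedback signals (assumed for illustration).
EVENT_WEIGHTS = {"play": 1.0, "playlist_save": 3.0, "artist_page_visit": 2.0}

def build_profiles(events):
    """Turn (user, track, event_type) tuples into per-user track scores."""
    profiles = defaultdict(lambda: defaultdict(float))
    for user, track, event_type in events:
        profiles[user][track] += EVENT_WEIGHTS[event_type]
    return profiles

def cosine(a, b):
    """Cosine similarity between two sparse {track: score} dicts."""
    dot = sum(score * b[track] for track, score in a.items() if track in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, profiles, top_n=3):
    """Suggest tracks the most similar user engaged with that `user` has not."""
    mine = profiles[user]
    others = [(cosine(mine, p), name) for name, p in profiles.items() if name != user]
    if not others:
        return []
    best_score, neighbour = max(others)
    if best_score == 0.0:
        return []  # nobody shares any track with this user
    unseen = {t: s for t, s in profiles[neighbour].items() if t not in mine}
    return sorted(unseen, key=unseen.get, reverse=True)[:top_n]

events = [
    ("ana", "track_a", "play"), ("ana", "track_b", "playlist_save"),
    ("ben", "track_a", "play"), ("ben", "track_b", "play"),
    ("ben", "track_c", "playlist_save"),
    ("cy", "track_d", "play"),
]
profiles = build_profiles(events)
print(recommend("ana", profiles))
```

Here "ana" and "ben" overlap on two tracks, so the track only "ben" has engaged with is suggested to "ana". "cy" shares nothing with anyone and gets no suggestion, which is the cold-start weakness discussed later in this article.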

At the scale of millions of users, this is done with matrix mathematics and Python libraries.

The listening data is arranged as a huge matrix in which each row represents a user and each column represents a song in the catalog. The Python libraries then run a lengthy matrix factorisation on this matrix.
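The exact formula is not shown in this article. As an assumption, a widely used objective for matrix factorisation on implicit feedback has the following form, where x_u is the vector for user u, y_i is the vector for song i, p_ui is 1 if user u has listened to song i and 0 otherwise, c_ui is a confidence weight that grows with the number of listens, and λ controls regularisation:

```latex
\min_{x,\,y} \; \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
```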

Once this completes, we get a vector for each user and a vector for each song. Collaborative filtering compares each user’s vector with the others and finds the closest matches.
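To make the user and song vectors concrete, here is a small sketch in plain Python. It factorises a toy matrix with stochastic gradient descent, which is a swapped-in technique chosen for brevity and not necessarily what a production system uses. The function names, hyperparameters, and data are all illustrative assumptions.

```python
import random

def factorize(matrix, n_factors=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Factorise a user x song matrix into user vectors and song vectors with SGD."""
    rng = random.Random(seed)
    n_users, n_songs = len(matrix), len(matrix[0])
    users = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    songs = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_songs)]
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_songs):
                pred = sum(a * b for a, b in zip(users[u], songs[i]))
                err = matrix[u][i] - pred
                for f in range(n_factors):
                    uf, sf = users[u][f], songs[i][f]
                    users[u][f] += lr * (err * sf - reg * uf)
                    songs[i][f] += lr * (err * uf - reg * sf)
    return users, songs

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def closest_user(target, users):
    """Index of the user whose vector is most similar to the target user's."""
    others = [(cosine(users[target], vec), idx)
              for idx, vec in enumerate(users) if idx != target]
    return max(others)[1]

# Toy matrix: rows are users, columns are songs; 1 means the user listened.
listens = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
user_vecs, song_vecs = factorize(listens)
print(closest_user(0, user_vecs))
```

In this toy data the first two users share songs and the last two share songs, so their vectors end up pointing in similar directions. With millions of users, a dedicated library and a sparse matrix would replace these nested loops.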

This is the first step towards a custom recommendation engine. 

Natural Language Processing (NLP)

Next comes Natural Language Processing (NLP), which runs sentiment analysis on blogs, news articles, social media, and similar sources to judge what people are saying about a song.

Cultural vectors are then formed for each artist and song, made up of descriptive terms that each carry their own weight. The weight gives the probability that someone would describe a particular song with that term.
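A rough sketch of such term-weight vectors in plain Python follows. It simply counts descriptive words and normalises the counts, which is far simpler than real sentiment analysis. The vocabulary, the sample texts, and the function names are assumptions made up for illustration.

```python
from collections import Counter
import re

# Hypothetical vocabulary of descriptive terms (assumed for illustration).
DESCRIPTORS = {"dreamy", "upbeat", "melancholic", "acoustic", "danceable"}

def cultural_vector(texts):
    """Map descriptive terms to weights that sum to 1 across the given texts."""
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"[a-z]+", text.lower())
        if word in DESCRIPTORS
    )
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}

def similarity(vec_a, vec_b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = sum(w * w for w in vec_a.values()) ** 0.5
    norm_b = sum(w * w for w in vec_b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up snippets standing in for blog posts and reviews.
mentions = {
    "song_x": ["A dreamy, acoustic ballad.", "So dreamy and melancholic!"],
    "song_y": ["Dreamy acoustic textures throughout."],
    "song_z": ["An upbeat, danceable club track."],
}
vectors = {song: cultural_vector(texts) for song, texts in mentions.items()}
print(vectors["song_x"])
print(similarity(vectors["song_x"], vectors["song_y"]))
```

Songs that people describe with the same words get similar vectors, so "song_x" sits much closer to "song_y" than to "song_z".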

Raw audio models with convolutional neural networks

This step is what sets Spotify apart from others, as it takes new songs into account. If a track has had only 10 listens, there are very few listeners whose behavior can be used to collaboratively filter it for you. It will also not be picked up by NLP, as very little has been written about it on the internet.

Raw audio models come to the rescue. With them, a new piece of content can be recommended to you even if it has had only 10 views.

The following is an example of a deep learning convolutional neural network architecture:

[Figure: convolutional neural network architecture for recommending music on Spotify. Image source: Recommending music on Spotify with deep learning]

After processing, the neural network outputs an understanding of the song, with characteristics including key, time signature, mode, tempo, and loudness. This helps identify fundamental similarities between content pieces so that they can be recommended to users accordingly.

We can start creating our own recommendation engine by implementing all three steps one by one.
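The core idea of a convolutional layer over audio can be sketched in a few lines of plain Python. This is a heavily simplified illustration and not the architecture referenced above: the toy spectrogram and the hand-set kernels are assumptions, and a real network would learn its kernels from data and stack many layers.

```python
def conv1d_over_time(frames, kernel):
    """Slide a (width x n_bands) kernel along the time axis, applying ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(frames) - width + 1):
        total = 0.0
        for offset in range(width):
            total += sum(k * x for k, x in zip(kernel[offset], frames[start + offset]))
        outputs.append(max(total, 0.0))  # ReLU keeps only positive responses
    return outputs

def embed(frames, kernels):
    """One feature per kernel: mean activation over the whole clip (global pooling)."""
    features = []
    for kernel in kernels:
        activations = conv1d_over_time(frames, kernel)
        features.append(sum(activations) / len(activations))
    return features

# Toy "spectrogram": 6 time frames x 3 frequency bands (made-up numbers).
spectrogram = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.9, 0.1, 0.1],
    [0.1, 0.2, 0.9],
    [0.0, 0.1, 0.8],
    [0.1, 0.1, 0.9],
]
# Hand-set kernels, each 2 frames wide (a trained network would learn these).
kernels = [
    [[1.0, 0.0, -1.0], [1.0, 0.0, -1.0]],  # responds to low-band energy
    [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]],  # responds to high-band energy
]
print(embed(spectrogram, kernels))
```

The resulting list is a tiny fixed-length description of the clip. Because it is computed from the audio alone, two songs can be compared this way even when neither has any listening history.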

