Recommendation Engine using Convolutional Neural Networks, Collaborative Filtering & NLP


OTT players like Netflix, Spotify, Hulu & Amazon Prime have given consumers freedom: users can watch movies and listen to songs of their choice at any time and on any device, including smartphones, laptops, smart TVs and tablets.

Along with content, a big part of an OTT player’s success depends on its discovery tools: how quickly and efficiently users can find movies or songs they are likely to enjoy. In other words, how precisely can its Recommendation Engine predict your tastes and show you content accordingly?

A good Recommendation Engine has a few key characteristics. It should recommend what similar users are consuming, but it should also surface new content that matches a user’s taste even when few other users have discovered it yet, which a traditional approach cannot capture. This will become clear below.

The following combination of approaches addresses both needs.


Collaborative filtering models

These analyze both your behavior and others’ behavior, and have been prominently used by last.fm and Netflix. Unlike Netflix, Spotify doesn’t have a star-based rating system.

Instead, it uses implicit feedback such as how often tracks are played, songs saved to playlists, or visits to an artist’s page after listening to a track. Once these signals are collected, it compares the tracks you like with the tracks other users like. Once it is established that you and another user have similar tastes, songs from each other’s interests are recommended.
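The comparison described above can be sketched in plain Python. This is only an illustration: the signal names, their weights in `SIGNAL_WEIGHTS`, the `min_similarity` threshold and the sample users are all hypothetical, and cosine similarity is just one reasonable way to measure "similar tastes".

```python
from math import sqrt

# Hypothetical weights per implicit signal; a real system would tune these.
SIGNAL_WEIGHTS = {"play": 1.0, "playlist_save": 3.0, "artist_page_visit": 2.0}

def affinity(events):
    """Collapse (track, signal) events into one score per track."""
    scores = {}
    for track, signal in events:
        scores[track] = scores.get(track, 0.0) + SIGNAL_WEIGHTS[signal]
    return scores

def cosine(a, b):
    """Cosine similarity between two sparse {track: score} profiles."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(target, others, min_similarity=0.5):
    """Suggest tracks that sufficiently similar users like and target lacks."""
    suggestions = {}
    for other in others:
        sim = cosine(target, other)
        if sim < min_similarity:
            continue  # tastes not similar enough to borrow from
        for track, score in other.items():
            if track not in target:
                suggestions[track] = suggestions.get(track, 0.0) + sim * score
    return sorted(suggestions, key=suggestions.get, reverse=True)

you = affinity([("track_a", "play"), ("track_a", "playlist_save"), ("track_b", "play")])
alice = affinity([("track_a", "play"), ("track_a", "playlist_save"),
                  ("track_b", "play"), ("track_c", "play")])
bob = affinity([("track_d", "play"), ("track_e", "playlist_save")])

print(recommend(you, [alice, bob]))
```

Here `alice` shares most of your listening profile, so her extra track is suggested, while `bob` has nothing in common with you and is ignored.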

At the scale of millions of users, this is done with matrix mathematics and Python libraries.


The input is a huge matrix with one row for each user and one column for each song in the catalog. Python libraries then run a lengthy matrix factorisation over it.
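As an illustration, one widely used objective for factorising implicit-feedback data is shown below. It is an assumption that this matches the formula the original figure showed. Here $r_{ui}$ is the number of times user $u$ played song $i$, $x_u$ and $y_i$ are the user and song vectors being learned, and $\alpha$ and $\lambda$ are tuning constants.

```latex
\min_{x, y} \; \sum_{u, i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha \, r_{ui}
```

The preference $p_{ui}$ records whether the user played the song at all, and the confidence $c_{ui}$ makes heavily played songs count for more than songs that were never played.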


Once this completes, we get a vector for each user and a vector for each song. Collaborative filtering compares each user’s vector with all the others and finds the closest matches.
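A minimal sketch of how such vectors could be produced, assuming NumPy and alternating least squares on play counts. Whether any of the services named here use exactly this procedure is an assumption, and the function names, the toy `counts` matrix and the values of `alpha`, `reg` and `factors` are hypothetical.

```python
import numpy as np

def implicit_als(counts, factors=2, alpha=10.0, reg=0.1, iterations=20, seed=0):
    """Factorise a user x song play-count matrix into user and song vectors."""
    rng = np.random.default_rng(seed)
    preference = (counts > 0).astype(float)   # 1 if the user played the song at all
    confidence = 1.0 + alpha * counts         # more plays -> more weight on that 1
    users = rng.normal(scale=0.1, size=(counts.shape[0], factors))
    songs = rng.normal(scale=0.1, size=(counts.shape[1], factors))

    def solve(fixed, conf, pref):
        # Weighted ridge regression in closed form, one row at a time.
        solved = np.empty((conf.shape[0], factors))
        for k in range(conf.shape[0]):
            weighted = fixed * conf[k][:, None]
            lhs = fixed.T @ weighted + reg * np.eye(factors)
            solved[k] = np.linalg.solve(lhs, weighted.T @ pref[k])
        return solved

    for _ in range(iterations):
        users = solve(songs, confidence, preference)      # hold songs fixed
        songs = solve(users, confidence.T, preference.T)  # hold users fixed
    return users, songs

def recommend(user, counts, users, songs):
    """Return the index of the best-scoring song the user has not played."""
    scores = users[user] @ songs.T
    scores[counts[user] > 0] = -np.inf  # never re-recommend played songs
    return int(np.argmax(scores))

# Toy data: rows are users, columns are songs, values are play counts.
counts = np.array([
    [5, 3, 1, 0, 0, 0],
    [4, 2, 0, 0, 0, 0],
    [0, 0, 0, 6, 1, 2],
    [0, 0, 0, 2, 3, 0],
], dtype=float)

users, songs = implicit_als(counts)
print(recommend(1, counts, users, songs))
```

In this toy data, user 1 overlaps with user 0 on the first two songs, so the song that only user 0 has played is the natural candidate to recommend to user 1.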

This is the first step towards a custom recommendation engine.

Natural Language Processing (NLP)

Next comes Natural Language Processing (NLP), which runs sentiment analysis on blogs, news articles, social media and similar sources to judge what people are saying about a song.

Cultural vectors are then formed by AI for each artist and song, with each entry in a vector carrying its own weight. The weights give the probability that someone would describe a particular song in a given way.
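To make the idea of weighted cultural vectors concrete, here is a small sketch. The songs, descriptive terms and weights are invented for illustration, and comparing vectors with cosine similarity is an assumption rather than a documented detail of any service.

```python
from math import sqrt

# Hypothetical term weights, as if mined from blogs, news and social media.
cultural_vectors = {
    "song_a": {"dreamy": 0.8, "synth": 0.6, "nostalgic": 0.3},
    "song_b": {"dreamy": 0.7, "synth": 0.5, "dance": 0.2},
    "song_c": {"aggressive": 0.9, "guitar": 0.7},
}

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(song, vectors):
    """Return the other song whose description overlaps most with this one."""
    return max(
        (other for other in vectors if other != song),
        key=lambda other: cosine(vectors[song], vectors[other]),
    )

print(most_similar("song_a", cultural_vectors))
```

Songs that people describe with the same terms end up close together, even if their listener bases do not overlap.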

Raw audio models with convolutional neural networks

This step is what sets Spotify apart from others, as it takes new songs into account. If a track has had only 10 listens, there are very few listeners whose behavior can be used to collaboratively filter it for you. It will also rarely be picked up by NLP, since little has been written about it on the internet.

Raw audio models come to the rescue: with them, a new piece of content can be recommended to you even if it has had only 10 listens.

The following is an example of a deep learning convolutional neural network architecture:

Image source: Recommending music on Spotify with deep learning


After processing, the neural network outputs an understanding of the song in terms of characteristics including key, time signature, mode, tempo and loudness. This helps identify fundamental similarities between pieces of content and recommend them to users accordingly.
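The forward pass of such a network can be sketched with NumPy alone. This is an untrained toy, not the architecture from the cited post: the layer sizes, the random weights in `params` and the random `spectrogram` standing in for real audio are all assumptions. It only shows how convolutions over time, pooling and a dense layer turn a spectrogram into a fixed-length song vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_time(x, kernel):
    """Convolve along time. x: (frames, channels); kernel: (width, channels, filters)."""
    width = kernel.shape[0]
    windows = np.stack([x[t:t + width] for t in range(x.shape[0] - width + 1)])
    return np.tensordot(windows, kernel, axes=([1, 2], [0, 1]))

def relu(x):
    return np.maximum(x, 0.0)

def max_pool_time(x, size):
    """Keep the maximum of every `size` consecutive frames."""
    frames = (x.shape[0] // size) * size
    return x[:frames].reshape(-1, size, x.shape[1]).max(axis=1)

def embed(spectrogram, params):
    """Map a (frames, mel_bins) spectrogram to a fixed-length song vector."""
    h = spectrogram
    for kernel in params["conv_kernels"]:
        h = max_pool_time(relu(conv1d_time(h, kernel)), size=2)
    # Global pooling over time removes the dependence on track length.
    pooled = np.concatenate([h.mean(axis=0), h.max(axis=0)])
    return pooled @ params["dense"]

mel_bins, filters, latent_dim = 128, 16, 8
params = {  # random, untrained weights purely for illustration
    "conv_kernels": [
        rng.normal(scale=0.05, size=(4, mel_bins, filters)),
        rng.normal(scale=0.05, size=(4, filters, filters)),
    ],
    "dense": rng.normal(scale=0.05, size=(2 * filters, latent_dim)),
}

spectrogram = rng.random((200, mel_bins))  # stand-in for a real mel-spectrogram
song_vector = embed(spectrogram, params)
print(song_vector.shape)
```

In a trained model the weights would be learned so that songs with similar audio end up with similar vectors, which is what allows a track with almost no listens to be matched to listeners.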

We can start creating our own recommendation engine by implementing all three steps one by one.

Talk to us for suggestions at [email protected] if you are looking to create a Deep Learning based Recommendation Engine.
