Spotify is a prime example of the rise of music streaming services. The success of an app depends heavily on the experience it gives its users, and in a streaming application much of that experience comes from the recommendation system. Spotify's recommendation system has therefore played a major role in its success. In this article, I will walk you through how to build a Spotify recommendation system with machine learning using Python.
Spotify Recommendation System with Machine Learning
In recent years, music and movie streaming services have grown in popularity, and services such as Netflix and Spotify now have very large user bases. A recommendation system plays a major role in the user experience of such an application by suggesting the most suitable, personalized content for each user. At the time of writing, Spotify has 155 million premium subscribers and 345 million active users, and its recommendation system has played a major role in that success.
The Spotify recommendation system uses collaborative filtering to recommend songs and podcasts to users. Collaborative filtering recommends products or services by finding users with similar behaviour and suggesting the items those similar users liked. In the section below, I'll walk you through a machine learning project on a Spotify recommendation system using the Python programming language and the audio features of each song.
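To make the idea of collaborative filtering concrete, here is a minimal sketch of the user-based variant on a toy play-count matrix. The numbers are made up for illustration, the function name `recommend_for_user` is hypothetical, and this is not a description of how Spotify's production system works.

```python
import numpy as np

# Toy play-count matrix (made-up values): rows are users, columns are songs.
plays = np.array([
    [5, 3, 0, 0],
    [4, 0, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 5, 4],
], dtype=float)


def recommend_for_user(plays, user, top_n=2):
    """Rank songs the user has not played by similar users' play counts."""
    # Scale each user's row to unit length so a dot product is cosine similarity.
    norms = np.linalg.norm(plays, axis=1, keepdims=True)
    unit = plays / np.where(norms == 0, 1.0, norms)
    sims = unit @ unit[user]          # similarity of every user to `user`
    sims[user] = 0.0                  # ignore the user's own row
    scores = sims @ plays             # similarity-weighted sum of play counts
    scores[plays[user] > 0] = -np.inf  # skip songs the user already played
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]
```

For user 0, who mostly shares taste with user 1, `recommend_for_user(plays, 0)` ranks song 3 ahead of song 2.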
Spotify Recommendation System using Python
To create a Spotify recommendation system, I will be using a dataset that has been collected from Spotify. The dataset contains 174,389 songs described by 19 columns, grouped by artist, year and genre. I will begin the task of building a music recommendation system with machine learning by importing the necessary Python libraries and the dataset.
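The loading step can be sketched as follows. The file name `data.csv` and the helper `load_songs` are assumptions made for illustration; adjust the path to wherever the Spotify CSV is saved.

```python
import pandas as pd

# Hypothetical file name (an assumption): point this at the downloaded CSV.
DATA_PATH = "data.csv"


def load_songs(path=DATA_PATH):
    """Read the Spotify song table into a pandas DataFrame."""
    return pd.read_csv(path)
```

Calling `data = load_songs()` then gives the `data` DataFrame used in the rest of the article.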
Data Exploration
Let’s explore some key insights from this dataset so that we can select the best features for creating the recommendation system:
data.info()
RangeIndex: 174389 entries, 0 to 174388
Data columns (total 19 columns):
 #   Column            Non-Null Count   Dtype
---  ------            --------------   -----
 0   acousticness      174389 non-null  float64
 1   artists           174389 non-null  object
 2   danceability      174389 non-null  float64
 3   duration_ms       174389 non-null  int64
 4   energy            174389 non-null  float64
 5   explicit          174389 non-null  int64
 6   id                174389 non-null  object
 7   instrumentalness  174389 non-null  float64
 8   key               174389 non-null  int64
 9   liveness          174389 non-null  float64
 10  loudness          174389 non-null  float64
 11  mode              174389 non-null  int64
 12  name              174389 non-null  object
 13  popularity        174389 non-null  int64
 14  release_date      174389 non-null  object
 15  speechiness       174389 non-null  float64
 16  tempo             174389 non-null  float64
 17  valence           174389 non-null  float64
 18  year              174389 non-null  int64
dtypes: float64(9), int64(6), object(4)
memory usage: 25.3+ MB
data.isnull().sum()
acousticness        0
artists             0
danceability        0
duration_ms         0
energy              0
explicit            0
id                  0
instrumentalness    0
key                 0
liveness            0
loudness            0
mode                0
name                0
popularity          0
release_date        0
speechiness         0
tempo               0
valence             0
year                0
dtype: int64
So the dataset does not contain any missing values. Now let's have a look at the correlation between the features. Here I will drop the 'id', 'name', 'artists', 'release_date' and 'year' columns, as they identify or date a song rather than describe how it sounds:
df = data.drop(columns=['id', 'name', 'artists', 'release_date', 'year'])
df.corr()

Data Transformation
Now I will normalize the dataset by using the MinMaxScaler class provided by the Scikit-learn library in Python. I will normalize all the numerical columns, so I will select every column with an int or float datatype.
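A minimal sketch of this normalization step is shown below. The helper name `normalize_numeric` is hypothetical; it rescales every numeric column to the range [0, 1] and leaves text columns such as the song name untouched.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def normalize_numeric(frame):
    """Return a copy of `frame` with each int/float column scaled to [0, 1]."""
    numeric_cols = frame.select_dtypes(include="number").columns
    scaled = pd.DataFrame(
        MinMaxScaler().fit_transform(frame[numeric_cols]),
        columns=numeric_cols,
        index=frame.index,
    )
    out = frame.copy()
    # Replace the numeric columns one by one so integer columns become floats.
    for col in numeric_cols:
        out[col] = scaled[col]
    return out
```

With the DataFrame from earlier, this would be used as `data = normalize_numeric(data)`.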
Songs of different genres may have similar characteristics, which may affect the recommendation system. So I'm going to create a new feature here that will differentiate songs from different categories. For this task, I'll use the K-means clustering algorithm.
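The clustering step might look like the sketch below. The number of clusters (10), the new column name `cluster`, the fixed `random_state` and the helper name `add_cluster_feature` are all assumptions for illustration, since the text does not state them. The function is meant to be run on the normalized data so that no single feature dominates the distances.

```python
import pandas as pd
from sklearn.cluster import KMeans


def add_cluster_feature(frame, n_clusters=10, random_state=0):
    """Return a copy of `frame` with a K-means cluster label per song."""
    out = frame.copy()
    # Cluster on the numeric audio features only.
    features = out.select_dtypes(include="number").to_numpy(dtype=float)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    # "cluster" is a hypothetical column name for the new feature.
    out["cluster"] = model.fit_predict(features)
    return out
```

The value of `n_clusters` is worth tuning, for example by comparing the model's inertia for several values.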
Spotify Recommendation System
With all the necessary transformations done, the data is ready for building the recommendation system. So let's see how we can use the features in the dataset to recommend songs to the users.
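One simple way to turn the features into recommendations is to look up a song by name and return the songs closest to it in feature space. The sketch below is one possible approach, not necessarily the exact method intended here. It assumes the DataFrame still has the `name` and `artists` columns and uses the Manhattan distance over all numeric columns; the function name `recommend` and the `distance` column are hypothetical.

```python
import numpy as np
import pandas as pd


def recommend(frame, song_name, amount=5):
    """Return the `amount` songs whose numeric features are closest to `song_name`."""
    is_seed = frame["name"].str.lower() == song_name.lower()
    if not is_seed.any():
        raise ValueError(f"song not found: {song_name!r}")
    feature_cols = frame.select_dtypes(include="number").columns
    # Use the first match as the seed song.
    seed = frame.loc[is_seed, feature_cols].iloc[0].to_numpy(dtype=float)
    candidates = frame[~is_seed]
    values = candidates[feature_cols].to_numpy(dtype=float)
    # Manhattan (L1) distance between the seed and every other song.
    distance = np.abs(values - seed).sum(axis=1)
    order = np.argsort(distance, kind="stable")[:amount]
    result = candidates.iloc[order][["artists", "name"]].copy()
    result["distance"] = distance[order]
    return result
```

A call such as `recommend(data, "some song title", amount=5)` returns a small table of artists and song names, nearest first.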

So this is how we can create a recommendation system for a music streaming application. I hope you liked this article on how to create a Spotify recommendation system with machine learning using Python. Feel free to ask your questions in the comments section below.