Spotify Recommendation System with Machine Learning

Spotify is a prime example of the rise of music streaming services. An app's success depends heavily on the experience it gives its users, and for a streaming application much of that experience comes from its recommendation system. Spotify’s recommendation system has therefore played a major role in the company’s success. In this article, I will walk you through how to build a Spotify recommendation system with machine learning using Python.

Spotify Recommendation System with Machine Learning

In recent years, music and movie streaming services have grown in popularity, and Netflix and Spotify now have very large user bases. A recommendation system plays a major role in the user experience of such applications by suggesting the most suitable, personalized content for each user. At the time of writing, Spotify has 155 million premium subscribers and 345 million active users, and its recommendation system has played a major role in that success.

The Spotify recommendation system uses collaborative filtering to recommend songs and podcasts to users. Collaborative filtering works from listening behaviour: it finds users with similar tastes, or items that tend to be played by the same users, and recommends what similar users enjoyed. The project in the section below works from the audio features of each song rather than from user listening histories, so it is closer to a content-based approach. In that section, I’ll walk you through a machine learning project on a Spotify recommendation system using the Python programming language.
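To make the idea of collaborative filtering concrete, here is a minimal sketch of the user-based variant on a toy listening matrix. The matrix, the function name `recommend_for` and the use of cosine similarity are all assumptions made for illustration; this is not a description of how Spotify implements it.

```python
import numpy as np

# Toy listening matrix (made up): rows are users, columns are songs,
# 1 means the user has listened to the song.
plays = np.array([
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 0, 1, 1],  # user 2
], dtype=float)


def recommend_for(user, plays, top_n=1):
    """Return indices of unheard songs, ranked by what similar users played.

    Assumes every user has listened to at least one song, so no row norm is zero.
    """
    norms = np.linalg.norm(plays, axis=1)
    # Cosine similarity between the target user and every user
    sims = plays @ plays[user] / (norms * norms[user])
    sims[user] = 0.0  # ignore the user's similarity to themselves
    # Score each song by the similarity-weighted plays of the other users
    scores = sims @ plays
    scores[plays[user] > 0] = -np.inf  # never recommend songs already heard
    return np.argsort(scores)[::-1][:top_n]


print(recommend_for(0, plays))
```

User 0 shares two songs with user 1 and none with user 2, so the song that user 1 has heard and user 0 has not is ranked first.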

Spotify Recommendation System using Python

To create a Spotify recommendation system, I will be using a dataset that has been collected from Spotify. The dataset contains about 174,000 songs with 19 features, grouped by artist, year and genre. I will begin the task of building a music recommendation system with machine learning by importing the necessary Python libraries and the dataset:
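A minimal sketch of this loading step could look like the following. The file name `data.csv` and the helper `load_songs` are assumptions; point the path at wherever you saved the dataset.

```python
from pathlib import Path

import pandas as pd

# Assumed location of the downloaded dataset; change it to match your setup.
DATA_PATH = Path("data.csv")


def load_songs(csv_path=DATA_PATH):
    """Read the songs CSV into a pandas DataFrame."""
    return pd.read_csv(csv_path)


# Only load when the file is actually present, so the snippet runs anywhere.
if DATA_PATH.exists():
    data = load_songs()
    print(data.head())
```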

Data Exploration

Let’s explore some key insights from this dataset so that we can select the best features for creating the recommendation system:
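The summary and the missing-value counts shown in this section are the kind of output that pandas' `DataFrame.info()` and `DataFrame.isnull().sum()` produce. The sketch below wraps both calls in a hypothetical helper, `summarize`, and runs it on a small stand-in frame with made-up values so that it executes on its own; with the real dataset you would pass the loaded DataFrame instead.

```python
import pandas as pd


def summarize(df):
    """Print column dtypes and non-null counts, then return missing values per column."""
    df.info()
    return df.isnull().sum()


# Stand-in frame with made-up values, used only so the snippet runs on its own.
sample = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "energy": [0.41, 0.83, 0.27],
    "tempo": [98.0, 128.0, 76.5],
})

missing = summarize(sample)
print(missing)
```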
 RangeIndex: 174389 entries, 0 to 174388
 Data columns (total 19 columns):
  #   Column            Non-Null Count   Dtype  
 ---  ------            --------------   -----  
  0   acousticness      174389 non-null  float64
  1   artists           174389 non-null  object 
  2   danceability      174389 non-null  float64
  3   duration_ms       174389 non-null  int64  
  4   energy            174389 non-null  float64
  5   explicit          174389 non-null  int64  
  6   id                174389 non-null  object 
  7   instrumentalness  174389 non-null  float64
  8   key               174389 non-null  int64  
  9   liveness          174389 non-null  float64
  10  loudness          174389 non-null  float64
  11  mode              174389 non-null  int64  
  12  name              174389 non-null  object 
  13  popularity        174389 non-null  int64  
  14  release_date      174389 non-null  object 
  15  speechiness       174389 non-null  float64
  16  tempo             174389 non-null  float64
  17  valence           174389 non-null  float64
  18  year              174389 non-null  int64  
 dtypes: float64(9), int64(6), object(4)
 memory usage: 25.3+ MB
acousticness        0
artists             0
danceability        0
duration_ms         0
energy              0
explicit            0
id                  0
instrumentalness    0
key                 0
liveness            0
loudness            0
mode                0
name                0
popularity          0
release_date        0
speechiness         0
tempo               0
valence             0
year                0
dtype: int64

So the dataset does not contain any missing values. Now let’s have a look at the correlation between the features. Here I will drop the columns ‘id’, ‘name’, ‘artists’, ‘release_date’, and ‘year’, as they are identifiers or metadata rather than characteristics of how a song sounds:

df = data.drop(columns=['id', 'name', 'artists', 'release_date', 'year'])
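One way to compute the correlations is pandas' `DataFrame.corr()`, which returns the pairwise Pearson correlation of the numeric columns. The helper name `feature_correlation` and the sample values below are assumptions for illustration; with the real data you would pass `df`.

```python
import pandas as pd


def feature_correlation(df):
    """Pairwise Pearson correlation between the numeric columns of df."""
    return df.select_dtypes(include="number").corr()


# Made-up values standing in for the real feature table.
sample = pd.DataFrame({
    "energy": [0.1, 0.4, 0.6, 0.9],
    "loudness": [-20.0, -13.0, -10.0, -4.0],
    "acousticness": [0.9, 0.6, 0.4, 0.1],
})

corr = feature_correlation(sample)
print(corr.round(2))
```

To view the matrix as a heatmap, it can be passed to a plotting function such as seaborn's `heatmap(corr, annot=True)`.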
[Figure: correlation between features in the Spotify dataset]

Data Transformation

Now I will normalize the dataset using the MinMaxScaler class provided by the Scikit-learn library in Python, which rescales each column to the range 0 to 1. I will normalize all the numerical columns, so I first select every column with an int or float datatype:
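A minimal sketch of this normalization step, assuming a hypothetical helper `normalize_numeric` and made-up sample values. It selects the numeric columns with `select_dtypes`, rescales them with `MinMaxScaler`, and leaves text columns untouched.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def normalize_numeric(df):
    """Return a copy of df with every int/float column rescaled to [0, 1]."""
    out = df.copy()
    # "number" covers both the int and float columns
    numeric_cols = out.select_dtypes(include="number").columns
    scaled = MinMaxScaler().fit_transform(out[numeric_cols])
    out[numeric_cols] = pd.DataFrame(scaled, columns=numeric_cols, index=out.index)
    return out


# Made-up values standing in for the real dataset.
sample = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "tempo": [60, 120, 180],
    "loudness": [-20.0, -10.0, 0.0],
})

normalized = normalize_numeric(sample)
print(normalized)
```

Without this step, a column with a wide range such as `tempo` or `duration_ms` would dominate any distance computed between songs.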

Songs of different genres may have similar characteristics, which may affect the recommendation system. So I’m going to create a new feature here that will differentiate songs from different categories. For this task, I’ll use the K-means clustering algorithm:
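The clustering step can be sketched as follows. The helper `add_cluster_feature`, the column name `cluster`, the default of 10 clusters and the sample values are assumptions chosen for illustration; the number of clusters in particular should be tuned for the real data.

```python
import pandas as pd
from sklearn.cluster import KMeans


def add_cluster_feature(df, n_clusters=10, random_state=0):
    """Return a copy of df with a 'cluster' column holding each song's K-means label."""
    out = df.copy()
    numeric = out.select_dtypes(include="number")
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    out["cluster"] = model.fit_predict(numeric)
    return out


# Made-up, already-normalized values forming two obvious groups.
sample = pd.DataFrame({
    "energy": [0.00, 0.05, 0.10, 0.90, 0.95, 1.00],
    "valence": [0.10, 0.00, 0.05, 1.00, 0.90, 0.95],
})

clustered = add_cluster_feature(sample, n_clusters=2)
print(clustered)
```

The cluster label acts as a rough stand-in for a song's category, which the dataset columns listed earlier do not include.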

Spotify Recommendation System

With these transformations done, the data is ready for building the recommendation system. So let’s see how we can use the features in the dataset to recommend songs to users:
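One simple way to do this is to treat each song as a point in feature space and recommend the songs closest to a chosen one. The sketch below assumes Manhattan distance over the numeric columns, a hypothetical function `recommend`, a default unique row index and made-up sample values; other distance measures would work in the same way.

```python
import pandas as pd


def recommend(df, song_name, n=5):
    """Return the n songs whose numeric features are closest to song_name.

    Assumes df has 'name' and 'artists' columns and a unique row index.
    """
    matches = df[df["name"].str.lower() == song_name.lower()]
    if matches.empty:
        raise ValueError(f"{song_name!r} is not in the dataset")
    seed_label = matches.index[0]
    features = df.select_dtypes(include="number")
    # Manhattan distance: sum of absolute feature differences to the seed song
    distances = (features - features.loc[seed_label]).abs().sum(axis=1)
    distances = distances.drop(index=seed_label)  # do not recommend the seed itself
    result = df.loc[distances.nsmallest(n).index, ["name", "artists"]].copy()
    result["distance"] = distances.loc[result.index]
    return result


# Made-up, already-normalized values standing in for the real dataset.
songs = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C", "Song D"],
    "artists": ["Artist 1", "Artist 2", "Artist 3", "Artist 4"],
    "energy": [0.10, 0.15, 0.90, 0.20],
    "valence": [0.10, 0.10, 0.90, 0.20],
})

print(recommend(songs, "Song A", n=2))
```

On the full dataset you would pass the normalized DataFrame with the song and artist names kept alongside the numeric features.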

[Figure: Spotify recommendation system]

So this is how we can create a recommendation system for a music streaming application. I hope you liked this article on how to create a Spotify recommendation system with machine learning using Python. Feel free to ask your questions in the comments section below.
