Music Recommendation System using Python

A Music Recommendation System is an application of Data Science that aims to assist users in discovering new and relevant musical content based on their preferences and listening behaviour. Personalized music recommendations have become an essential tool in the digital music landscape, enabling music streaming platforms like Spotify and Apple Music to offer engaging experiences to their users. If you want to learn how to build a music recommendation system, this article is for you. In this article, I’ll take you through building a Music Recommendation System using the Spotify API and Python.

How Does a Music Recommendation System Work?

Music Recommendation Systems operate through intricate algorithms that analyze vast amounts of data about users’ musical interactions, such as their listening history, liked tracks, skipped songs, and even explicit user preferences conveyed through ratings or feedback. These data points are instrumental in constructing comprehensive user profiles, delineating individual tastes and preferences.

In the initial phase, the system employs various data preprocessing techniques to cleanse and organize the information efficiently. Subsequently, the system uses recommendation algorithms, such as collaborative filtering, content-based filtering, and hybrid approaches, to generate music recommendations.

As users continually interact with the system, it accumulates additional data, refining and updating their profiles in real time. Consequently, the recommendations become increasingly precise and aligned with the user’s evolving musical preferences.

What is Spotify API & How to build a Music Recommendation System using Spotify API?

The Spotify API is a set of rules and protocols provided by Spotify developers. It enables developers to interact with Spotify’s vast music catalogue and collect music-related data. Through the Spotify API, developers can access information such as tracks, albums, artists, playlists, user profiles, and play history, among other features, empowering them to build innovative applications and services that integrate seamlessly with the Spotify platform.

To build a Music Recommendation System using the Spotify API, we are required to collect real-time music data from Spotify. For this task, we need a Spotify developer account to get our credentials from Spotify to access their data.

Below is the process you can follow to sign up for the Spotify developer account and get your credentials.

Step 1: Create a Spotify Account

For a Spotify developer account, you need an account at Spotify. If you don’t use Spotify, create an account. You don’t need to purchase any subscription to get your credentials. Once you have created an account at Spotify (or if you already have one), log in to your account from your web browser.

Step 2: Go to Your Spotify Developer Dashboard

Once you have created an account at Spotify, you need to log in to your Spotify developer dashboard. Here’s the link to the dashboard. As you will be using this developer account for the first time, sign the agreement and verify your email. After these steps, we can move to the next step.

Step 3: Create An App

Once you have verified your email, you will see an option to create an app in your dashboard.


Click “Create app” and move to the next step.

Step 4: App Description

Fill in the app description, as shown in the image below.

Step 5: Copy Your Client ID and Client Secret

After filling in the app description, you will be redirected to a page showing your Client ID and Client Secret. If you click “View client secret”, you will see the secret. Copy both credentials so that you can use them while building a Music Recommendation System using Python.

You can find a detailed guide to get your credentials here.

Music Recommendation System using Python

I hope you have understood what a Music Recommendation System is. Now, in this section, I’ll take you through building a Music Recommendation System using Spotify API and Python.

To get started with building a Music Recommendation System, we first need to have an access token. The access token serves as a temporary authorization credential, allowing the code to make authenticated requests to the Spotify API on behalf of the application. Below is how we can get it:

import requests
import base64

# Replace with your own Client ID and Client Secret
CLIENT_ID = 'your_client_id'
CLIENT_SECRET = 'your_client_secret'

# Base64 encode the client ID and client secret
client_credentials = f"{CLIENT_ID}:{CLIENT_SECRET}"
client_credentials_base64 = base64.b64encode(client_credentials.encode())

# Request the access token
token_url = 'https://accounts.spotify.com/api/token'
headers = {
    'Authorization': f'Basic {client_credentials_base64.decode()}'
}
data = {
    'grant_type': 'client_credentials'
}
response = requests.post(token_url, data=data, headers=headers)

if response.status_code == 200:
    access_token = response.json()['access_token']
    print("Access token obtained successfully.")
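else:
    # Stop here: the rest of the script needs a valid access token
    raise SystemExit(f"Error obtaining access token (status {response.status_code}).")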

Access token obtained successfully.

In the above code, the CLIENT_ID and CLIENT_SECRET variables hold the credentials (you need to add your own credentials in these variables) that uniquely identify the application making requests to the Spotify API. These credentials are obtained when a developer registers their application with Spotify’s developer dashboard. The Client ID identifies the application, while the Client Secret is a confidential key used for authentication.

The client ID and secret are combined in the client_credentials variable, separated by a colon (:). Then, this string is encoded using Base64. Base64 is an encoding rather than encryption, so it does not protect the credentials; it simply puts them in the format expected by the HTTP Basic Authorization header. We then proceed to request an access token from the Spotify API.

The code sends a POST request to the token_url (https://accounts.spotify.com/api/token) with the client credentials in the Authorization header, which is required for client authentication. The grant_type parameter is set to ‘client_credentials’ to indicate that the application is requesting an access token for the client credentials flow.

With the access token, the application can now make authorized requests to retrieve music data, such as tracks, albums, artists, and playlists, which is fundamental for building a music recommendation system using the Spotify API and Python. A token obtained through the client credentials flow identifies the application only, so it cannot be used to read a user’s private data.
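
As a rough sketch of the two headers involved, the helpers below build the Basic header used for the token request and the Bearer header used for later API calls. The function names are hypothetical, the credentials are dummy values, and the expiry check assumes the token response reports its lifetime in seconds (an expires_in field).

```python
import base64
import time


def build_basic_auth_header(client_id, client_secret):
    """Return the Authorization header for the token request (hypothetical helper)."""
    raw = f"{client_id}:{client_secret}".encode()
    return {"Authorization": "Basic " + base64.b64encode(raw).decode()}


def build_bearer_header(access_token):
    """Return the Authorization header for Web API calls made with the token."""
    return {"Authorization": f"Bearer {access_token}"}


def token_is_expired(obtained_at, expires_in, now=None):
    """Check whether a token fetched at `obtained_at` (epoch seconds) has expired."""
    now = time.time() if now is None else now
    return now >= obtained_at + expires_in


# Dummy values for illustration only. Base64 is reversible, so the secret
# must still be kept private (for example, out of version control).
header = build_basic_auth_header("abc", "123")
print(header["Authorization"])
```

Tracking the expiry this way lets a longer-running script request a fresh token before making further calls instead of failing midway.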

Now, I’ll write a function to get music data from any playlist on Spotify. For this task, you need to install the Spotipy library, which is a Python library providing access to Spotify’s web API. You can install it by running the command below in your command prompt or terminal:

pip install spotipy

Below I am defining a function responsible for collecting music data from any playlist on Spotify using the Spotipy library:

import pandas as pd
import spotipy

def get_trending_playlist_data(playlist_id, access_token):
    # Set up Spotipy with the access token
    sp = spotipy.Spotify(auth=access_token)

    # Get the tracks from the playlist
    playlist_tracks = sp.playlist_tracks(playlist_id, fields='items(track(id, name, artists, album(id, name)))')

    # Extract relevant information and store in a list of dictionaries
    music_data = []
    for track_info in playlist_tracks['items']:
        track = track_info['track']
        track_name = track['name']
        artists = ', '.join([artist['name'] for artist in track['artists']])
        album_name = track['album']['name']
        album_id = track['album']['id']
        track_id = track['id']

        # Get audio features for the track (skipped when the track has no ID)
        audio_features = sp.audio_features(track_id)[0] if track_id else None

        # Get release date of the album
        try:
            album_info = sp.album(album_id) if album_id else None
            release_date = album_info['release_date'] if album_info else None
        except Exception:
            release_date = None

        # Get the full track details (popularity, explicit flag, external URLs).
        # A separate name avoids overwriting the loop variable track_info.
        try:
            track_details = sp.track(track_id) if track_id else None
        except Exception:
            track_details = None
        track_details = track_details or {}
        popularity = track_details.get('popularity')

        # Add additional track information to the track data
        track_data = {
            'Track Name': track_name,
            'Artists': artists,
            'Album Name': album_name,
            'Album ID': album_id,
            'Track ID': track_id,
            'Popularity': popularity,
            'Release Date': release_date,
            'Duration (ms)': audio_features['duration_ms'] if audio_features else None,
            'Explicit': track_details.get('explicit'),
            'External URLs': track_details.get('external_urls', {}).get('spotify'),
            'Danceability': audio_features['danceability'] if audio_features else None,
            'Energy': audio_features['energy'] if audio_features else None,
            'Key': audio_features['key'] if audio_features else None,
            'Loudness': audio_features['loudness'] if audio_features else None,
            'Mode': audio_features['mode'] if audio_features else None,
            'Speechiness': audio_features['speechiness'] if audio_features else None,
            'Acousticness': audio_features['acousticness'] if audio_features else None,
            'Instrumentalness': audio_features['instrumentalness'] if audio_features else None,
            'Liveness': audio_features['liveness'] if audio_features else None,
            'Valence': audio_features['valence'] if audio_features else None,
            'Tempo': audio_features['tempo'] if audio_features else None,
            # Add more attributes as needed
        }

        music_data.append(track_data)

    # Create a pandas DataFrame from the list of dictionaries
    df = pd.DataFrame(music_data)

    return df

The function begins by initializing the Spotipy client with the provided access_token, which authorizes its requests to the Spotify Web API. The function then uses the Spotipy client to fetch information about the tracks in the specified playlist (identified by its playlist_id) through the sp.playlist_tracks method. The fields parameter limits the response to the track information that is required: track ID, name, artists, album ID, and album name.

The function then extracts relevant information from the retrieved playlist tracks and stores it in a list of dictionaries called music_data. For each track in the playlist, the function extracts data such as track name, artists (combined into a single string), album name, album ID, track ID, and popularity. The function uses the sp.audio_features method to fetch audio features for each track in the playlist. These audio features include attributes like danceability, energy, key, loudness, speechiness, acousticness, instrumentalness, liveness, valence, tempo, etc. These audio features provide insights into the characteristics of each track.

The extracted information for all tracks is stored in the music_data list. The function then creates a DataFrame from the music_data list. The DataFrame organizes the music data in a tabular format, making it easier to analyze and work with the collected information.
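
One caveat worth sketching: a single playlist_tracks call returns one page of results (up to 100 items by default), so longer playlists need paging. The helper below is a hypothetical addition, not part of the function above. It assumes the client object exposes Spotipy’s next() method, which returns the following page of a paged result or None when there is none.

```python
def collect_all_playlist_items(sp, first_page):
    """Follow paging links and return every playlist item (hypothetical helper)."""
    items = []
    page = first_page
    while page:
        items.extend(page["items"])
        # Fetch the following page only when the current one links to it
        page = sp.next(page) if page.get("next") else None
    return items


# Possible usage with a Spotipy client (not executed here):
#   first_page = sp.playlist_tracks(playlist_id)
#   all_items = collect_all_playlist_items(sp, first_page)
# If you restrict the response with fields=..., keep "next" in the field list,
# otherwise the paging link is stripped and only the first page is returned.
```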

Now, here’s how we can use the function to collect music data from any playlist on Spotify:

playlist_id = '37i9dQZF1DX76Wlfdnj7AP'

# Call the function to get the music data from the playlist and store it in a DataFrame
music_df = get_trending_playlist_data(playlist_id, access_token)

# Display the DataFrame
print(music_df)

                               Track Name  \
0                         I'm Good (Blue)   
1                      Boy's a Liar Pt. 2   
2   Quevedo: Bzrp Music Sessions, Vol. 52   
3                         Me Porto Bonito   
4                             El Merengue   
..                                    ...   
95                       PLAYA DEL INGLÉS   
96                   Lionheart (Fearless)   
97      family ties (with Kendrick Lamar)   
98                   Marianela (Que Pasa)   
99              Levitating (feat. DaBaby)   

                                     Artists  \
0                   David Guetta, Bebe Rexha   
1                  PinkPantheress, Ice Spice   
2                          Bizarrap, Quevedo   
3                Bad Bunny, Chencho Corleone   
4                  Marshmello, Manuel Turizo   
..                                       ...   
95                      Quevedo, Myke Towers   
96                   Joel Corry, Tom Grennan   
97                 Baby Keem, Kendrick Lamar   
98  HUGEL, Merk & Kremont, Lirico En La Casa   
99                          Dua Lipa, DaBaby   

                               Album Name                Album ID  \
0                         I'm Good (Blue)  7M842DMhYVALrXsw3ty7B3   
1                      Boy's a liar Pt. 2  6cVfHBcp3AdpYY0bBglkLN   
2   Quevedo: Bzrp Music Sessions, Vol. 52  4PNqWiJAfjj32hVvlchV5u   
3                        Un Verano Sin Ti  3RQQmkQEvNCY4prGKE6oc5   
4                             El Merengue  6sU751LOdNBPvVErW1GunP   
..                                    ...                     ...   
95                       PLAYA DEL INGLÉS  1MgW79L1nRyxWHOCu4nxR9   
96                   Lionheart (Fearless)  68U7caniDmdQHifJdnlYFT   
97      family ties (with Kendrick Lamar)  3HqmX8hGcbbQZODgayNEYx   
98                   Marianela (Que Pasa)  5As1VmPUMn4HIgYSbFD6l0   
99              Levitating (feat. DaBaby)  04m06KhJUuwe1Q487puIud   

                  Track ID  Popularity Release Date  Duration (ms)  Explicit  \
0   4uUG5RXrOk84mYEfFvj3cK          95   2022-08-26         175238      True   
1   6AQbmUe0Qwf5PZnt4HmTXv          94   2023-02-03         131013     False   
2   2tTmW7RDtMQtBk7m2rYeSw          93   2022-07-06         198938     False   
3   6Sq7ltF9Qa7SNFBsV5Cogx          92   2022-05-06         178567      True   
4   51FvjPEGKq2zByeeEQ43V9          92   2023-03-03         189357     False   
..                     ...         ...          ...            ...       ...   
95  2t6IxTASaSFkZEt61tQ6W6          78   2022-12-15         237525     False   
96  5vlzH0ps6WDyb158oFTAb3          77   2022-10-21         186689     False   
97  7Bpx2vsWfQFBACRz4h3IqH          77   2021-08-27         252070      True   
98  5bZjb7xKqLqa58QiUBcVvl          77   2022-11-25         145766     False   
99  463CkQjx2Zk1yXoBuierM9          77   2020-10-01         203064     False   

                                        External URLs  ...  Energy  Key  \
0   https://open.spotify.com/track/4uUG5RXrOk84mYE...  ...   0.965    7   
1   https://open.spotify.com/track/6AQbmUe0Qwf5PZn...  ...   0.809    5   
2   https://open.spotify.com/track/2tTmW7RDtMQtBk7...  ...   0.782    2   
3   https://open.spotify.com/track/6Sq7ltF9Qa7SNFB...  ...   0.712    1   
4   https://open.spotify.com/track/51FvjPEGKq2zBye...  ...   0.677    8   
..                                                ...  ...     ...  ...   
95  https://open.spotify.com/track/2t6IxTASaSFkZEt...  ...   0.736    7   
96  https://open.spotify.com/track/5vlzH0ps6WDyb15...  ...   0.967    8   
97  https://open.spotify.com/track/7Bpx2vsWfQFBACR...  ...   0.611    1   
98  https://open.spotify.com/track/5bZjb7xKqLqa58Q...  ...   0.893    1   
99  https://open.spotify.com/track/463CkQjx2Zk1yXo...  ...   0.825    6   

    Loudness  Mode  Speechiness  Acousticness  Instrumentalness  Liveness  \
0     -3.673     0       0.0343       0.00383          0.000007    0.3710   
1     -8.254     1       0.0500       0.25200          0.000128    0.2480   
2     -5.548     1       0.0440       0.01250          0.033000    0.2300   
3     -5.105     0       0.0817       0.09010          0.000027    0.0933   
4     -4.703     0       0.0442       0.03130          0.005170    0.1120   
..       ...   ...          ...           ...               ...       ...   
95    -3.254     0       0.0469       0.08220          0.000000    0.1090   
96    -2.430     1       0.0538       0.02170          0.001570    0.3360   
97    -5.453     1       0.3290       0.00575          0.000000    0.2310   
98    -3.344     1       0.0496       0.03300          0.005350    0.0811   
99    -3.787     0       0.0601       0.00883          0.000000    0.0674   

    Valence    Tempo  
0     0.304  128.040  
1     0.857  132.962  
2     0.550  128.033  
3     0.425   92.005  
4     0.698  124.011  
..      ...      ...  
95    0.656  112.993  
96    0.349  125.982  
97    0.144  134.140  
98    0.602  124.043  
99    0.915  102.977  

[100 rows x 21 columns]

In this code snippet, we used a playlist ID: “37i9dQZF1DX76Wlfdnj7AP”. The code then calls the get_trending_playlist_data function to extract music data from the specified playlist using the provided access_token. The collected music data is stored in a DataFrame named music_df. Finally, the code prints the DataFrame to display the extracted music data.

You can also use your own playlist ID here. The ID is the last part of the playlist link: if your playlist link is https://open.spotify.com/playlist/37i9dQZF1DX76Wlfdnj7AP, the playlist ID is “37i9dQZF1DX76Wlfdnj7AP”. Replace my playlist ID in the above code snippet with yours.
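
If you would rather paste the whole link, a small helper can pull the ID out for you. This is a minimal sketch using only the standard library; the function name is hypothetical and it assumes links of the open.spotify.com/playlist/<id> form shown above.

```python
from urllib.parse import urlparse


def playlist_id_from_url(url):
    """Extract the playlist ID from a Spotify playlist link (hypothetical helper)."""
    # Split the URL path into its non-empty segments, e.g. ['playlist', '<id>']
    parts = [part for part in urlparse(url).path.split("/") if part]
    if "playlist" not in parts or parts.index("playlist") + 1 >= len(parts):
        raise ValueError(f"No playlist ID found in: {url}")
    return parts[parts.index("playlist") + 1]


playlist_id = playlist_id_from_url(
    "https://open.spotify.com/playlist/37i9dQZF1DX76Wlfdnj7AP"
)
print(playlist_id)
```

Because only the URL path is inspected, any query string appended to a shared link is ignored.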

Now let’s check if the data has any null values or not:

print(music_df.isnull().sum())

Track Name          0
Artists             0
Album Name          0
Album ID            0
Track ID            0
Popularity          0
Release Date        0
Duration (ms)       0
Explicit            0
External URLs       0
Danceability        0
Energy              0
Key                 0
Loudness            0
Mode                0
Speechiness         0
Acousticness        0
Instrumentalness    0
Liveness            0
Valence             0
Tempo               0
dtype: int64

Now, let’s move further to building a music recommendation system using Python. Let’s import the necessary Python libraries now:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from datetime import datetime
from sklearn.metrics.pairwise import cosine_similarity

While providing music recommendations to users, it is important to recommend the latest releases. For this, we need to give more weight to the latest releases in the recommendations. Let’s write a function to solve this problem:

# Function to calculate weighted popularity scores based on release date
def calculate_weighted_popularity(release_date):
    # Convert the release date to datetime object
    release_date = datetime.strptime(release_date, '%Y-%m-%d')

    # Calculate the time span between release date and today's date
    time_span = datetime.now() - release_date

    # Calculate the weighted popularity score based on time span (e.g., more recent releases have higher weight)
    weight = 1 / (time_span.days + 1)
    return weight

The above function takes the release date of a music track as input, which is provided in the format ‘YYYY-MM-DD’. It then uses the datetime.strptime function from the Python datetime module to convert the release date string to a datetime object. This conversion allows us to perform arithmetic operations with dates. The function then calculates the time span between the release date of the track and the current date (today’s date) using datetime.now() - release_date. This results in a timedelta object representing the time difference between the two dates.

The weight is computed from this time span using the formula 1 / (time_span.days + 1). The time_span.days attribute of the timedelta object gives the number of days between the release date and today. Adding 1 keeps the denominator from being zero: for a track released today, time_span.days is 0, and dividing by it would raise a division by zero error.

The idea behind this formula is that the weight decreases as the time span between the release date and today increases. More recent releases will have a higher weight, while older releases will have a lower weight. As a result, when combining this weighted popularity score with other factors in a recommendation system, recent tracks will have a more significant impact on the final recommendations, reflecting users’ potential interest in newer music.
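
To make the formula concrete, here is a small worked sketch with a fixed “today” so the numbers are reproducible. It is a variation on the function above, not a replacement: the function names are hypothetical, the fallback formats assume that some release dates may be given only as a year or a year and month, and release dates in the future are clamped to zero days.

```python
from datetime import datetime


def parse_release_date(release_date):
    """Parse 'YYYY-MM-DD', 'YYYY-MM' or 'YYYY' into a datetime (hypothetical helper)."""
    for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
        try:
            return datetime.strptime(release_date, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised release date: {release_date}")


def recency_weight(release_date, today):
    """Return 1 / (days since release + 1), with future dates clamped to 0 days."""
    days = max((today - parse_release_date(release_date)).days, 0)
    return 1 / (days + 1)


today = datetime(2023, 3, 12)
for date in ("2023-03-12", "2023-03-11", "2023-03-03", "2022"):
    print(date, recency_weight(date, today))
```

A track released on the reference day gets a weight of 1.0, one released a day earlier gets 0.5, and one released nine days earlier gets 0.1, so the weight falls off quickly with age.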

Now let’s normalize the music features before moving forward:

# Normalize the music features using Min-Max scaling
scaler = MinMaxScaler()
music_features = music_df[['Danceability', 'Energy', 'Key', 
                           'Loudness', 'Mode', 'Speechiness', 'Acousticness',
                           'Instrumentalness', 'Liveness', 'Valence', 'Tempo']].values
music_features_scaled = scaler.fit_transform(music_features)
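
For intuition about what Min-Max scaling does to each feature column, here is a plain-Python sketch of the same arithmetic, (x - min) / (max - min), applied column by column. It is only an illustration with made-up tempo and loudness values; the function name is hypothetical and the article itself relies on scikit-learn’s MinMaxScaler.

```python
def min_max_scale_columns(rows):
    """Scale every column of a list of rows to the range [0, 1]."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero
        scaled_columns.append(
            [(value - low) / span if span else 0.0 for value in column]
        )
    # Turn the scaled columns back into rows
    return [list(row) for row in zip(*scaled_columns)]


# Made-up [tempo, loudness] values for three tracks
features = [[128.0, -3.0], [92.0, -8.0], [110.0, -5.5]]
print(min_max_scale_columns(features))
```

After scaling, tempo (around 100) and loudness (a small negative number) share the same 0 to 1 range, so neither dominates the similarity calculation simply because of its units.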

We will create a hybrid recommendation system for music recommendations. The first approach recommends music based on audio features, and the second approach recommends music based on weighted popularity.

Here’s how to generate music recommendations based on the music audio features:

# a function to get content-based recommendations based on music features
def content_based_recommendations(input_song_name, num_recommendations=5):
    if input_song_name not in music_df['Track Name'].values:
        print(f"'{input_song_name}' not found in the dataset. Please enter a valid song name.")
        return

    # Get the index of the input song in the music DataFrame
    input_song_index = music_df[music_df['Track Name'] == input_song_name].index[0]

    # Calculate the similarity scores based on music features (cosine similarity)
    similarity_scores = cosine_similarity([music_features_scaled[input_song_index]], music_features_scaled)

    # Get the indices of the most similar songs
    similar_song_indices = similarity_scores.argsort()[0][::-1][1:num_recommendations + 1]

    # Get the details of the most similar songs based on content-based filtering
    # (the local name differs from the function name to avoid shadowing it)
    recommendations = music_df.iloc[similar_song_indices][['Track Name', 'Artists', 'Album Name', 'Release Date', 'Popularity']]

    return recommendations

The above function takes input_song_name as the input, which represents the name of the song for which recommendations are to be generated. The function checks if the input_song_name exists in the music_df DataFrame, which contains the music data with columns like ‘Track Name’, ‘Artists’, ‘Album Name’, ‘Release Date’, and ‘Popularity’. If the input song name is not found, the function prints a message and returns nothing. Otherwise, it retrieves the index of the input song in the DataFrame. This index is used to compare the audio features of the input song with those of the other songs in the dataset.

The function calculates the similarity scores between the audio features of the input song and all other songs in the dataset. It uses cosine similarity, a common measure used in content-based filtering. The cosine_similarity function from scikit-learn is employed to compute these similarity scores.

The function identifies the num_recommendations most similar songs to the input song based on their audio features. It does this by sorting the similarity scores in descending order and selecting the top num_recommendations songs. The input song itself is excluded from the recommendations (hence the [1:num_recommendations + 1] slicing). The function then extracts the details (such as track name, artists, album name, release date, and popularity) of the most similar songs from the music_df DataFrame using the indices of the most similar songs.
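
The ranking step can be sketched without any libraries to show what the similarity scores mean. The snippet below is an illustration under assumptions: the feature vectors are made up, the function names are hypothetical, and the query song is excluded by its index rather than by slicing off the first result.

```python
import math


def cosine_sim(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query_index, feature_rows, top_n):
    """Return indices of the top_n rows most similar to the query row."""
    scores = [
        (cosine_sim(feature_rows[query_index], row), index)
        for index, row in enumerate(feature_rows)
        if index != query_index  # never recommend the query song itself
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scores[:top_n]]


# Made-up scaled features for four songs
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
print(most_similar(0, features, top_n=2))
```

Excluding the query by index is a small design choice: it stays correct even if another track has identical features and ties with the input song for the top score.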

Now here’s the function to generate music recommendations based on weighted popularity and combine it with the recommendations of the content-based filtering method using the hybrid approach:

# a function to get hybrid recommendations based on weighted popularity
def hybrid_recommendations(input_song_name, num_recommendations=5, alpha=0.5):
    if input_song_name not in music_df['Track Name'].values:
        print(f"'{input_song_name}' not found in the dataset. Please enter a valid song name.")
        return

    # Get content-based recommendations
    content_based_rec = content_based_recommendations(input_song_name, num_recommendations)

    # Get the popularity score of the input song
    popularity_score = music_df.loc[music_df['Track Name'] == input_song_name, 'Popularity'].values[0]

    # Calculate the weighted popularity score
    weighted_popularity_score = popularity_score * calculate_weighted_popularity(music_df.loc[music_df['Track Name'] == input_song_name, 'Release Date'].values[0])

    # Build a one-row DataFrame for the input song with its weighted popularity score
    input_song_row = pd.DataFrame([{
        'Track Name': input_song_name,
        'Artists': music_df.loc[music_df['Track Name'] == input_song_name, 'Artists'].values[0],
        'Album Name': music_df.loc[music_df['Track Name'] == input_song_name, 'Album Name'].values[0],
        'Release Date': music_df.loc[music_df['Track Name'] == input_song_name, 'Release Date'].values[0],
        'Popularity': weighted_popularity_score
    }])

    # Combine it with the content-based recommendations
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
    hybrid_recommendations = pd.concat([content_based_rec, input_song_row], ignore_index=True)

    # Sort the hybrid recommendations by the Popularity column
    hybrid_recommendations = hybrid_recommendations.sort_values(by='Popularity', ascending=False)

    # Remove the input song from the recommendations
    hybrid_recommendations = hybrid_recommendations[hybrid_recommendations['Track Name'] != input_song_name]

    return hybrid_recommendations

The hybrid approach aims to provide more personalized and relevant recommendations by considering both the content similarity of songs and their weighted popularity. The function takes input_song_name as the input, representing the name of the song for which recommendations are to be generated. The function first calls the content_based_recommendations function to get content-based recommendations for the input song. The num_recommendations parameter determines the number of content-based recommendations to be retrieved.

The function retrieves the popularity score of the input song from the music_df DataFrame and multiplies it by the recency weight returned by the calculate_weighted_popularity function (previously defined), which depends on the release date of the input song. The alpha parameter is meant to control the relative importance of content-based and popularity-based recommendations, although the code above accepts it without using it.

The content-based recommendations obtained earlier are stored in the content_based_rec DataFrame. The function combines the content-based recommendations with the input song’s information (track name, artists, album name, release date, and popularity) and its weighted popularity score. This step creates a DataFrame named hybrid_recommendations that includes both the content-based recommendations and the input song’s data.

The hybrid_recommendations DataFrame is then sorted in descending order by the Popularity column, so the most popular of the similar songs appear at the top of the recommendations. The input song is then removed to avoid suggesting the same song as part of the recommendations. Because the weighted score is computed only for the input song, the final ranking reflects the Spotify popularity of the content-based recommendations.
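
The hybrid function accepts an alpha parameter that the code shown never uses. One possible way to put it to work, sketched here as an assumption rather than as the method used in this article, is a weighted blend of a similarity score and a popularity score that have both been scaled to the range 0 to 1. The function names and candidate values below are hypothetical.

```python
def blend_scores(similarity, popularity_score, alpha=0.5):
    """Blend a content score and a popularity score, both assumed to lie in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    # alpha = 1 ranks purely by similarity, alpha = 0 purely by popularity
    return alpha * similarity + (1 - alpha) * popularity_score


def rank_candidates(candidates, alpha=0.5):
    """Sort (name, similarity, popularity_score) tuples by blended score, best first."""
    return sorted(
        candidates,
        key=lambda candidate: blend_scores(candidate[1], candidate[2], alpha),
        reverse=True,
    )


# Made-up candidates: (name, similarity, popularity score)
candidates = [("Song A", 0.9, 0.2), ("Song B", 0.6, 0.8), ("Song C", 0.3, 0.5)]
for alpha in (1.0, 0.5, 0.0):
    print(alpha, [name for name, _, _ in rank_candidates(candidates, alpha)])
```

With alpha set to 1.0 the most similar song comes first, with 0.0 the most popular one does, and values in between trade the two off.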

Now here’s how we can test the final function to generate music recommendations:

input_song_name = "I'm Good (Blue)"
recommendations = hybrid_recommendations(input_song_name, num_recommendations=5)
print(f"Hybrid recommended songs for '{input_song_name}':")
print(recommendations)

Hybrid recommended songs for 'I'm Good (Blue)':
                       Track Name                                     Artists  \
0                           REACT  Switch Disco, Ella Henderson, Robert Miles   
2                    Call It Love                     Felix Jaehn, Ray Dalton   
4  Where Did You Go? (feat. MNEK)                             Jax Jones, MNEK   
1                   Where You Are                          John Summit, Hayla   
3           Rainfall (Praise You)                                   Tom Santa   

                      Album Name Release Date  Popularity  
0                          REACT   2023-01-13        84.0  
2                   Call It Love   2022-09-16        84.0  
4  Where Did You Go (feat. MNEK)   2022-02-04        81.0  
1                  Where You Are   2023-03-03        80.0  
3          Rainfall (Praise You)   2022-02-18        78.0

So this is how you can create a Music Recommendation System using Spotify API and Python.

Summary

So, I hope you liked this article on building a Music Recommendation System using the Spotify API and Python. A Music Recommendation System is an application of Data Science that aims to assist users in discovering new and relevant musical content based on their preferences and listening behaviour. I hope this article helps you understand how to use APIs to collect real-time data and put it to work. Feel free to ask questions in the comments section below. If you face any error in signing up for a Spotify developer account, feel free to reach me on LinkedIn or Instagram.