Clustering is a machine learning technique that groups data points by their features. Clustering music genres is the task of grouping music based on similarities in audio characteristics. If you want to learn how to perform clustering analysis on music genres, this article is for you. In this article, I will take you through the task of clustering music genres with machine learning using Python.
Clustering Music Genres (Problem Statement)
Every person has a different taste in music. We cannot identify what kind of music a person likes just by knowing about their lifestyle, hobbies, or profession. So it is difficult for music streaming applications to recommend music to a person. But if we know what kind of songs a person listens to daily, we can find songs with similar audio characteristics and recommend them to that person.
That is where the cluster analysis of music genres comes in. Here you are given a dataset of popular songs on Spotify, which contains the title and artist of each song along with its audio characteristics. Your goal is to group the songs based on similarities in their audio characteristics.
You can download the dataset from here.
Clustering Music Genres using Python
I hope you have understood the problem statement mentioned above on clustering music genres with machine learning. Now let’s start with this task by importing the necessary Python libraries and the dataset:
import pandas as pd
import numpy as np
from sklearn import cluster

data = pd.read_csv("Spotify-2000.csv")
print(data.head())
   Index                   Title             Artist            Top Genre  \
0      1                 Sunrise        Norah Jones      adult standards
1      2             Black Night        Deep Purple           album rock
2      3          Clint Eastwood           Gorillaz  alternative hip hop
3      4           The Pretender       Foo Fighters    alternative metal
4      5  Waitin' On A Sunny Day  Bruce Springsteen         classic rock

   Year  Beats Per Minute (BPM)  Energy  Danceability  Loudness (dB)  \
0  2004                     157      30            53            -14
1  2000                     135      79            50            -11
2  2001                     168      69            66             -9
3  2007                     173      96            43             -4
4  2002                     106      82            58             -5

   Liveness  Valence  Length (Duration)  Acousticness  Speechiness  Popularity
0        11       68                201            94            3          71
1        17       81                207            17            7          39
2         7       52                341             2           17          69
3         3       37                269             0            4          76
4        10       87                256             1            3          59
You can see all the columns of the dataset in the above output. It contains the audio features of each song, which are enough to find similarities between songs. Before moving forward, I will drop the index column, as it is of no use:
data = data.drop("Index", axis=1)
Now let’s have a look at the correlation between all the audio features in the dataset:
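# numeric_only=True skips text columns such as Title and Artist (needed in pandas 2.0 and later)
print(data.corr(numeric_only=True))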
                             Year  Beats Per Minute (BPM)     Energy  \
Year                     1.000000                0.012570   0.147235
Beats Per Minute (BPM)   0.012570                1.000000   0.156644
Energy                   0.147235                0.156644   1.000000
Danceability             0.077493               -0.140602   0.139616
Loudness (dB)            0.343764                0.092927   0.735711
Liveness                 0.019017                0.016256   0.174118
Valence                 -0.166163                0.059653   0.405175
Acousticness            -0.132946               -0.122472  -0.665156
Speechiness              0.054097                0.085598   0.205865
Popularity              -0.158962               -0.003181   0.103393

                        Danceability  Loudness (dB)   Liveness    Valence  \
Year                        0.077493       0.343764   0.019017  -0.166163
Beats Per Minute (BPM)     -0.140602       0.092927   0.016256   0.059653
Energy                      0.139616       0.735711   0.174118   0.405175
Danceability                1.000000       0.044235  -0.103063   0.514564
Loudness (dB)               0.044235       1.000000   0.098257   0.147041
Liveness                   -0.103063       0.098257   1.000000   0.050667
Valence                     0.514564       0.147041   0.050667   1.000000
Acousticness               -0.135769      -0.451635  -0.046206  -0.239729
Speechiness                 0.125229       0.125090   0.092594   0.107102
Popularity                  0.144344       0.165527  -0.111978   0.095911

                        Acousticness  Speechiness  Popularity
Year                       -0.132946     0.054097   -0.158962
Beats Per Minute (BPM)     -0.122472     0.085598   -0.003181
Energy                     -0.665156     0.205865    0.103393
Danceability               -0.135769     0.125229    0.144344
Loudness (dB)              -0.451635     0.125090    0.165527
Liveness                   -0.046206     0.092594   -0.111978
Valence                    -0.239729     0.107102    0.095911
Acousticness                1.000000    -0.098256   -0.087604
Speechiness                -0.098256     1.000000    0.111689
Popularity                 -0.087604     0.111689    1.000000
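A matrix of this size is hard to scan by eye, so it can help to rank feature pairs by the strength of their correlation. The snippet below is a minimal sketch: it hard-codes a 5x5 slice of the coefficients printed above, and `strongest_pairs` is a hypothetical helper written for this illustration, not part of pandas. With the real dataset you would pass the full correlation matrix instead.

```python
import pandas as pd

# A slice of the correlation matrix, copied from the output printed above.
features = ["Energy", "Loudness (dB)", "Acousticness", "Danceability", "Valence"]
corr = pd.DataFrame(
    [
        [1.000000, 0.735711, -0.665156, 0.139616, 0.405175],
        [0.735711, 1.000000, -0.451635, 0.044235, 0.147041],
        [-0.665156, -0.451635, 1.000000, -0.135769, -0.239729],
        [0.139616, 0.044235, -0.135769, 1.000000, 0.514564],
        [0.405175, 0.147041, -0.239729, 0.514564, 1.000000],
    ],
    index=features,
    columns=features,
)


def strongest_pairs(corr_matrix, top_n=3):
    """Hypothetical helper: rank feature pairs by absolute correlation."""
    pairs = []
    columns = list(corr_matrix.columns)
    # Visit each unordered pair once (upper triangle, diagonal excluded).
    for i, first in enumerate(columns):
        for second in columns[i + 1:]:
            pairs.append((first, second, float(corr_matrix.loc[first, second])))
    pairs.sort(key=lambda pair: abs(pair[2]), reverse=True)
    return pairs[:top_n]


top = strongest_pairs(corr)
for first, second, r in top:
    print(f"{first} vs {second}: {r:+.3f}")
```

Strongly correlated features carry overlapping information, which is worth keeping in mind when choosing the columns to cluster on.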
Clustering Analysis of Audio Features
Now I will use the K-means clustering algorithm to group songs that have similar audio features. Then I will add the resulting clusters to the dataset. So let’s create a new dataset of audio characteristics and perform clustering analysis using the K-means clustering algorithm:
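from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Audio features used for clustering
data2 = data[["Beats Per Minute (BPM)", "Loudness (dB)", "Liveness",
              "Valence", "Acousticness", "Speechiness"]]

# MinMaxScaler only rescales data when fit_transform is called;
# this maps every feature to [0, 1] so that no single feature dominates
scaled = MinMaxScaler().fit_transform(data2)

# Group the songs into 10 clusters (the random_state value is arbitrary
# and only makes the result repeatable)
kmeans = KMeans(n_clusters=10, random_state=42)
clusters = kmeans.fit_predict(scaled)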
Now I will add the clusters as predicted by the K-means clustering algorithm to the original dataset:
data["Music Segments"] = clusters

# K-means labels start at 0, so shift them by one to get "Cluster 1" to "Cluster 10"
data["Music Segments"] = data["Music Segments"].map(
    {i: f"Cluster {i + 1}" for i in range(10)})
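It is worth checking that every song received a cluster name and how many songs landed in each cluster. The following is a minimal sketch that uses a short, made-up list of labels in place of the real `clusters` array; `example_labels` and `names` are hypothetical names used only here.

```python
import pandas as pd

# Hypothetical labels standing in for the output of kmeans.fit_predict.
# KMeans numbers its clusters from 0 to n_clusters - 1.
example_labels = pd.Series([0, 2, 1, 0, 2, 2])

# Build readable names; every label from 0 upward needs an entry,
# otherwise map() leaves a missing value (NaN) for that row.
names = {label: f"Cluster {label + 1}" for label in range(3)}
segments = example_labels.map(names)

# Number of rows without a name, and the size of each cluster.
missing = int(segments.isna().sum())
counts = segments.value_counts()

print(missing)
print(counts)
```

A very uneven distribution of cluster sizes can be a hint that a different number of clusters would suit the data better.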
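Now let’s have a look at the dataset with clusters. K-means numbers its clusters arbitrarily, so the labels you get may differ from the ones shown here: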
print(data.head())
                    Title             Artist            Top Genre  Year  \
0                 Sunrise        Norah Jones      adult standards  2004
1             Black Night        Deep Purple           album rock  2000
2          Clint Eastwood           Gorillaz  alternative hip hop  2001
3           The Pretender       Foo Fighters    alternative metal  2007
4  Waitin' On A Sunny Day  Bruce Springsteen         classic rock  2002

   Beats Per Minute (BPM)  Energy  Danceability  Loudness (dB)  Liveness  \
0                     157      30            53            -14        11
1                     135      79            50            -11        17
2                     168      69            66             -9         7
3                     173      96            43             -4         3
4                     106      82            58             -5        10

   Valence  Length (Duration)  Acousticness  Speechiness  Popularity  \
0       68                201            94            3          71
1       81                207            17            7          39
2       52                341             2           17          69
3       37                269             0            4          76
4       87                256             1            3          59

  Music Segments
0      Cluster 1
1      Cluster 6
2      Cluster 2
3      Cluster 2
4      Cluster 3
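One way to interpret the clusters is to compare the average feature values of each one. The snippet below is a minimal sketch built from only the five rows printed above, so the averages are illustrative and not results for the full dataset. With the real data, the same `groupby` call could be applied to `data`.

```python
import pandas as pd

# Values copied from the five rows printed above.
sample = pd.DataFrame({
    "Title": ["Sunrise", "Black Night", "Clint Eastwood",
              "The Pretender", "Waitin' On A Sunny Day"],
    "Beats Per Minute (BPM)": [157, 135, 168, 173, 106],
    "Energy": [30, 79, 69, 96, 82],
    "Danceability": [53, 50, 66, 43, 58],
    "Music Segments": ["Cluster 1", "Cluster 6", "Cluster 2",
                       "Cluster 2", "Cluster 3"],
})

# Average of each audio feature within each cluster.
feature_columns = ["Beats Per Minute (BPM)", "Energy", "Danceability"]
profile = sample.groupby("Music Segments")[feature_columns].mean()

print(profile)
```

Clusters whose averages differ clearly on a feature are the ones that feature helps to separate.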
Now let’s visualize the clusters based on some of the audio features:
import plotly.graph_objects as go

PLOT = go.Figure()

# Add one 3D scatter trace per cluster so that each cluster gets its own color
for i in list(data["Music Segments"].unique()):
    segment = data[data["Music Segments"] == i]
    PLOT.add_trace(go.Scatter3d(x=segment['Beats Per Minute (BPM)'],
                                y=segment['Energy'],
                                z=segment['Danceability'],
                                mode='markers', marker_size=6, marker_line_width=1,
                                name=str(i)))

PLOT.update_traces(hovertemplate='Beats Per Minute (BPM): %{x} <br>Energy: %{y} <br>Danceability: %{z}')

# title=dict(text=..., font=...) is used instead of the deprecated titlefont attribute
PLOT.update_layout(width=800, height=800, autosize=True, showlegend=True,
                   scene=dict(xaxis=dict(title=dict(text='Beats Per Minute (BPM)', font=dict(color='black'))),
                              yaxis=dict(title=dict(text='Energy', font=dict(color='black'))),
                              zaxis=dict(title=dict(text='Danceability', font=dict(color='black')))),
                   font=dict(family="Gilroy", color='black', size=12))
PLOT.show()

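In this plot, each trace is one cluster, and every song is positioned by its beats per minute, energy, and danceability. Note that energy and danceability were not among the features used for clustering, so the clusters may overlap along those axes.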
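Coming back to the recommendation problem from the beginning of the article, cluster labels give a simple way to suggest similar songs: look up the cluster of a song the person listens to and return the other songs in that cluster. Here is a minimal sketch using the five rows printed earlier; `similar_songs` is a hypothetical helper written for this illustration.

```python
import pandas as pd

# Titles, artists, and cluster labels copied from the five rows printed earlier.
songs = pd.DataFrame({
    "Title": ["Sunrise", "Black Night", "Clint Eastwood",
              "The Pretender", "Waitin' On A Sunny Day"],
    "Artist": ["Norah Jones", "Deep Purple", "Gorillaz",
               "Foo Fighters", "Bruce Springsteen"],
    "Music Segments": ["Cluster 1", "Cluster 6", "Cluster 2",
                       "Cluster 2", "Cluster 3"],
})


def similar_songs(frame, title):
    """Hypothetical helper: other songs that share a cluster with `title`."""
    # Cluster of the song we start from (first match if titles repeat).
    segment = frame.loc[frame["Title"] == title, "Music Segments"].iloc[0]
    # Same cluster, excluding the song itself.
    same_cluster = frame[(frame["Music Segments"] == segment)
                         & (frame["Title"] != title)]
    return same_cluster[["Title", "Artist"]]


recommendations = similar_songs(songs, "Clint Eastwood")
print(recommendations)
```

On the full dataset a cluster contains many songs, so a real recommender would still need a way to rank the candidates, for example by popularity.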
Summary
So this is how you can perform cluster analysis of music genres with machine learning using Python. Clustering music genres is the task of grouping music based on similarities in audio features. I hope you liked this article on clustering music genres with machine learning. Feel free to ask questions in the comments section below.