K-Means Clustering in Machine Learning

K-Means is a clustering algorithm that can group an unlabeled dataset quickly and efficiently, usually in just a few iterations. In this article, I will take you through K-Means clustering in machine learning using Python.

K-Means Clustering in Machine Learning

Clustering means identifying similar instances and assigning them to clusters or groups of similar instances. It is used in a wide variety of applications such as:

  1. Customer Segmentation
  2. Data Analysis
  3. Dimensionality Reduction
  4. Anomaly Detection
  5. Semi-supervised learning
  6. Searching Images
  7. Image Segmentation

K-Means groups an unlabeled dataset by assigning each instance to the cluster whose centroid is closest. A centroid is the point around which the instances of a cluster are centred, that is, the mean of those instances.
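To make the labelling step concrete, here is a minimal NumPy sketch. The function name `assign_to_nearest_centroid` and the toy points are my own illustration, not part of any library, and Euclidean distance is assumed.

```python
import numpy as np

def assign_to_nearest_centroid(X, centroids):
    """Return, for each row of X, the index of the closest centroid."""
    # distances[i, j] = Euclidean distance from instance i to centroid j
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

# Toy data: two points near (0, 0) and two points near (5, 5)
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

labels = assign_to_nearest_centroid(X, centroids)
print(labels)  # [0 0 1 1]
```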


If you were given the instance labels, you could easily locate all the centroids by averaging the instances in each cluster. But here we are given neither the labels nor the centroids, so we have to start by placing the centroids randomly, for example by selecting k random instances and using their locations as the centroids.
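Both ideas in this paragraph fit in a few lines of NumPy. This is only a sketch: the helper names `centroids_from_labels` and `random_initial_centroids` are hypothetical, and the first helper assumes every cluster has at least one instance.

```python
import numpy as np

def centroids_from_labels(X, labels, k):
    """Average the instances of each cluster (assumes no cluster is empty)."""
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

def random_initial_centroids(X, k, rng):
    """Pick k distinct instances at random and use them as starting centroids."""
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx]

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])

# Case 1: the labels are known, so the centroids are just per-cluster means
labels = np.array([0, 0, 1, 1])
known = centroids_from_labels(X, labels, k=2)
print(known)  # rows (1, 0) and (11, 10)

# Case 2: nothing is known, so start from k randomly chosen instances
rng = np.random.default_rng(0)
init = random_initial_centroids(X, 2, rng)
print(init)
```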

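Then we label the instances, update the centroids, re-label the instances, update the centroids again, and so on until the centroids stop moving. The K-Means clustering algorithm is guaranteed to converge in a finite number of steps (usually quite few); it will not continue to iterate forever. However, depending on the initial centroids, it may converge to a local optimum rather than the best possible clustering.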
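The label/update loop can be sketched from scratch as follows. This is an illustrative implementation under my own assumptions (Euclidean distance, random instances as initial centroids, stopping when the centroids no longer move, synthetic blob data); production code would normally rely on a library implementation.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means loop: returns (centroids, labels, iterations used)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct instances chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for iteration in range(1, max_iter + 1):
        # Label step: each instance goes to its closest centroid
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its instances
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels, iteration

# Synthetic data: two well-separated blobs of 50 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])

centroids, labels, n_iter = kmeans(X, k=2)
print("iterations:", n_iter)
print(centroids)
```

On data with a clear structure like this, the loop typically stops after only a handful of iterations, far below `max_iter`.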

K-Means Clustering using Python

The computational complexity of the K-Means clustering algorithm is generally linear with respect to:

  1. the number of instances m,
  2. the number of clusters k,
  3. and the number of dimensions n.

This is only true when the dataset has a clustering structure. If it does not, the worst-case time complexity of the algorithm can increase exponentially with the number of instances. In practice, this rarely happens, and K-means clustering is generally considered to be one of the fastest clustering algorithms.

Now let’s see how to implement K-means clustering using Python. I will use the California housing dataset to create economic segments in different areas of California. Let’s start by importing the necessary Python libraries and the dataset:

Index(['longitude', 'latitude', 'housing_median_age', 'total_rooms',
       'total_bedrooms', 'population', 'households', 'median_income',
       'median_house_value', 'ocean_proximity'],
      dtype='object')
   median_income  latitude  longitude
0         8.3252     37.88    -122.23
1         8.3014     37.86    -122.22
2         7.2574     37.85    -122.24
3         5.6431     37.85    -122.25
4         3.8462     37.85    -122.25
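The listing that produced the output above is not reproduced here. A minimal sketch of what it might look like, assuming pandas is used: to keep the example self-contained it reads the five rows printed above from an in-memory CSV, whereas with the real dataset you would pass a file path (for instance a hypothetical `housing.csv`) to `pd.read_csv`.

```python
import io
import pandas as pd

# Stand-in for the real file: only the five rows and three columns shown above.
# The full dataset has ten columns and many more rows.
sample_csv = io.StringIO(
    "longitude,latitude,median_income\n"
    "-122.23,37.88,8.3252\n"
    "-122.22,37.86,8.3014\n"
    "-122.24,37.85,7.2574\n"
    "-122.25,37.85,5.6431\n"
    "-122.25,37.85,3.8462\n"
)
data = pd.read_csv(sample_csv)
print(data.columns)

# Keep only the columns used for clustering, in the order shown above
features = data[["median_income", "latitude", "longitude"]]
print(features.head())
```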

Now let’s run the K-means clustering algorithm on these features. Since K-means is sensitive to the scale of the features, it is a good idea to rescale or normalize the data first, especially when it contains extreme values:

   median_income  latitude  longitude  Cluster
0         8.3252     37.88    -122.23        2
1         8.3014     37.86    -122.22        2
2         7.2574     37.85    -122.24        2
3         5.6431     37.85    -122.25        2
4         3.8462     37.85    -122.25        0
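The code behind the table above is not shown. One way to get a result of this shape is sketched below with scikit-learn. Several things here are assumptions on my part: `StandardScaler` as the scaling method, `n_clusters=3` (the output only tells us that labels 0 and 2 exist), and randomly generated stand-in data so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three housing columns (NOT the real data)
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, size=300),
    "latitude": rng.uniform(32.5, 42.0, size=300),
    "longitude": rng.uniform(-124.3, -114.3, size=300),
})

# Standardize each column to zero mean and unit variance
scaled = StandardScaler().fit_transform(features)

# n_clusters=3 is an assumption; the article does not state the value used
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)

# Attach the cluster label of each row to the unscaled values
clustered = features.copy()
clustered["Cluster"] = kmeans.fit_predict(scaled)
print(clustered.head())
```

The labels are stored next to the original, unscaled values so that the table and any later plot stay readable in real units.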

Now let’s have a look at the clusters identified by the algorithm by using a scatterplot:

[Figure: scatter plot of the k-means clustering results]

The scatter plot above shows the geographic distribution of the clusters. It appears that the algorithm created separate segments for the high-income area.
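The plotting code is not included in the text, so the following is only a sketch of how such a geographic scatter plot could be drawn. It assumes matplotlib (the original may well have used a different library) and uses the five labelled rows printed earlier as sample input.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The five labelled rows printed earlier; with the full dataset you would
# use the complete DataFrame instead.
clustered = pd.DataFrame({
    "median_income": [8.3252, 8.3014, 7.2574, 5.6431, 3.8462],
    "latitude": [37.88, 37.86, 37.85, 37.85, 37.85],
    "longitude": [-122.23, -122.22, -122.24, -122.25, -122.25],
    "Cluster": [2, 2, 2, 2, 0],
})

fig, ax = plt.subplots(figsize=(8, 6))
# One dot per area, positioned by location and coloured by cluster label
points = ax.scatter(
    clustered["longitude"], clustered["latitude"],
    c=clustered["Cluster"], cmap="viridis", s=10,
)
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.set_title("K-means clusters by location")
fig.colorbar(points, ax=ax, label="Cluster")
```

Call `plt.show()` in an interactive session, or `fig.savefig(...)` in a script, to see the result.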

Summary

This is how we can implement the K-means clustering algorithm using Python. It is important to scale the input features before running K-means; otherwise, the clusters can be very stretched and the algorithm will perform poorly. Scaling the features does not guarantee that the clusters will be nice and spherical, but it usually improves things a lot.
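To see why scaling matters, recall that K-means relies on distances between points, so a feature with a large numeric range can drown out the others. The toy example below uses made-up numbers (not the housing data) and a hypothetical helper `nearest_neighbour` to show how the closest point to a given instance changes once the columns are standardized.

```python
import numpy as np

# Made-up toy data: column 0 has a small range, column 1 a large one
X = np.array([
    [1.0, 1000.0], [1.1, 3000.0], [0.9, 5000.0],  # group A (column 0 near 1)
    [9.0, 1000.0], [9.1, 3000.0], [8.9, 5000.0],  # group B (column 0 near 9)
])

def nearest_neighbour(X, i):
    """Index of the row closest to row i, excluding row i itself."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf
    return int(d.argmin())

# Standardize: zero mean and unit variance per column
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

raw_nn = nearest_neighbour(X, 0)
scaled_nn = nearest_neighbour(X_scaled, 0)
print(raw_nn)     # 3: a point from group B, because column 1 dominates
print(scaled_nn)  # 1: a point from group A, once both columns count equally
```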

I hope you liked this article on the K-means algorithm in machine learning and its implementation using Python. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal