Customer Segmentation with Machine Learning

In machine learning, customer segmentation is a clustering problem: the goal is to find groups (clusters) of customers in a dataset that share similar characteristics. Customer segmentation can help a business focus its marketing strategies to increase profits and overall customer satisfaction. In this article, I will walk you through the task of customer segmentation with machine learning using Python.

What is Customer Segmentation?

Customer segmentation is also known as market segmentation. It means dividing customers into groups based on similar characteristics. For example, think of a pet store: each customer has their own preferences when choosing a pet. The type of pet a customer buys depends heavily on the kind of animal they love, their lifestyle and income, and many other factors.

So, in the example above, the customers of a pet store can be grouped by the type of animal they prefer and the income they earn. Customer segmentation, therefore, makes it possible to match products and offerings with the most suitable customers. Now that you know what it means to segment customers, in the section below I will walk you through the task of customer segmentation with machine learning using Python.

Customer Segmentation using Python

Customer segmentation is about identifying the most profitable customers and tailoring products and offerings to meet their needs. Now let’s see how to do the customer segmentation task with machine learning using Python. I’ll start this task by importing the necessary Python libraries and the dataset:

  CUST_ID      BALANCE  ...  PRC_FULL_PAYMENT  TENURE
0  C10001    40.900749  ...          0.000000      12
1  C10002  3202.467416  ...          0.222222      12
2  C10003  2495.148862  ...          0.000000      12
3  C10004  1666.670542  ...          0.000000      12
4  C10005   817.714335  ...          0.000000      12

[5 rows x 18 columns]
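The loading code itself is not reproduced in this article. As a minimal sketch, assuming the data is stored as a CSV file with a CUST_ID column followed by numeric feature columns, it could be read with pandas as below. The file path is a hypothetical placeholder, and the in-memory sample uses only a few of the real column names with toy values, not the real dataset:

```python
import io

import pandas as pd

# Hypothetical path: replace it with the location of your copy of the dataset.
# data = pd.read_csv("path/to/credit_card_data.csv")

# Self-contained demo: a tiny in-memory CSV with a few of the real column
# names and toy values. The empty CREDIT_LIMIT cell becomes NaN.
sample_csv = io.StringIO(
    "CUST_ID,BALANCE,PURCHASES,CREDIT_LIMIT,TENURE\n"
    "C1,40.9,95.4,1000.0,12\n"
    "C2,3202.5,0.0,7000.0,12\n"
    "C3,2495.1,773.2,,12\n"
)
data = pd.read_csv(sample_csv)
print(data.head())  # same kind of preview as the table above
print(data.shape)   # (number of rows, number of columns)
```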

The dataset that I am using here is based on the credit card usage of about 9000 active credit cardholders. By the end of this task, you will learn how to segment the customers based on their purchases and transactions. Before moving forward, let’s drop the CUST_ID column, which only identifies each customer, and check whether there are any missing values in the dataset:

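# CUST_ID is only an identifier, so it is not useful as a clustering feature
data = data.drop(columns=["CUST_ID"])
# Count the missing values in each column
print(data.isnull().sum())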
BALANCE                               0
BALANCE_FREQUENCY                     0
PURCHASES                             0
ONEOFF_PURCHASES                      0
INSTALLMENTS_PURCHASES                0
CASH_ADVANCE                          0
PURCHASES_FREQUENCY                   0
ONEOFF_PURCHASES_FREQUENCY            0
PURCHASES_INSTALLMENTS_FREQUENCY      0
CASH_ADVANCE_FREQUENCY                0
CASH_ADVANCE_TRX                      0
PURCHASES_TRX                         0
CREDIT_LIMIT                          1
PAYMENTS                              0
MINIMUM_PAYMENTS                    313
PRC_FULL_PAYMENT                      0
TENURE                                0
dtype: int64

As you can see, two columns have missing values: CREDIT_LIMIT (1) and MINIMUM_PAYMENTS (313). I will fill them with the mean value of each column:

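# Replace missing values with the column mean (mean() skips NaN by default).
# Assigning the result back avoids chained in-place fillna, which stops
# updating `data` under pandas Copy-on-Write.
data["MINIMUM_PAYMENTS"] = data["MINIMUM_PAYMENTS"].fillna(data["MINIMUM_PAYMENTS"].mean())
data["CREDIT_LIMIT"] = data["CREDIT_LIMIT"].fillna(data["CREDIT_LIMIT"].mean())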

KMeans Clustering to Segment the Customers

Now let’s scale the features of the data and then use the KMeans clustering algorithm, one of the most widely used clustering algorithms in machine learning:

       BALANCE  BALANCE_FREQUENCY  PURCHASES  ...  PRC_FULL_PAYMENT  TENURE  Cluster
0    40.900749           0.818182      95.40  ...          0.000000      12        3
1  3202.467416           0.909091       0.00  ...          0.222222      12        0
2  2495.148862           1.000000     773.17  ...          0.000000      12        1
3  1666.670542           0.636364    1499.00  ...          0.000000      12        3
4   817.714335           1.000000      16.00  ...          0.000000      12        3

[5 rows x 18 columns]
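The scaling and clustering code is not shown in the article either. A minimal sketch, assuming scikit-learn's StandardScaler for the scaling step and KMeans with n_clusters=5 (the number of clusters, n_init and random_state are assumptions; the labels visible in the preview, 0, 1 and 3, only imply at least four clusters), could attach the labels to the unscaled data as the preview does. With 17 feature columns plus "Cluster", this gives the 18 columns shown above. The helper name add_kmeans_clusters is hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def add_kmeans_clusters(data, n_clusters=5, random_state=42):
    """Hypothetical helper: return a copy of `data` with a 'Cluster' column.

    Assumes every column is numeric with no missing values (CUST_ID dropped
    and gaps filled beforehand). n_clusters=5 is an assumed value.
    """
    # Standardize each feature to mean 0 and standard deviation 1 so that
    # large-valued columns such as BALANCE do not dominate the distances.
    scaled = StandardScaler().fit_transform(data)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(scaled)

    # Attach the labels to the original, unscaled values, as in the preview.
    result = data.copy()
    result["Cluster"] = labels
    return result


# Toy demo on random numeric data with 17 feature columns (not the real data).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 17)), columns=[f"f{i}" for i in range(17)])
clustered = add_kmeans_clusters(toy)
print(clustered["Cluster"].value_counts())
```

Scaling matters here because KMeans relies on Euclidean distance: without it, columns measured in currency units would outweigh frequency columns that range from 0 to 1.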

So we have created the clusters and added them to the dataset as a new column named “Cluster”. Now let’s have a look at the customer segments formed by the KMeans algorithm:

Figure: Customer Segmentation with KMeans
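The plot of the segments is not reproduced in this text, and the article does not show how it was drawn. One common way to inspect segments, sketched below as an assumption rather than the author's method, is to compare the size and average feature values of each cluster and to project the standardized features onto two principal components (PCA) for a 2-D scatter plot colored by cluster. The helper names profile_segments and pca_coordinates are hypothetical:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def profile_segments(clustered):
    """Hypothetical helper: size and mean of every feature for each cluster."""
    profile = clustered.groupby("Cluster").mean()
    # value_counts() is indexed by cluster label, so it aligns with the groupby index.
    profile.insert(0, "size", clustered["Cluster"].value_counts().sort_index())
    return profile


def pca_coordinates(clustered):
    """Hypothetical helper: 2-D PCA projection of the standardized features."""
    features = clustered.drop(columns=["Cluster"])
    scaled = StandardScaler().fit_transform(features)
    coords = PCA(n_components=2).fit_transform(scaled)
    return pd.DataFrame(
        {"PC1": coords[:, 0], "PC2": coords[:, 1],
         "Cluster": clustered["Cluster"].to_numpy()},
        index=clustered.index,
    )


# Toy demo: two obvious groups with hand-assigned labels (not the real data).
toy = pd.DataFrame({
    "BALANCE": [10.0, 12.0, 11.0, 500.0, 520.0, 510.0],
    "PURCHASES": [100.0, 90.0, 110.0, 5.0, 0.0, 10.0],
    "Cluster": [0, 0, 0, 1, 1, 1],
})
print(profile_segments(toy))
print(pca_coordinates(toy))
```

With matplotlib, `plt.scatter(coords["PC1"], coords["PC2"], c=coords["Cluster"])` on the returned frame would draw the projection; the per-cluster means are usually what turn a cluster number into a describable segment.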

So this is how we can segment customers based on their credit card transactions using the KMeans algorithm. I hope you liked this article on customer segmentation with machine learning using Python. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.

