In machine learning, customer segmentation is framed as a clustering problem: finding groups of records in a dataset that share similar features. Customer segmentation can help a business focus its marketing strategies to increase profits and overall customer satisfaction. In this article, I will walk you through the task of customer segmentation with machine learning using Python.
What is Customer Segmentation?
Customer segmentation is also known as market segmentation. It means dividing customers into groups based on similar characteristics. For example, think of a pet store: each customer has a unique preference when choosing a pet. The type of pet a customer buys depends heavily on the type of animal they love, their lifestyle and income, and many other factors.
So, according to the example above, the customers of a pet store can be grouped according to the type of animal they prefer and the income they earn. Customer segmentation, therefore, means matching products and offerings with the most suitable customers. I hope you now know what it means to segment customers. In the section below, I will walk you through the task of customer segmentation with machine learning using Python.
Customer Segmentation using Python
Customer segmentation is about identifying the most profitable customers and tailoring products and offerings to meet their needs. Now let’s see how to do the customer segmentation task with machine learning using Python. After importing the necessary Python libraries and loading the dataset, the first five rows look like this:
  CUST_ID      BALANCE  ...  PRC_FULL_PAYMENT  TENURE
0  C10001    40.900749  ...          0.000000      12
1  C10002  3202.467416  ...          0.222222      12
2  C10003  2495.148862  ...          0.000000      12
3  C10004  1666.670542  ...          0.000000      12
4  C10005   817.714335  ...          0.000000      12
[5 rows x 18 columns]
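The loading code itself is not shown, so here is a minimal sketch of how a preview like the one above could be produced. It assumes the dataset is stored as a CSV file; the helper name `load_customers` and the file name in the usage comment are hypothetical, not taken from the original.

```python
import pandas as pd


def load_customers(csv_path):
    """Read the credit card dataset from a CSV file into a DataFrame."""
    return pd.read_csv(csv_path)


# Example usage (hypothetical file name; point this at your own copy of the data):
# data = load_customers("credit_card_data.csv")
# print(data.head())
```

Wrapping the call in a small function keeps the file path in one place, which makes it easier to swap in a different copy of the data later.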
The dataset that I am using here is based on the credit card usage of about 9000 active credit card holders. At the end of this task, you will learn how we can segment the customers based on their purchases and transactions. Before moving forward, let’s drop the CUST_ID column, which is only an identifier, and check whether there are any missing values in the dataset:
data = data.drop(["CUST_ID"], axis=1)
print(data.isnull().sum())
BALANCE                               0
BALANCE_FREQUENCY                     0
PURCHASES                             0
ONEOFF_PURCHASES                      0
INSTALLMENTS_PURCHASES                0
CASH_ADVANCE                          0
PURCHASES_FREQUENCY                   0
ONEOFF_PURCHASES_FREQUENCY            0
PURCHASES_INSTALLMENTS_FREQUENCY      0
CASH_ADVANCE_FREQUENCY                0
CASH_ADVANCE_TRX                      0
PURCHASES_TRX                         0
CREDIT_LIMIT                          1
PAYMENTS                              0
MINIMUM_PAYMENTS                    313
PRC_FULL_PAYMENT                      0
TENURE                                0
dtype: int64
As you can see, there are missing values in two columns, CREDIT_LIMIT and MINIMUM_PAYMENTS. I will fill them with the mean of each column:
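# Assign the result back: inplace=True on a column selection triggers a warning in recent pandas and may not update the DataFrame
data["MINIMUM_PAYMENTS"] = data["MINIMUM_PAYMENTS"].fillna(data["MINIMUM_PAYMENTS"].mean())
data["CREDIT_LIMIT"] = data["CREDIT_LIMIT"].fillna(data["CREDIT_LIMIT"].mean())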
KMeans Clustering to Segment the Customers
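Now let’s scale the features of the data and then use the KMeans clustering algorithm, which is one of the most widely used clustering algorithms in machine learning. After fitting the model and adding the cluster labels to the dataset, the first five rows look like this: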
       BALANCE  BALANCE_FREQUENCY  PURCHASES  ...  PRC_FULL_PAYMENT  TENURE  Cluster
0    40.900749           0.818182      95.40  ...          0.000000      12        3
1  3202.467416           0.909091       0.00  ...          0.222222      12        0
2  2495.148862           1.000000     773.17  ...          0.000000      12        1
3  1666.670542           0.636364    1499.00  ...          0.000000      12        3
4   817.714335           1.000000      16.00  ...          0.000000      12        3
[5 rows x 18 columns]
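The scaling and clustering code is not shown, so the sketch below is one plausible way to arrive at a table like the one above, not necessarily the exact steps used here. Several things are assumptions: the choice of `StandardScaler` (another scaler such as `MinMaxScaler` would also fit the description), the helper name `add_cluster_column`, and the number of clusters. The output shows labels 0 through 3, so there are at least four clusters; the default of 4 below is simply the smallest value consistent with that.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def add_cluster_column(data, n_clusters=4, random_state=0):
    """Scale the numeric features, fit KMeans, and return a copy with a "Cluster" column.

    Assumes missing values have already been filled, since KMeans rejects NaNs.
    """
    features = data.select_dtypes(include="number")
    # Put every feature on a comparable scale so large-valued columns
    # such as BALANCE do not dominate the distance computation.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    result = data.copy()
    result["Cluster"] = model.fit_predict(scaled)
    return result


# Example usage, assuming `data` is the imputed DataFrame from the previous step:
# data = add_cluster_column(data)
# print(data.head())
```

Cluster numbers are arbitrary labels, so a different random seed or scaler can assign different numbers to the same group of customers.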
So we have created the clusters and added them to the dataset as a new column named “Cluster”. Now let’s have a look at the customer segments formed by the KMeans algorithm:
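One simple way to inspect the segments numerically is a per-cluster summary of customer counts and average feature values. This is a sketch rather than the visualization used in this article; the helper name `summarize_segments` is hypothetical, and the toy values are made up only to show the shape of the result.

```python
import pandas as pd


def summarize_segments(data, cluster_col="Cluster"):
    """Return per-cluster customer counts and mean feature values."""
    grouped = data.groupby(cluster_col)
    summary = grouped.mean(numeric_only=True)
    # Put the segment size first so small or dominant clusters stand out.
    summary.insert(0, "n_customers", grouped.size())
    return summary


# Toy example with made-up values, only to illustrate the output layout
toy = pd.DataFrame({
    "BALANCE": [100.0, 300.0, 50.0],
    "TENURE": [12, 12, 10],
    "Cluster": [0, 0, 1],
})
print(summarize_segments(toy))
```

On the real dataset, comparing rows of this summary (for example, average BALANCE against average PURCHASES) gives a quick sense of what distinguishes each segment.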

So this is how we can segment the customers based on their credit card transactions by using the KMeans algorithm. I hope you liked this article on the task of customer segmentation with machine learning using Python. Feel free to ask your questions in the comments section below.