Customer Segmentation

If you want to find out who are your best customers, using an old technique RFM matrix principle is still the best in the business. RFM means – Recency, Frequency and Monetary. RFM is basically a customer segmentation technique that works by using the past behaviour of your customers based on their purchases to split the customers into groups.

Let’s See How RFM Works:

As RFM is the technique behind customer segmentation, let’s see how it manages to do that. This technique is based on three main categories, they are:

  • Recency(R) – Days since last purchase
  • Frequency(F) – Total number of purchases
  • Monetary Value(M) – How much total money the customer spent.

Customer Segmentation with Python

Now, let’s proceed with the target of this article, which is to create a customer segmentation system with python. Now as I will use the RFM technique here, so the first thing we need to proceed is data because this technique is all dependent on data of customers expenditure on our products. The dataset I will use in this article can be downloaded below.

Now, let’s import the dataset and get started:

import pandas as pd import warnings warnings.filterwarnings('ignore') df = pd.read_excel("Online_Retail.xlsx") df.head() df1 = df
Image for post

The data contains all the transactions of a UK-based online retailer. All the transactions are between 01 Dec, 2010 and 09 Dec, 2011.

Data Exploration

Now let’s explore the data by looking at –

  1. Missing values in important columns;
  2. Customers’ distribution in each country;
  3. Unit price and Quantity should > 0;
  4. Invoice date should < today.
df1.Country.nunique()
array([‘United Kingdom’, ‘France’, ‘Australia’, ‘Netherlands’, ‘Germany’,
‘Norway’, ‘EIRE’, ‘Switzerland’, ‘Spain’, ‘Poland’, ‘Portugal’,
‘Italy’, ‘Belgium’, ‘Lithuania’, ‘Japan’, ‘Iceland’,
‘Channel Islands’, ‘Denmark’, ‘Cyprus’, ‘Sweden’, ‘Austria’,
‘Israel’, ‘Finland’, ‘Bahrain’, ‘Greece’, ‘Hong Kong’, ‘Singapore’,
‘Lebanon’, ‘United Arab Emirates’, ‘Saudi Arabia’, ‘Czech Republic’,
‘Canada’, ‘Unspecified’, ‘Brazil’, ‘USA’, ‘European Community’,
‘Malta’, ‘RSA’], dtype=object
customer_country=df1[['Country','CustomerID']].drop_duplicates() customer_country.groupby(['Country'])['CustomerID'].aggregate('count').reset_index().sort_values('CustomerID', ascending=False)
Image for post

The above data states that more than 90 percent of the customers are from United Kingdom only. Some researchers states that customer segmentation varies with geography. So I will limit the data to United Kingdom only.

df1 = df1.loc[df1['Country'] == 'United Kingdom']

Now let’s check whether, there are null values in the dataset:

df1.isnull().sum(axix=0)
output

The above output stated that there are 133,600 missing values in the customerID column, and as we are working on customer segmentation, so we do not need to focus on customer ID, so I will simply remove these missing values in this column.

df1 = df1[pd.notnull(df1['CustomerID'])]

Now let’s remove the negative values in Quantity column:

df1 = df1[(df1['Quantity']>0)]

Now let’s add a new column for total price:

df1['TotalPrice'] = df1['Quantity'] * df1['UnitPrice']

To find out the first and last order dates in the data:

df1['InvoiceDate'].min()
Timestamp(‘2010–12–01 08:26:00’)
df1['InvoiceDate'].max()
Timestamp(‘2011–12–09 12:49:00’)

As recency is calculated for a point in time, and the last invoice date is 2011–12–09, so I will use 2011–12–10 to calculate recency:

import datetime as dt NOW = dt.datetime(2011,12,10) df1['InvoiceDate'] = pd.to_datetime(df1['InvoiceDate'])

RFM Customer Segmentation

Now our data is completely ready for customer segmentation, lets move further with this by creating a RFM table:

rfmTable = df1.groupby('CustomerID').agg({'InvoiceDate': lambda x: (NOW - x.max()).days, 'InvoiceNo': lambda x: len(x), 'TotalPrice': lambda x: x.sum()}) rfmTable['InvoiceDate'] = rfmTable['InvoiceDate'].astype(int) rfmTable.rename(columns={'InvoiceDate': 'recency', 'InvoiceNo': 'frequency', 'TotalPrice': 'monetary_value'}, inplace=True)

To calculate RFM metrics for each customer:

rfmTable.head()
RFM customer segmentation

Observations:

  • CustomerID 12346 has frequency: 1, monetary value: $77,183.60 and recency: 325 days.
  • CustomerID 12747 has frequency: 103, monetary value: $4,196.01 and recency: 2 days

Split the metrics

The easiest way to split metrics into segments is by using quartiles.

  1. This gives us a starting point for the detailed analysis.
  2. 4 segments are easy to understand and explain.
quantiles = rfmTable.quantile(q=[0.25,0.5,0.75]) quantiles = quantiles.to_dict()

Create a Customer Segment RFM table

segmented_rfm = rfmTable

The lowest recency, highest frequency and best monetary amounts will convey are our best customers:

def RScore(x,p,d): if x <= d[p][0.25]: return 1 elif x <= d[p][0.50]: return 2 elif x <= d[p][0.75]: return 3 else: return 4 def FMScore(x,p,d): if x <= d[p][0.25]: return 4 elif x <= d[p][0.50]: return 3 elif x <= d[p][0.75]: return 2 else: return 1

Add segment numbers

segmented_rfm['r_quartile'] = segmented_rfm['recency'].apply(RScore, args=('recency',quantiles,)) segmented_rfm['f_quartile'] = segmented_rfm['frequency'].apply(FMScore, args=('frequency',quantiles,)) segmented_rfm['m_quartile'] = segmented_rfm['monetary_value'].apply(FMScore, args=('monetary_value',quantiles,)) segmented_rfm.head()
Image for post

RFM segments split the customer base into an imaginary 3D cube which is hard to visualize. However, we can sort it out:

segmented_rfm['RFMScore'] = segmented_rfm.r_quartile.map(str) + segmented_rfm.f_quartile.map(str) + segmented_rfm.m_quartile.map(str) segmented_rfm.head()
customer segmentation

Let’s See The Top 10 of best customers

segmented_rfm[segmented_rfm['RFMScore']=='111'].sort_values('monetary_value', ascending=False).head(10)
customer segmentation

Also, read – Master your Text Analytics Skills.

I hope you liked this article on Customer Segmentation with Python. Feel free to ask question on this topic or any topic you like, and don’t forget to subscribe for the newsletters below, to get email notification if you like my work.

Receive Daily Newsletters

Leave a Reply