# Customer Churn Prediction with Python

Machine Learning Project on Customer Churn Prediction with Python.

One of the most straightforward and effective ways to retain current customers is to anticipate and respond to the risk of churn over time. In this article, I will introduce you to a machine learning project on customer churn prediction with the Python programming language.

## Introduction to Customer Churn Prediction

Recognizing the signs of potential churn, meeting customer needs, and restoring loyalty are actions that help an organization minimize the cost of winning new customers.

Customer churn prediction systems are used to understand customer behaviour and act as an early warning of the risk and timing of customer attrition. The accuracy of the approach used is essential to the success of any proactive retention effort.

After all, if the decision-maker cannot anticipate the client’s intention to leave their company, no appropriate decision can be made regarding that client.

## Machine Learning Project on Customer Churn Prediction with Python

In this section, I will take you through a machine learning project on customer churn prediction by using the Python programming language. For this task, I will be using the data of credit card customers. Let’s start by importing all the necessary Python libraries and the dataset:
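The loading step is not shown here, so below is a minimal sketch of what it might look like. The file name `BankChurners.csv` and the helper `load_customer_data` are assumptions for illustration; the article only says that the data describes credit card customers. The plotting calls later in the article use the alias `ex`, which I assume stands for `plotly.express`.

```python
import pandas as pd

# Assumed file name; replace it with the path to your own copy of the data.
DATA_PATH = "BankChurners.csv"


def load_customer_data(path: str = DATA_PATH) -> pd.DataFrame:
    """Read the customer CSV file into a DataFrame."""
    return pd.read_csv(path)
```

With the file in place, `c_data = load_customer_data()` gives the `c_data` frame that the rest of the article refers to.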

### Exploratory Data Analysis

Let’s start with some exploratory data analysis (EDA), which is an important step in any business analytics task. The first step is to look at the distribution of customer ages:

We can see that customer age in our data set is approximately normally distributed, so the age feature can be used later under a normality assumption.

Now let’s have a look at the proportion of genders of the customers:

`ex.pie(c_data,names='Gender',title='Proportion Of Customer Genders')`

Our data set contains more women than men, but the difference is small, so we can say that the genders are roughly evenly distributed.

Now let’s have a look at the distribution of dependent counts in the data:

The dependent count is fairly normally distributed, with a slight right skew.
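A statement like this can also be checked numerically instead of by eye. The sketch below compares the mean, median, and sample skewness of a numeric column; the helper name `describe_shape` is hypothetical, and the column you pass in (for example a dependent count column) depends on how your data is named.

```python
import pandas as pd


def describe_shape(values: pd.Series) -> dict:
    """Summarise how symmetric a numeric column is."""
    return {
        "mean": float(values.mean()),
        "median": float(values.median()),
        # Positive skew means a longer tail on the right, negative on the left.
        "skew": float(values.skew()),
    }
```

A skew close to zero with a mean slightly above the median would be consistent with a roughly normal shape that leans a little to the right.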

Now let’s analyze the proportion of education levels:

`ex.pie(c_data,names='Education_Level',title='Proportion Of Education Levels')`

If we assume that most of the customers with an unknown education level have no formal education, we can say that over 70% of customers have a formal education, of which around 35% have a higher education level.

Now let’s have a look at the marital status of the customers:

`ex.pie(c_data,names='Marital_Status',title='Proportion Of Different Marriage Statuses')`

Almost half of the bank’s customers are married and, interestingly, almost all of the other half are single customers – only about 7% of customers are divorced, which is surprising given the global statistics on the divorce rate.

Now let’s have a look at the distribution of the total transaction amount in the last 12 months:

The total transaction amount (last 12 months) has a multimodal distribution, which suggests that there are underlying groups in our data. It could be interesting to cluster these groups, look at the similarities between them, and see what best describes the groups that create the different modes of the distribution.

Now let’s have a look at the proportion of churned versus non-churned customers in the dataset:

`ex.pie(c_data,names='Attrition_Flag',title='Proportion of churn vs not churn customers')`

Only 16% of the data samples represent churned customers. In the following steps, I will use SMOTE to oversample the churn samples until they match the number of regular customers, which gives the models selected later a better chance of picking up small details that would almost certainly be missed with such a class imbalance.

### Data Processing for Customer Churn Prediction

I will start data processing for the task of customer churn prediction with Python by using one-hot encoding on all the categorical features describing different statuses of a customer:
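A minimal sketch of the encoding step, using `pandas.get_dummies`. The three columns listed are the categorical ones that appear in the plots above; the real data set may contain more, and the helper `one_hot_encode` is a hypothetical name.

```python
import pandas as pd

# Categorical columns mentioned in this article; extend the list as needed.
CATEGORICAL_COLUMNS = ["Gender", "Education_Level", "Marital_Status"]


def one_hot_encode(data: pd.DataFrame, columns=CATEGORICAL_COLUMNS) -> pd.DataFrame:
    """Replace each categorical column with one 0/1 column per category."""
    # Skip any listed column that is missing from the frame.
    present = [name for name in columns if name in data.columns]
    return pd.get_dummies(data, columns=present, dtype=int)
```

Each encoded column is named after the original column and the category, for example `Gender_F` and `Gender_M`.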

Now let’s upsample the data by using the SMOTE method:

Now, I’m going to use Principal Component Analysis (PCA) to reduce the dimensionality of the one-hot encoded categorical variables. This loses some of the variance, but using a few principal components instead of dozens of one-hot encoded features will help to build a better machine learning model for customer churn prediction with Python:

`usampled_df_with_pcs = pd.concat([usampled_df,pd.DataFrame(pc_matrix,columns=['PC-{}'.format(i) for i in range(0,N_COMPONENTS)])],axis=1)`
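The line above expects `pc_matrix` and `N_COMPONENTS` to exist already. A minimal sketch of how they might be produced with scikit-learn is shown below; the value `4` and the helper name `encoded_principal_components` are assumptions, since the article does not state them.

```python
import pandas as pd
from sklearn.decomposition import PCA

N_COMPONENTS = 4  # assumed value; the article does not state it


def encoded_principal_components(
    encoded: pd.DataFrame, n_components: int = N_COMPONENTS
) -> pd.DataFrame:
    """Project the one-hot encoded columns onto their first principal components."""
    pca = PCA(n_components=n_components)
    pc_matrix = pca.fit_transform(encoded)
    # Same column naming as in the concat line: PC-0, PC-1, ...
    columns = ["PC-{}".format(i) for i in range(n_components)]
    return pd.DataFrame(pc_matrix, columns=columns, index=encoded.index)
```

Keeping the index of the input frame means the returned components line up row by row when they are concatenated with the other features.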

### Training a Machine Learning Model for Customer Churn Prediction with Python

Now let’s move to the last step in the task of customer churn prediction with Python, which is to train a machine learning model. I will first split the data into training and test sets and then I will create a pipeline for using the Random Forest Classifier:

`F1 Score of Random Forest Model On Test Set - 0.9105188005711566`
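The score above comes from the original run. A minimal sketch of how the split and the pipeline might be set up with scikit-learn follows, assuming a numeric feature matrix and a 0/1 target where 1 marks churn. The scaling step, the 25% test size, the seed, and the helper name `train_random_forest` are all assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def train_random_forest(features, target, seed=42):
    """Fit a scaler + Random Forest pipeline and return it with its test-set F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.25, random_state=seed, stratify=target
    )
    model = Pipeline([
        ("scale", StandardScaler()),
        ("rf", RandomForestClassifier(random_state=seed)),
    ])
    model.fit(X_train, y_train)
    # F1 is computed for the positive class (label 1, assumed to mean churn).
    return model, f1_score(y_test, model.predict(X_test))
```

The exact score will depend on the features, the split, and the random seed, so it should not be expected to match the number reported above.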

Now let’s evaluate the performance of the customer churn prediction model on the original data:
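A minimal sketch of this evaluation, assuming a fitted model and the original (not oversampled) features prepared with the same columns the model was trained on. The helper name `evaluate_on_original` is hypothetical, and the target is again assumed to be 0/1 with 1 marking churn.

```python
from sklearn.metrics import confusion_matrix, f1_score


def evaluate_on_original(model, original_features, original_target):
    """Score a fitted model on the original, non-oversampled rows."""
    predictions = model.predict(original_features)
    return {
        "f1": f1_score(original_target, predictions),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(original_target, predictions),
    }
```

If the model was trained on data derived from these same rows, the result is better read as a sanity check on the imbalanced data than as an independent test.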