In this project, we build a model that predicts customer churn using machine learning, implemented in Python.
Customer churn prediction means identifying the customers who are likely to leave us in the future.
Let’s start by importing the required libraries
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
Download the data set
Let’s read and look at the data
df = pd.read_csv("churn.csv")
df

To show the number of rows and columns
df.shape
#Output
(7043, 21)
To see all column names
df.columns.values
#Output
array(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype=object)
To check for NA or missing values
df.isna().sum()
#Output
customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64
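One caveat worth checking: `isna()` only counts real missing values, so blank strings in a text column go unnoticed. The dtype listing further down shows TotalCharges being label-encoded, which suggests it was read as text rather than as numbers. A minimal sketch on made-up rows (the values are invented for illustration, not taken from churn.csv) of how `pd.to_numeric` can surface such hidden gaps:

```python
import pandas as pd

# Toy rows, invented for illustration; one TotalCharges entry is a blank string
toy = pd.DataFrame({
    "tenure": [1, 0, 34],
    "TotalCharges": ["29.85", " ", "1889.5"],
})

# isna() reports nothing missing because " " is a valid string
hidden = toy["TotalCharges"].isna().sum()
print(hidden)

# errors="coerce" turns anything that cannot be parsed into NaN
total = pd.to_numeric(toy["TotalCharges"], errors="coerce")
print(total.isna().sum())
print(total.dtype)
```

If the real column behaves the same way, the coerced rows could then be dropped or filled before modeling; which option fits is a judgment call that depends on the data.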
To show some statistics
df.describe()

To get Customer Churn count
df['Churn'].value_counts()
#Output
No     5174
Yes    1869
Name: Churn, dtype: int64
Visualize the count of customer churn
sns.countplot(x='Churn', data=df)

To see the percentage of customers that are leaving
numRetained = df[df.Churn == 'No'].shape[0]
numChurned = df[df.Churn == 'Yes'].shape[0]
# print the percentage of customers that stayed
print(numRetained/(numRetained + numChurned) * 100, '% of customers stayed with the company')
# print the percentage of customers that left
print(numChurned/(numRetained + numChurned) * 100, '% of customers left the company')
#Output
73.4630129206304 % of customers stayed with the company
26.536987079369588 % of customers left the company
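The same percentages can be obtained in one step with `value_counts(normalize=True)`. The sketch below rebuilds a stand-in Churn column from the counts reported earlier (5174 "No", 1869 "Yes") so that it runs without the CSV file:

```python
import pandas as pd

# Stand-in column rebuilt from the reported counts, not read from churn.csv
churn = pd.Series(["No"] * 5174 + ["Yes"] * 1869, name="Churn")

# normalize=True returns fractions of the total instead of raw counts
share = churn.value_counts(normalize=True) * 100
print(share.round(2))
```

On the real data, `df['Churn'].value_counts(normalize=True) * 100` should give the same two figures.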
Visualize the churn count for both males and females
sns.countplot(x ='gender', hue='Churn', data=df)

Visualize the churn count for the internet service
sns.countplot(x='InternetService', hue='Churn', data=df)

To visualize numeric data
numericFeatures = ['tenure', 'MonthlyCharges']
fig, ax = plt.subplots(1, 2, figsize=(28, 8))
df[df.Churn == "No"][numericFeatures].hist(bins=20, color='blue', alpha=0.5, ax=ax)
df[df.Churn == "Yes"][numericFeatures].hist(bins=20, color='orange', alpha=0.5, ax=ax)

To remove unnecessary columns
cleanDF = df.drop('customerID', axis=1)
Convert all the non-numeric columns to numeric
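# Convert all the non-numeric columns to numeric
for column in cleanDF.columns:
    # Skip columns that are already numeric (comparing a dtype to np.number is unreliable on recent NumPy)
    if pd.api.types.is_numeric_dtype(cleanDF[column]):
        continue
    cleanDF[column] = LabelEncoder().fit_transform(cleanDF[column])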
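To make the encoding step concrete, here is a small sketch of what `LabelEncoder` does to a single text column. The strings are modeled on a contract-type column and should be read as illustrative values:

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative values; the exact strings in churn.csv may differ
contracts = ["Month-to-month", "Two year", "One year", "Month-to-month"]

encoder = LabelEncoder()
codes = encoder.fit_transform(contracts)

# The distinct values are stored in sorted order
print(list(encoder.classes_))
# Each value is replaced by its index in classes_
print(list(codes))
```

The integer codes imply an ordering between categories. For unordered categories fed to a linear model, one-hot encoding (for example with `pd.get_dummies`) is a common alternative worth comparing.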
To show the data types
cleanDF.dtypes
#Output
gender                int64
SeniorCitizen         int64
Partner               int64
Dependents            int64
tenure                int64
PhoneService          int64
MultipleLines         int64
InternetService       int64
OnlineSecurity        int64
OnlineBackup          int64
DeviceProtection      int64
TechSupport           int64
StreamingTV           int64
StreamingMovies       int64
Contract              int64
PaperlessBilling      int64
PaymentMethod         int64
MonthlyCharges      float64
TotalCharges          int64
Churn                 int64
dtype: object
To show the first 5 rows of the new data
cleanDF.head()

Scale the data
# Scale the data
x = cleanDF.drop('Churn', axis=1)
y = cleanDF['Churn']
x = StandardScaler().fit_transform(x)
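As a quick illustration of what the scaling step computes, the sketch below applies `StandardScaler` to one invented feature column and checks it against the formula by hand: subtract the column mean, then divide by the column standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented monthly charges, a single feature column
charges = np.array([[20.0], [50.0], [80.0]])

scaled = StandardScaler().fit_transform(charges)

# Same result by hand: subtract the mean, divide by the population std (ddof=0)
manual = (charges - charges.mean(axis=0)) / charges.std(axis=0)

print(scaled.ravel())
print(np.allclose(scaled, manual))
```

After scaling, each column has mean 0 and standard deviation 1, which keeps features with large ranges from dominating the logistic regression coefficients.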
Split the data into 80% training and 20% testing
xtrain, xtest, ytrain, ytest = train_test_split(x,y, test_size=0.2, random_state=42)
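Because roughly a quarter of the customers churned, it can help to keep that ratio the same in both splits. A minimal sketch on synthetic labels (not the churn data) using the `stratify` argument of `train_test_split`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 75 non-churners (0) and 25 churners (1)
y = np.array([0] * 75 + [1] * 25)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the class proportions equal in the train and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(y_train.mean(), y_test.mean())
```

In the walkthrough this would correspond to adding `stratify=y` to the existing call; whether it changes the reported scores noticeably is something to verify on the real data.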
Create and Train the model
model = LogisticRegression()
# Train the model
model.fit(xtrain, ytrain)
#Output
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
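The steps above scale the whole data set before splitting it, so the test rows also contribute to the scaler's mean and standard deviation. One common alternative, sketched here on synthetic data from `make_classification` (a stand-in, not the churn features), is to wrap the scaler and the model in a pipeline so that scaling is learned from the training rows only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features and labels
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit() fits the scaler on the training rows only; the fitted scaler is
# then reused unchanged when the test rows are scored
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

accuracy = pipe.score(X_test, y_test)
print(accuracy)
```

With a data set of this size the difference is likely small, but the pipeline form also makes it harder to forget the scaling step when predicting on new customers.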
Create the predictions on the test data
predictions = model.predict(xtest)
# print the predictions
print(predictions)
#Output
[1 0 0 ... 0 0 0]
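Besides hard 0/1 labels, a logistic regression model can return an estimated probability per row through `predict_proba`, which is useful for ranking customers by churn risk. The sketch below uses synthetic data, and the 0.3 cutoff is an arbitrary value chosen for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data, not the churn set
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
clf = LogisticRegression().fit(X, y)

# Column 1 holds the estimated probability of the positive class (label 1)
proba = clf.predict_proba(X)[:, 1]

# predict() corresponds to a 0.5 cutoff; a lower cutoff flags more rows
default_labels = clf.predict(X)
lenient_labels = (proba >= 0.3).astype(int)

print(default_labels.sum(), lenient_labels.sum())
```

Lowering the cutoff typically trades precision for recall, which may be acceptable when missing a churner costs more than contacting a loyal customer.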
Finally, check the precision, recall, and f1-score
print(classification_report(ytest, predictions))
#Output
              precision    recall  f1-score   support

           0       0.85      0.91      0.88      1036
           1       0.69      0.56      0.62       373

    accuracy                           0.82      1409
   macro avg       0.77      0.74      0.75      1409
weighted avg       0.81      0.82      0.81      1409
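To read the report: for class 1 (churned), a recall of 0.56 means the model finds a little over half of the 373 churners in the test set, and a precision of 0.69 means about 69% of the customers it flags did churn. The f1-score is the harmonic mean of the two. A short sketch that recomputes it from the rounded values in the report (`f1_from` is a helper name introduced here for illustration):

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded precision and recall per class, copied from the report
f1_stayed = round(f1_from(0.85, 0.91), 2)   # class 0
f1_churned = round(f1_from(0.69, 0.56), 2)  # class 1

print(f1_stayed, f1_churned)
```

Both values agree with the f1-score column, and the gap between the classes shows that the 0.82 accuracy is driven mostly by the majority class.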