Employee Attrition Prediction with Python

In this article, I’ll introduce you to a machine learning project on employee attrition prediction with Python programming language. Employees are considered the backbone of an organization. The success or failure of the organization depends on the employees who work for an organization. Organizations must deal with the problems when trained, skilled and experienced employees leave the organization for better opportunities.

What is Employee Attrition Prediction?

Employee attrition is downsizing in any organization where employees resign. Employees are valuable assets of any organization. It is necessary to know whether the employees are dissatisfied or whether there are other reasons for leaving their respective jobs.

Also, Read – 100+ Machine Learning Projects Solved and Explained.

Nowadays, for better opportunities, employees are eager to move from one organization to another. But if they quit their jobs unexpectedly, it can result in a huge loss for the organization. A new hire will consume money and time, and newly hired employees will also take time to make the respective organization profitable.

Retaining skilled and hardworking employees is one of the most critical challenges many organizations face. Therefore, by improving employee satisfaction and providing a desirable working environment, we can certainly reduce this problem significantly.

Machine Learning Project on Employee Attrition Prediction with Python

In this section, I will take you through a Machine Learning project on predicting Employee Attrition prediction with Python programming language. I will start this task by importing the necessary Python libraries that we need for this task:

Now let’s read the data and do some exploratory data analysis to understand this dataset properly:

attrition = pd.read_csv('Employee-Attrition.csv')

Usually one of the first steps in data exploration is getting a rough idea of how the features are distributed among them. To do this, I’ll use the kdeplot function in the seaborn library in Python:

image for employee attrition

Finding Correlation

The next step in a data exploration is to find the correlation matrix. By plotting a correlation matrix, we get a really good look at how the features relate to each other. 

In this correlation plot, I will be using the Plotly library in Python to produce an interactive Pearson correlation matrix via the Heatmap function as follows:

image for employee attrition correlation

Observations From Above Plot:

From the correlation plot, we can see that a lot of our columns appear to be poorly correlated to each other. Generally, when building a predictive model, it would be better to train a model with features that are not too correlated with each other so that we don’t need to deal with redundant features. 

In the case where we have a large number of correlated characteristics, perhaps we could apply a technique such as principal component analysis (PCA) to reduce the characteristic space.

Feature Engineering

After exploring our dataset, let’s now move on to the task o feature engineering and numerically encoding the categorical values in our dataset. Feature engineering involves creating new features and relationships from the current features that we have.

For this task, we’ll separate the numeric columns from the categorical columns as follows:

After identifying which of our features contain categorical data, we can start to digitally encode the data. To do this, I’ll use Pandas’ get_dummies method in Python which creates dummy variables encoded from the categorical variables:

attrition_cat = attrition[categorical]
attrition_cat = attrition_cat.drop(['Attrition'], axis=1) # Dropping the target column
attrition_cat = pd.get_dummies(attrition_cat)
attrition_num = attrition[numerical]
attrition_final = pd.concat([attrition_num, attrition_cat], axis=1)

One last step we need to remember is to generate our target variable. The target, in this case, is given by the Attrition column which contains categorical variables therefore requires numeric coding. We digitally encode it by creating a dictionary with the given mapping as 1: Yes and 0: No:

target_map = {'Yes':1, 'No':0}
# Use the pandas apply method to numerically encode our attrition target variable
target = attrition["Attrition"].apply(lambda x: target_map[x])

Machine Learning for Employee Attrition Prediction with Python

Now, we need to train a Machine Learning model for predicting Employee Attrition prediction with Python. For this task, I will use the Random Forest Classification model provided by Scikit-learn.

But before implementing Machine Learning for prediction of Employee Attrition prediction we need to split the data into a training set and test set:

Now let’s train the Random forest classification model for the task of Employee Attrition prediction using Machine Learning and Python:

Accuracy score: 0.8537414965986394
              precision    recall  f1-score   support

           0       0.90      0.93      0.91       245
           1       0.57      0.49      0.53        49

   micro avg       0.85      0.85      0.85       294
   macro avg       0.74      0.71      0.72       294
weighted avg       0.85      0.85      0.85       294

As observed, our Random Forest returns around 88% accuracy for its predictions and at first glance, this may seem like a fairly good model.

Sklearn’s Random Forest classifier also contains a very handy attribute for analyzing feature importance which tells us which features in our dataset have received the most importance by the Random Forest algorithm. Let’s visualize the features taken into account by our machine learning model for employee attrition:

Employee attrition prediction

I hope you liked this article on Machine Learning project on Employee Attrition Prediction with Python programming language. Feel free to ask your valuable questions in the comments section below.

Articles: 77

Leave a Reply