Human Resource Analysis with Python

The field of human resources analysis, which can be understood as an approach to human resources management focused on data and analytical thinking, is quickly becoming an indispensable part of organizational configurations. In this article, I will introduce you to a data science project on Human Resource Analysis with Python.

What is Human Resource Analysis?

In a competitive market scenario, the potential of an employee must be better exploited to ensure the success of the organization. In such an environment, human resources remain one of the main distinguishing factors of an organization which can be used for competitive growth to create the necessary organizational value.

The optimal use of the human resource capital that an organization possesses is an ongoing process; constant efforts in this direction will ensure that the human resources of an organization remain an asset and not a liability.

Human resource management should take into account the needs of the organization as a whole; it can be understood as an area of study that explores the practices and approaches that can be applied to employees to achieve organizational goals.

Human resource analysis is a relatively new development in the broader field of HRM. It refers to the use of statistical tools, measures and procedures to inform and make the most important workforce decisions. It is often called people analysis, talent analysis or workforce analysis.

Human Resource Analysis with Python

In this section, I will take you through a data science project on Human Resource Analysis with Python, in which you will learn how to analyze the data of the employees working in an organization. I will start by importing the necessary Python libraries and the dataset:

Dataset
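The import and loading code is not shown above, so here is a minimal sketch of what that step might look like. The file name `aug_train.csv` and the helper `load_hr_data` are assumptions for illustration, not taken from the article; the tiny inline sample only stands in for the real file so that the snippet runs on its own.

```python
import io

import pandas as pd

# The plots later in the article use `px`, which is conventionally
# created with: import plotly.express as px


def load_hr_data(source):
    """Read the HR dataset from a CSV path or file-like object (hypothetical helper)."""
    return pd.read_csv(source)


# Made-up sample with a few of the columns from the dataset; with the real
# data you would call load_hr_data("aug_train.csv") instead (the file name
# is an assumption).
sample_csv = io.StringIO(
    "enrollee_id,city,city_development_index,gender,training_hours,target\n"
    "1,city_103,0.920,Male,36,1.0\n"
    "2,city_40,0.776,,47,0.0\n"
    "3,city_21,0.624,Female,83,0.0\n"
)
train = load_hr_data(sample_csv)
print(train.shape)
```

Wrapping the read in a small function keeps the path in one place, so switching from the sample to the real file is a one-line change.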

Before analyzing the data, let’s have a quick look at its structure and check whether it contains any duplicated rows:

train.info()

Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   enrollee_id             19158 non-null  int64  
 1   city                    19158 non-null  object 
 2   city_development_index  19158 non-null  float64
 3   gender                  14650 non-null  object 
 4   relevent_experience     19158 non-null  object 
 5   enrolled_university     18772 non-null  object 
 6   education_level         18698 non-null  object 
 7   major_discipline        16345 non-null  object 
 8   experience              19093 non-null  object 
 9   company_size            13220 non-null  object 
 10  company_type            13018 non-null  object 
 11  last_new_job            18735 non-null  object 
 12  training_hours          19158 non-null  int64  
 13  target                  19158 non-null  float64
dtypes: float64(2), int64(2), object(10)
memory usage: 2.0+ MB

# Flag rows that are exact copies of an earlier row, then count them
sim = train.duplicated()
sim.sum()
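Because `enrollee_id` is presumably unique for every row, `train.duplicated()` can report zero duplicates even when two rows describe identical profiles. The following is a hedged sketch of a stricter check that ignores the ID column; the frame below is made up for illustration.

```python
import pandas as pd

# Made-up rows: records 1 and 2 are identical apart from enrollee_id.
sample = pd.DataFrame(
    {
        "enrollee_id": [1, 2, 3],
        "city": ["city_103", "city_103", "city_21"],
        "training_hours": [36, 36, 83],
    }
)

# Full-row check: the unique ID makes every row look distinct.
full_duplicates = sample.duplicated().sum()

# Same check with the ID column left out.
profile_duplicates = sample.drop(columns="enrollee_id").duplicated().sum()

print(full_duplicates, profile_duplicates)
```

Whether such profile-level duplicates should be dropped depends on the analysis, since two different people can legitimately share the same attributes.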

Now let’s visualize the missing values, which also serves as a warm-up for using the Plotly library in Python. I will use Plotly here because its charts are interactive and show the underlying values on hover, unlike the static plots matplotlib produces by default:
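The plotting code for this step is not shown, so the following is a minimal sketch of one way to do it, not necessarily the approach originally used. The helper names `missing_summary` and `plot_missing` and the sample frame are hypothetical; `px.bar` is the same Plotly Express call used in the rest of the article.

```python
import pandas as pd


def missing_summary(df):
    """Return the percentage of missing values per column, largest first."""
    pct = df.isna().mean().mul(100).round(2)
    summary = pct.sort_values(ascending=False).reset_index()
    summary.columns = ["Column", "Missing (%)"]
    return summary


def plot_missing(summary):
    """Draw the summary as a Plotly bar chart."""
    # Imported here so the summary step also works where Plotly is not installed.
    import plotly.express as px

    return px.bar(
        summary,
        x="Column",
        y="Missing (%)",
        color="Missing (%)",
        title="Missing values per column",
    )


# Made-up frame standing in for `train`.
sample = pd.DataFrame(
    {
        "gender": ["Male", None, "Female", None],
        "company_size": ["50-99", None, None, None],
        "training_hours": [36, 47, 83, 52],
    }
)
summary = missing_summary(sample)
print(summary)
# plot_missing(summary).show()  # uncomment where Plotly is installed
```

With the real data, `missing_summary(train)` would give the per-column percentages that the chart displays.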

Some of the independent variables contain null values; the company type and company size columns each have more than 30% missing values. In this scenario, we can either do a mode imputation or use Multivariate Imputation by Chained Equations (MICE). We will do this later. Now let’s move further with Human Resource Analysis with Python.
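As a rough illustration of the mode imputation option mentioned above, here is a minimal sketch on made-up data. The helper `impute_mode` is hypothetical and is only one of several reasonable ways to fill categorical gaps.

```python
import pandas as pd


def impute_mode(df, columns):
    """Return a copy of df with missing values in `columns` filled by each column's mode."""
    out = df.copy()
    for col in columns:
        mode = out[col].mode(dropna=True)
        if not mode.empty:  # an all-missing column has no mode; leave it untouched
            out[col] = out[col].fillna(mode.iloc[0])
    return out


# Made-up values standing in for the two sparsest columns.
sample = pd.DataFrame(
    {
        "company_size": ["50-99", None, "50-99", "100-500"],
        "company_type": ["Pvt Ltd", "Pvt Ltd", None, "NGO"],
    }
)
filled = impute_mode(sample, ["company_size", "company_type"])
print(filled)
```

Mode imputation is simple but inflates the most common category, which matters when roughly a third of a column is missing. For MICE, scikit-learn's experimental `IterativeImputer` is one option, although as far as I know it expects numeric input, so these categorical columns would need encoding first.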

As the City Development Index is one of the most important features in the data, let’s start with it, looking first at how the records are distributed across the 50 most frequent cities:

plot_city = train['city'].value_counts()[0:50].reset_index()
plot_city.columns = ['City','Count']
px.bar(plot_city,x='City',y='Count',template='gridon',title='City',color='Count')

The City Development Index (CDI) cuts across the different clusters identified in the urban indicators framework, as it is based on five sub-indices: infrastructure, waste, health, education and city product. The plot below shows the 50 most frequent CDI values:

plot_cdi = train['city_development_index'].value_counts().reset_index()[0:50]
plot_cdi.columns = ['cdi','Count']
plot_cdi['cdi'] = plot_cdi['cdi'].astype('str')
px.bar(plot_cdi,y="Count", x="cdi",color='Count',title='City development index')

Now let’s have a look at how many employees are enrolled in a university course, either full-time or part-time, and how many are not enrolled at all:

# Count employees per enrollment status
plot_enrolled = train['enrolled_university'].value_counts().reset_index()
plot_enrolled.columns = ['enrolled_university','count']

px.pie(plot_enrolled,values='count',names='enrolled_university',template='simple_white',title='enrolled_university')
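The shares that the pie chart draws can also be computed directly, which is handy when an exact percentage is needed. This is a small sketch on made-up values, so the numbers it prints are not those of the real dataset.

```python
import pandas as pd

# Made-up values standing in for train['enrolled_university'].
enrolled = pd.Series(
    [
        "no_enrollment",
        "no_enrollment",
        "no_enrollment",
        "Full time course",
        "Part time course",
        None,
    ]
)

# normalize=True returns fractions; missing values are excluded by default.
shares = enrolled.value_counts(normalize=True).mul(100).round(1)
print(shares)
```

Passing `dropna=False` to `value_counts` would include the missing entries as their own category instead.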

The above plot shows that more than 70 per cent of employees are not enrolled in any course. Now let’s have a quick look at the distribution of education levels of the employees:

# Count employees per education level
plot_education = train['education_level'].value_counts().reset_index()
plot_education.columns = ['education_level','count']

px.pie(plot_education,values='count',names='education_level',template='ggplot2',title='education_level')

Now the next task is to have a look at the major discipline of education of all the employees:

# Count employees per major discipline
plot_major = train['major_discipline'].value_counts().reset_index()
plot_major.columns = ['major_discipline','count']

px.pie(plot_major,values='count',names='major_discipline',template='plotly',title='Major discipline')

Finally, let’s analyze the distribution of company size, which is determined by the number of employees working in the company:

# Count employees per company size band
plot_size = train['company_size'].value_counts().reset_index()
plot_size.columns = ['company_size','count']

px.pie(plot_size,values='count',names='company_size',template='plotly_white',title='Company size (number of employees)')

I hope you liked this article on a data science project on Human Resource Analysis with Python. Feel free to ask your questions in the comments section below.