Human resources analysis, a data-driven and analytical approach to human resources management, is quickly becoming an indispensable part of how organizations operate. In this article, I will introduce you to a data science project on Human Resource Analysis with Python.
What is Human Resource Analysis?
In a competitive market scenario, the potential of an employee must be better exploited to ensure the success of the organization. In such an environment, human resources remain one of the main distinguishing factors of an organization which can be used for competitive growth to create the necessary organizational value.
The optimal use of the human resource capital that an organization possesses is an ongoing process; constant efforts in this direction will ensure that the human resources of an organization remain an asset and not a liability.
Human resource management should take into account the needs of the organization as a whole; it can be understood as an area of study focused on the practices and approaches that can be applied to employees to achieve organizational goals.
Human resource analysis is a relatively new development in the broader field of human resource management (HRM). It refers to the use of statistical tools, measures and procedures to support the most important decisions. It is often called people analytics, talent analytics or workforce analytics.
Human Resource Analysis with Python
In this section, I will take you through a Data Science project on Human Resource Analysis with Python. Here you will learn how to analyze the data of the employees working in an organization. The analysis starts by importing the necessary Python libraries and loading the dataset into a DataFrame named train.
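The import and loading cell is not reproduced in this article, so the following is only a minimal sketch of what it might look like. It assumes pandas for the data and Plotly Express under the alias `px`, which the later snippets use. The helper `load_hr_data` and the rows in `SAMPLE_CSV` are hypothetical stand-ins so that the sketch runs on its own; in practice you would pass the path to your own copy of the dataset.

```python
import io

import pandas as pd

# The plotting snippets in this article also need Plotly Express:
#     import plotly.express as px

# Hypothetical sample rows (not real data) so this sketch runs without a file.
# The column names are a subset of those in the train.info() listing.
SAMPLE_CSV = """enrollee_id,city,city_development_index,gender,enrolled_university,company_size,training_hours,target
1,city_A,0.92,Male,no_enrollment,,36,1.0
2,city_B,0.776,Male,no_enrollment,50-99,47,0.0
3,city_C,0.624,,Full time course,,83,0.0
"""


def load_hr_data(source):
    """Read the HR dataset from a file path or a file-like object."""
    return pd.read_csv(source)


# Replace the StringIO object with the path to your own CSV file.
train = load_hr_data(io.StringIO(SAMPLE_CSV))
print(train.shape)
```

Empty fields in the CSV are read as NaN, which is how the missing values discussed later in the article show up.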
Before analyzing the data, let’s take a quick look at its structure and check whether it contains duplicated rows:
train.info()
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   enrollee_id             19158 non-null  int64
 1   city                    19158 non-null  object
 2   city_development_index  19158 non-null  float64
 3   gender                  14650 non-null  object
 4   relevent_experience     19158 non-null  object
 5   enrolled_university     18772 non-null  object
 6   education_level         18698 non-null  object
 7   major_discipline        16345 non-null  object
 8   experience              19093 non-null  object
 9   company_size            13220 non-null  object
 10  company_type            13018 non-null  object
 11  last_new_job            18735 non-null  object
 12  training_hours          19158 non-null  int64
 13  target                  19158 non-null  float64
dtypes: float64(2), int64(2), object(10)
memory usage: 2.0+ MB
sim = train.duplicated()
sim.sum()
0
Now let’s visualize the missing values as a warm-up for using the Plotly library in Python. I will use Plotly here because its interactive charts show more detail than static matplotlib plots.
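The plotting cell for the missing values is not reproduced in this article. As a hedged sketch of one way to do it, the snippet below computes the percentage of missing values per column with pandas and wraps a Plotly Express bar chart around that table. The function names `missing_percentages` and `plot_missing` and the toy `demo` frame are hypothetical and only serve the illustration.

```python
import pandas as pd


def missing_percentages(df):
    """Return the percentage of missing values per column, largest first."""
    pct = df.isna().mean().mul(100).round(2)
    return (
        pct.sort_values(ascending=False)
        .rename_axis("column")
        .reset_index(name="missing_pct")
    )


def plot_missing(df):
    """Bar chart of missing-value percentages (needs Plotly installed)."""
    # Imported lazily so the table above can be used without Plotly.
    import plotly.express as px

    return px.bar(missing_percentages(df), x="column", y="missing_pct",
                  color="missing_pct", title="Missing values (%)")


# Hypothetical toy frame standing in for the article's `train` DataFrame.
demo = pd.DataFrame({
    "gender": ["Male", None, "Female", None],
    "company_size": [None, None, None, "50-99"],
    "training_hours": [36, 47, 83, 52],
})
print(missing_percentages(demo))
```

With the article's data, calling `plot_missing(train).show()` would draw the chart; in a notebook, leaving `plot_missing(train)` as the last expression of a cell has the same effect.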


Some of the independent variables contain null values; company type and company size each have more than 30% missing values. In this scenario, we can either use mode imputation or Multivariate Imputation by Chained Equations (MICE). That step is left for later. Now let’s move further with Human Resource Analysis with Python.
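The article does not carry out the imputation, so here is a minimal sketch of the simpler of the two options, mode imputation, using plain pandas. The helper `impute_mode` and the toy values in `demo` are hypothetical. MICE is not shown, since the categorical columns would first need to be encoded numerically.

```python
import pandas as pd


def impute_mode(df, columns):
    """Return a copy of df with NaNs in `columns` replaced by each column's mode."""
    out = df.copy()
    for col in columns:
        modes = out[col].mode(dropna=True)
        if not modes.empty:  # an all-NaN column has no mode; leave it untouched
            out[col] = out[col].fillna(modes.iloc[0])
    return out


# Hypothetical toy frame; the real columns come from the article's `train`.
demo = pd.DataFrame({
    "company_type": ["Pvt Ltd", None, "Pvt Ltd", "Funded Startup", None],
    "company_size": ["50-99", "50-99", None, "<10", None],
})
filled = impute_mode(demo, ["company_type", "company_size"])
print(filled)
```

Mode imputation is cheap, but it inflates the most frequent category, which matters when roughly a third of a column is missing. The function returns a copy, so the original frame stays available for comparison.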
The city and its development index are among the most important features in the data, so let’s start with the 50 most frequent cities:
plot_city = train['city'].value_counts()[0:50].reset_index()
plot_city.columns = ['City', 'Count']
px.bar(plot_city, x='City', y='Count', template='gridon', title='City', color='Count')

The city development index (CDI) cuts across the different clusters identified within the framework of urban indicators, as it is based on five sub-indices: infrastructure, waste, health, education and city product.
plot_cdi = train['city_development_index'].value_counts().reset_index()[0:50]
plot_cdi.columns = ['cdi', 'Count']
plot_cdi['cdi'] = plot_cdi['cdi'].astype('str')
px.bar(plot_cdi, y='Count', x='cdi', color='Count', title='City development index')

Now let’s have a look at how many employees are enrolled in a university course, either full time or part time:
plot_enrolled = train['enrolled_university'].value_counts().reset_index()
plot_enrolled.columns = ['enrolled_university', 'count']
px.pie(plot_enrolled, values='count', names='enrolled_university', template='simple_white', title='enrolled_university')
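The shares that the pie chart displays can also be read off as numbers. The sketch below is one way to do that with `value_counts(normalize=True)`; the helper `category_shares` and the toy column are hypothetical, and the category labels are illustrative only.

```python
import pandas as pd


def category_shares(series):
    """Percentage of non-missing rows in each category, largest first."""
    return series.value_counts(normalize=True).mul(100).round(1)


# Hypothetical toy column standing in for train['enrolled_university'].
demo = pd.Series(
    ["no_enrollment", "no_enrollment", "no_enrollment",
     "Full time course", None]
)
print(category_shares(demo))
```

Missing values are excluded by default, so the percentages are relative to the rows where the column is filled in.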

The pie chart shows that more than 70 per cent of employees are not enrolled in any course. Now let’s have a quick look at the distribution of education levels of the employees:
plot_education = train['education_level'].value_counts().reset_index()
plot_education.columns = ['education_level', 'count']
px.pie(plot_education, values='count', names='education_level', template='ggplot2', title='education_level')

Next, let’s look at the major discipline of education of the employees:
plot_major = train['major_discipline'].value_counts().reset_index()
plot_major.columns = ['major_discipline', 'count']
px.pie(plot_major, values='count', names='major_discipline', template='plotly', title='Major discipline')

Finally, let’s analyze the distribution of company size, measured by the number of employees working in the company:
plot_size = train['company_size'].value_counts().reset_index()
plot_size.columns = ['company_size', 'count']
px.pie(plot_size, values='count', names='company_size', template='plotly_white', title='Company size (number of employees)')

I hope you liked this article on a data science project on Human Resource Analysis with Python. Feel free to ask your valuable questions in the comments section below.