Companies often receive thousands of resumes for each job posting and employ dedicated screening officers to shortlist qualified candidates. In this article, I will introduce you to a machine learning project on Resume Screening with the Python programming language.
What is Resume Screening?
Hiring the right talent is a challenge for all businesses. This challenge is magnified by the high volume of applicants if the business is labour-intensive, growing, and facing high attrition rates.
An example of such a business is IT services, where departments fall short of the talent that growing markets demand. In a typical service organization, professionals with a variety of technical skills and business domain expertise are hired and assigned to projects to resolve customer issues. The task of selecting the best talent among many applicants is known as Resume Screening.
Typically, large companies do not have enough time to open each CV, so they use machine learning algorithms for the Resume Screening task.
Machine Learning Project on Resume Screening with Python
In this section, I will take you through a Machine Learning project on Resume Screening with Python programming language. I will start this task by importing the necessary Python libraries and the dataset:
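The loading code is not shown here, so the following is only a minimal sketch of this step. It assumes the data is a CSV with a `Category` column (used later in the article) and a `Resume` text column (an assumed name). The file name is not given in the article, so a small in-memory CSV stands in for it.

```python
import io

import pandas as pd

# Stand-in for the real CSV file; its name is not given in the article.
# The "Resume" column name is an assumption, "Category" comes from the article.
SAMPLE_CSV = io.StringIO(
    "Category,Resume\n"
    "Data Science,Skills: Python and SQL\n"
    "HR,Recruitment and payroll experience\n"
    "Data Science,Machine learning projects\n"
)

# With the real data this would be pd.read_csv("<path to the resume CSV>").
resumeDataSet = pd.read_csv(SAMPLE_CSV)

print(resumeDataSet.shape)
print(resumeDataSet["Category"].unique())
```

Replacing `SAMPLE_CSV` with the path to the actual dataset file gives the `resumeDataSet` frame that the rest of the article works on.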

Now let’s have a quick look at the categories of resumes present in the dataset:
print("Displaying the distinct categories of resume -")
print(resumeDataSet['Category'].unique())
Displaying the distinct categories of resume -
['Data Science' 'HR' 'Advocate' 'Arts' 'Web Designing'
 'Mechanical Engineer' 'Sales' 'Health and fitness' 'Civil Engineer'
 'Java Developer' 'Business Analyst' 'SAP Developer' 'Automation Testing'
 'Electrical Engineering' 'Operations Manager' 'Python Developer'
 'DevOps Engineer' 'Network Security Engineer' 'PMO' 'Database' 'Hadoop'
 'ETL Developer' 'DotNet Developer' 'Blockchain' 'Testing']
Now let’s have a look at the distinct categories of resume and the number of records belonging to each category:
print("Displaying the distinct categories of resume and the number of records belonging to each category -")
print(resumeDataSet['Category'].value_counts())
Displaying the distinct categories of resume and the number of records belonging to each category -
Java Developer               84
Testing                      70
DevOps Engineer              55
Python Developer             48
Web Designing                45
HR                           44
Hadoop                       42
Mechanical Engineer          40
Sales                        40
ETL Developer                40
Blockchain                   40
Operations Manager           40
Data Science                 40
Arts                         36
Database                     33
Electrical Engineering       30
Health and fitness           30
PMO                          30
DotNet Developer             28
Business Analyst             28
Automation Testing           26
Network Security Engineer    25
SAP Developer                24
Civil Engineer               24
Advocate                     20
Name: Category, dtype: int64
Now let’s visualize the number of resumes in each category:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(15, 15))
plt.xticks(rotation=90)
sns.countplot(y="Category", data=resumeDataSet)
plt.show()

Now let’s visualize the distribution of categories:
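The plotting code for this step is not shown, so as a rough sketch, here is how the percentage share of each category can be computed from the counts listed earlier. These are the values a pie chart of the distribution would display. The function name `category_shares` is hypothetical; the counts are copied from the `value_counts()` output above.

```python
def category_shares(counts):
    """Return each category's share of all records, in percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts copied from the value_counts() output shown earlier in the article.
counts = {
    "Java Developer": 84, "Testing": 70, "DevOps Engineer": 55,
    "Python Developer": 48, "Web Designing": 45, "HR": 44, "Hadoop": 42,
    "Mechanical Engineer": 40, "Sales": 40, "ETL Developer": 40,
    "Blockchain": 40, "Operations Manager": 40, "Data Science": 40,
    "Arts": 36, "Database": 33, "Electrical Engineering": 30,
    "Health and fitness": 30, "PMO": 30, "DotNet Developer": 28,
    "Business Analyst": 28, "Automation Testing": 26,
    "Network Security Engineer": 25, "SAP Developer": 24,
    "Civil Engineer": 24, "Advocate": 20,
}

shares = category_shares(counts)

# Print the five largest categories with their share of the dataset.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name:<20}{pct:5.1f}%")
```

The dataset is fairly balanced: the largest category holds under 9% of the 962 resumes and the smallest about 2%.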

Now I will create a helper function to remove URLs, hashtags, mentions, special characters, and punctuation:
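The helper itself is not shown here, so the following is a minimal sketch of what such a cleaning function could look like, using only the standard `re` and `string` modules. The name `clean_resume` and the exact patterns are assumptions for illustration, not the original code.

```python
import re
import string

# Character class matching any ASCII punctuation mark.
_PUNCTUATION = re.compile("[%s]" % re.escape(string.punctuation))


def clean_resume(text: str) -> str:
    """Strip URLs, hashtags, mentions, punctuation and non-ASCII characters."""
    text = re.sub(r"http\S+", " ", text)        # URLs
    text = re.sub(r"#\S+", " ", text)           # hashtags
    text = re.sub(r"@\S+", " ", text)           # mentions
    text = _PUNCTUATION.sub(" ", text)          # punctuation
    text = re.sub(r"[^\x00-\x7f]", " ", text)   # non-ASCII characters
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


raw = "Skills: Python, SQL! See https://example.com/cv #hiring @recruiter"
print(clean_resume(raw))
```

The order matters: URLs, hashtags and mentions are removed before punctuation, because stripping `#`, `@` or `:` first would leave their text behind as ordinary words.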
Now that we have cleaned the dataset, the next task is to have a look at the word cloud. A word cloud draws the most frequent words largest and the least frequent ones smallest. The most common words and their counts are:
[('Details', 484), ('Exprience', 446), ('months', 376), ('company', 330), ('description', 310), ('1', 290), ('year', 232), ('January', 216), ('Less', 204), ('Data', 200), ('data', 192), ('Skill', 166), ('Maharashtra', 166), ('6', 164), ('Python', 156), ('Science', 154), ('I', 146), ('Education', 142), ('College', 140), ('The', 126), ('project', 126), ('like', 126), ('Project', 124), ('Learning', 116), ('India', 114), ('Machine', 112), ('University', 112), ('Web', 106), ('using', 104), ('monthsCompany', 102), ('B', 98), ('C', 98), ('SQL', 96), ('time', 92), ('learning', 90), ('Mumbai', 90), ('Pune', 90), ('Arts', 90), ('A', 84), ('application', 84), ('Engineering', 78), ('24', 76), ('various', 76), ('Software', 76), ('Responsibilities', 76), ('Nagpur', 76), ('development', 74), ('Management', 74), ('projects', 74), ('Technologies', 72)]
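A list of (word, count) pairs like the one above can be produced with `collections.Counter`. The sketch below is an assumption about the approach, not the original code: the function name `most_common_words` and the tiny stopword set are illustrative only. It counts case-sensitively, since the output above keeps 'Data' and 'data' apart.

```python
from collections import Counter

# Illustrative stopword set; a real run would use a fuller list.
STOPWORDS = {"the", "and", "of", "in", "to", "a", "for", "with"}


def most_common_words(texts, n=50, stopwords=STOPWORDS):
    """Count whitespace-separated words across texts, skipping stopwords."""
    counter = Counter()
    for text in texts:
        counter.update(w for w in text.split() if w not in stopwords)
    return counter.most_common(n)


sample = [
    "Data Science project using Python",
    "Python and SQL for data pipelines",
]
print(most_common_words(sample, n=3))
```

The same counts can drive a word cloud, where each word's font size is scaled by its frequency.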

Now I will convert the category labels into numeric values:
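The classification report further down labels the classes 0 to 24, which suggests the 25 category names were mapped to integers at this step. The code is not shown, so here is a small pure-Python sketch of that mapping. The function name `encode_labels` is hypothetical; scikit-learn's `LabelEncoder` assigns codes in the same sorted order.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, in sorted label order."""
    classes = sorted(set(labels))
    index = {name: code for code, name in enumerate(classes)}
    return [index[name] for name in labels], classes


codes, classes = encode_labels(["HR", "Data Science", "Advocate", "HR"])
print(classes)
print(codes)
```

Keeping `classes` around makes it possible to translate a predicted integer back into a category name with `classes[code]`.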
Training Machine Learning Model for Resume Screening
Now the next step in the process is to train a model for the task of Resume Screening. Here I will use a one-vs-rest classifier built on KNeighborsClassifier. For this task, I will first split the data into training and test sets:
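The split code is not shown, so this is a sketch under stated assumptions. The article does not say how the resume text is turned into numeric features; TF-IDF is a common choice and is assumed here. The 20% test size is also an assumption, though it matches the 193 test rows in the report out of 962 resumes. The toy texts and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy stand-ins for the cleaned resume texts and their encoded categories.
texts = [
    "python machine learning pandas", "deep learning statistics python",
    "recruitment payroll onboarding", "employee relations hiring",
    "java spring hibernate", "java microservices maven",
    "selenium test automation", "manual testing test cases",
    "docker kubernetes jenkins", "aws terraform ci cd pipelines",
]
labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

# Assumed feature step: turn each text into a TF-IDF weighted word vector.
vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words="english")
features = vectorizer.fit_transform(texts)

# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

Fitting the vectorizer on all rows before splitting lets test-set vocabulary influence the features; a stricter setup would fit it on the training texts only.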
Now let’s train the model and print the classification report:
Accuracy of KNeighbors Classifier on training set: 0.99
Accuracy of KNeighbors Classifier on test set: 0.99

Classification report for classifier OneVsRestClassifier(estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=None, n_neighbors=5, p=2, weights='uniform'), n_jobs=None):

              precision    recall  f1-score   support
           0       1.00      1.00      1.00         3
           1       1.00      1.00      1.00         3
           2       1.00      0.80      0.89         5
           3       1.00      1.00      1.00         9
           4       1.00      1.00      1.00         6
           5       0.83      1.00      0.91         5
           6       1.00      1.00      1.00         9
           7       1.00      1.00      1.00         7
           8       1.00      0.91      0.95        11
           9       1.00      1.00      1.00         9
          10       1.00      1.00      1.00         8
          11       0.90      1.00      0.95         9
          12       1.00      1.00      1.00         5
          13       1.00      1.00      1.00         9
          14       1.00      1.00      1.00         7
          15       1.00      1.00      1.00        19
          16       1.00      1.00      1.00         3
          17       1.00      1.00      1.00         4
          18       1.00      1.00      1.00         5
          19       1.00      1.00      1.00         6
          20       1.00      1.00      1.00        11
          21       1.00      1.00      1.00         4
          22       1.00      1.00      1.00        13
          23       1.00      1.00      1.00        15
          24       1.00      1.00      1.00         8

   micro avg       0.99      0.99      0.99       193
   macro avg       0.99      0.99      0.99       193
weighted avg       0.99      0.99      0.99       193
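The report above names the model as a `OneVsRestClassifier` wrapping `KNeighborsClassifier`. The training code is not shown, so here is a sketch of that setup on made-up toy data. The texts, the TF-IDF feature step, and `n_neighbors=3` are assumptions; the report shows the default of 5, which is too many for such a small sample.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: three resumes for each of three encoded categories.
train_texts = [
    "python machine learning pandas", "python statistics deep learning",
    "machine learning models python", "recruitment payroll onboarding",
    "hiring employee relations payroll", "recruitment hiring interviews",
    "java spring hibernate", "java microservices spring",
    "java maven hibernate",
]
train_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
test_texts = ["python pandas models", "payroll hiring", "spring java"]
test_labels = [0, 1, 2]

# Fit TF-IDF on the training texts only, then reuse it for the test texts.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# One binary k-nearest-neighbours model is trained per category.
clf = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=3))
clf.fit(X_train, train_labels)
predictions = clf.predict(X_test)

print("Accuracy on training set: {:.2f}".format(clf.score(X_train, train_labels)))
print("Accuracy on test set: {:.2f}".format(clf.score(X_test, test_labels)))
print(classification_report(test_labels, predictions))
```

On the real dataset, the integer classes in the report can be mapped back to category names through the label encoding used before training.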
So this is how we can train a Machine Learning model for the task of Resume Screening. I hope you liked this article on Resume Screening with Python programming language. Feel free to ask your valuable questions in the comments section below.
What a great article, Aman. Keep up the good work. As a new professional in the ML world, I have some questions: 1. Basically, this means our model predicts which ‘Category’ a resume falls into (e.g. Mr A’s resume is an HR resume), right? 2. What can we do if we want a model that predicts the right applicant (of all the applicants) for a given job description? As an HR professional, I would be delighted to see your demo.
Thanks for your feedback, I will deploy it and present it again soon so that you can have your answers visually.
Very nice article, Aman, and thanks for helping the community. If I want to demo it by uploading a resume and seeing which category that resume belongs to, how can I do that?
I will add a new tutorial on it
Great Tutorial! I learnt a LOT!! What is the best approach for creating a small system which filters the right candidates based on keywords? I would love to see something like this. Any recommendations, please? Thank you so much, sir.
Sure, I will soon post a project on this problem
Very nice article, Aman, and thanks for helping the community.
thanks, keep visiting