Classify Nationalities with Machine Learning

In this article, I will take you through how we can classify the nationalities of people by using their names. You may be wondering how we can classify nationalities using just names. It turns out that names carry a lot of information we can work with.

Classify Nationalities

Let’s get started with this machine learning task by importing the necessary packages. I will classify names as Indian or non-Indian:

from tensorflow import keras
import tensorflow as tf
import pandas as pd
import os
import re

Now, let’s import the datasets. The datasets I am using in this article can be easily downloaded from here. After importing the datasets, I will prepare two helper functions for data cleaning and data processing:

male_data = pd.read_csv('male.csv')
female_data = pd.read_csv('female.csv')
13754

After loading the data and removing the invalid entries, we are left with around 13,000 records.
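The helper functions themselves are not shown in this article, so here is a minimal sketch of what a cleaning step could look like. The names `clean_name` and `is_valid_name`, the rule of keeping only letters and spaces, and the `min_chars` cutoff are all assumptions made for illustration, not the exact code behind the numbers above:

```python
import re

def clean_name(raw):
    """Lowercase a name and keep only letters and single spaces (hypothetical helper)."""
    if not isinstance(raw, str):
        return ""  # missing values such as None or NaN become empty strings
    name = raw.lower()
    name = re.sub(r"[^a-z\s]", " ", name)      # replace digits, punctuation and symbols
    name = re.sub(r"\s+", " ", name).strip()   # collapse repeated whitespace
    return name

def is_valid_name(name, min_chars=2):
    """Treat empty or one-letter strings as wrong entries (assumed rule)."""
    return len(name.replace(" ", "")) >= min_chars

raw_names = ["  Lalitha  ", "Shyamala@123", None, "R.", "vishwanathan k"]
cleaned = [clean_name(n) for n in raw_names]
indian_names = [n for n in cleaned if is_valid_name(n)]
print(indian_names)  # ['lalitha', 'shyamala', 'vishwanathan k']
```

On the DataFrames loaded above, the same function could be applied with something like `male_data['name'].apply(clean_name)`, assuming the column is called `name`.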

For non-Indian names, there is a nifty package called Faker. This generates names from different regions:

from faker import Faker
fake = Faker('en_US')
fake.name()
'Brian Evans'

We generated approximately the same number of names as we have in the Indian dataset, and then removed samples longer than 5 words. The Indian dataset contains a lot of entries with just a first name, so we need to give the non-Indian data a similar overall distribution.
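The generation and filtering steps described above can be sketched roughly as follows. The function names, the `first_name_ratio` value, and the choice to build names from `first_name()` and `last_name()` are assumptions for illustration; the article does not say what share of names was cut down to a first name only:

```python
import random

def generate_names(n, locale="en_US", seed=0):
    """Generate n lowercase names with Faker (third-party package, hypothetical helper)."""
    from faker import Faker  # imported here so shape_names works without Faker installed
    Faker.seed(seed)
    fake = Faker(locale)
    return [f"{fake.first_name()} {fake.last_name()}".lower() for _ in range(n)]

def shape_names(names, max_words=5, first_name_ratio=0.3, seed=0):
    """Drop samples longer than max_words, then keep only the first word for a share of names."""
    rng = random.Random(seed)
    shaped = []
    for name in names:
        words = name.split()
        if not words or len(words) > max_words:
            continue  # skip empty entries and overly long samples
        if rng.random() < first_name_ratio:
            words = words[:1]  # mimic the first-name-only entries of the Indian data
        shaped.append(" ".join(words))
    return shaped

sample = ["sara gulbrandsen", "a b c d e f", "james eaton", ""]
print(shape_names(sample, first_name_ratio=1.0))  # ['sara', 'james']
```

In practice the ratio would be picked by looking at how many single-word names the Indian dataset actually contains.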

non_indian_data.head()
   name                 count_words
0  sara gulbrandsen     2
1  kathryn villarreal   2
2  jennifer mccormick   2
3  james eaton          2
4  melissa bond         2

We end up with about 14,000 non-Indian names and 13,000 Indian names. Now let’s build a neural network to classify nationalities using names:
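The full model code is not reproduced in this article, so the following is only a sketch of a character-level LSTM classifier of the kind the `predictions_lstm_char` column in the results suggests. The vocabulary, `MAX_LEN`, the layer sizes, and the function names are assumptions, not the settings used to produce the predictions shown here:

```python
import string

# Index 0 is reserved for padding; the 26 letters and the space get 1..27.
VOCAB = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase + " ")}
MAX_LEN = 30  # assumed maximum name length in characters

def encode_name(name, max_len=MAX_LEN):
    """Turn a name into a fixed-length list of character indices, right-padded with 0."""
    ids = [VOCAB[ch] for ch in name.lower() if ch in VOCAB][:max_len]
    return ids + [0] * (max_len - len(ids))

def build_model(vocab_size=len(VOCAB) + 1):
    """Small character-level LSTM with a single sigmoid output (assumed architecture)."""
    from tensorflow import keras  # imported lazily; the encoding needs no TensorFlow
    model = keras.Sequential([
        keras.layers.Embedding(input_dim=vocab_size, output_dim=16, mask_zero=True),
        keras.layers.LSTM(64),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

print(encode_name("tyson")[:6])  # [20, 25, 19, 15, 14, 0]
```

Training would then call `model.fit` on the encoded names with a 0/1 label per name; which class gets the label 1 is a free choice that only has to stay consistent when reading the predictions.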

    names          predictions_lstm_char
0   lalitha        indian
1   tyson          non_indian
2   shailaja       indian
3   shyamala       indian
4   vishwanathan   indian
5   ramanujam      indian
6   conan          non_indian
7   kryslovsky     non_indian
8   ratnani        indian
9   diego          non_indian
10  kakoli         indian
11  shreyas        indian
12  brayden        non_indian
13  shanon         non_indian
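A table like the one above can be produced by turning the model's output probabilities into labels. Here is a small sketch, assuming the model outputs the probability of the Indian class and that 0.5 is the cutoff; the helper name is hypothetical and the probabilities in the example are made-up placeholders, not real model output:

```python
def label_predictions(names, probabilities, threshold=0.5):
    """Pair each name with a label derived from its predicted probability."""
    rows = []
    for name, p in zip(names, probabilities):
        # Assumption: the probability refers to the "indian" class.
        label = "indian" if p >= threshold else "non_indian"
        rows.append((name, label))
    return rows

# Placeholder probabilities for illustration only.
rows = label_predictions(["lalitha", "tyson"], [0.97, 0.08])
print(rows)  # [('lalitha', 'indian'), ('tyson', 'non_indian')]
```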

So this is how we can classify nationalities with machine learning. I did not include the full code and exploration here; you can have a look at the full code from here. Feel free to ask your valuable questions in the comments section below.

Thecleverprogrammer