In this article, I will take you through a real-world Machine Learning task: predicting the migration of humans between countries. Human migration is a type of human mobility in which a person moves in order to change their domicile.
Predicting human migration as accurately as possible is important in city planning applications, international trade, the spread of infectious diseases, conservation planning, and public policymaking.
Predict Migration with Machine Learning
I will start this task to predict migration by importing all the necessary libraries:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
import numpy as np
from sklearn.naive_bayes import GaussianNB
The dataset I am using in this task can be downloaded from here. Let’s see what the data looks like. Pay particular attention to the “Measure”, “Country” and “Citizenship” columns: to train a model on them, we need to convert their string values to integers:
data = pd.read_csv('migration_nz.csv')
data.head(10)

But first, let’s see the unique values we have in the “Measure” column:
data['Measure'].unique()
array(['Arrivals', 'Departures', 'Net'], dtype=object)
Now we need to give each unique string value its own integer value. Since there are only a few values here, we can use the “replace” function:
# Assign the result back to the column; calling replace with inplace=True
# on a selected column is not reliable in recent pandas versions
data['Measure'] = data['Measure'].replace({"Arrivals": 0, "Departures": 1, "Net": 2})
Now let’s check if everything has been correctly assigned:
data['Measure'].unique()
array([0, 1, 2])
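The same encoding can also be written as a single dictionary lookup. The sketch below uses a small made-up DataFrame (`toy`) instead of the real dataset, and `map` rather than `replace`; both names are only for illustration.

```python
import pandas as pd

# Toy stand-in for the "Measure" column (not the real dataset)
toy = pd.DataFrame({"Measure": ["Arrivals", "Departures", "Net", "Arrivals"]})

# One dictionary holds the whole string-to-integer mapping
measure_codes = {"Arrivals": 0, "Departures": 1, "Net": 2}
toy["Measure"] = toy["Measure"].map(measure_codes)

print(toy["Measure"].tolist())
```

Note that `map` turns any string missing from the dictionary into NaN, which makes unexpected categories easy to spot.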
The “Country” column, in contrast, has about 250 unique values, including regional aggregates such as “Oceania” and “All countries”:
data['Country'].unique()
array(['Oceania', 'Antarctica', 'American Samoa', 'Australia', 'Cocos Islands', 'Cook Islands', 'Christmas Island', 'Fiji', 'Micronesia', 'Guam', 'Kiribati', 'Marshall Islands', 'Northern Mariana Islands', 'New Caledonia', 'Norfolk Island', 'Nauru', 'Niue', 'New Zealand', 'French Polynesia', 'Papua New Guinea', 'Pitcairn Island', 'Palau', 'Solomon Islands', 'French Southern Territories', 'Tokelau', 'Tonga', 'Tuvalu', 'Vanuatu', 'Wallis and Futuna', 'Samoa', 'Asia', 'Afghanistan', 'Armenia', 'Azerbaijan', 'Bangladesh', 'Brunei Darussalam', 'Bhutan', 'China', 'Georgia', 'Hong Kong', 'Indonesia', 'India', 'Japan', 'Kyrgyzstan', 'Cambodia', 'North Korea', 'South Korea', 'Kazakhstan', 'Laos', 'Sri Lanka', 'Myanmar', 'Mongolia', 'Macau', 'Maldives', 'Malaysia', 'Nepal', 'Philippines', 'Pakistan', 'Singapore', 'Thailand', 'Tajikistan', 'Timor-Leste', 'Turkmenistan', 'Taiwan', 'Uzbekistan', 'Vietnam', 'Europe', 'Andorra', 'Albania', 'Austria', 'Bosnia and Herzegovina', 'Belgium', 'Bulgaria', 'Belarus', 'Switzerland', 'Czechoslovakia', 'Cyprus', 'Czechia', 'East Germany', 'Germany', 'Denmark', 'Estonia', 'Spain', 'Finland', 'Faeroe Islands', 'France', 'UK', 'Gibraltar', 'Greenland', 'Greece', 'Croatia', 'Hungary', 'Ireland', 'Iceland', 'Italy', 'Kosovo', 'Liechtenstein', 'Lithuania', 'Luxembourg', 'Latvia', 'Monaco', 'Moldova', 'Montenegro', 'Macedonia', 'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania', 'Serbia', 'Russia', 'Sweden', 'Slovenia', 'Slovakia', 'San Marino', 'USSR', 'Ukraine', 'Vatican City', 'Yugoslavia/Serbia and Montenegro', 'Americas', 'Antigua and Barbuda', 'Anguilla', 'Netherlands Antilles', 'Argentina', 'Aruba', 'Barbados', 'Bermuda', 'Bolivia', 'Brazil', 'Bahamas', 'Belize', 'Canada', 'Chile', 'Colombia', 'Costa Rica', 'Cuba', 'Curacao', 'Dominica', 'Dominican Republic', 'Ecuador', 'Falkland Islands', 'Grenada', 'French Guiana', 'Guadeloupe', 'South Georgia and the South Sandwich Islands', 'Guatemala', 'Guyana', 'Honduras', 'Haiti', 
'Jamaica', 'St Kitts and Nevis', 'Cayman Islands', 'St Lucia', 'Martinique', 'Montserrat', 'Mexico', 'Nicaragua', 'Panama', 'Peru', 'St Pierre and Miquelon', 'Puerto Rico', 'Paraguay', 'Suriname', 'El Salvador', 'St Maarten', 'Turks and Caicos', 'Trinidad and Tobago', 'US Minor Outlying Islands', 'USA', 'Uruguay', 'St Vincent and the Grenadines', 'Venezuela', 'British Virgin Islands', 'US Virgin Islands', 'Africa and the Middle East', 'UAE', 'Angola', 'Burkina Faso', 'Bahrain', 'Burundi', 'Benin', 'Botswana', 'Democratic Republic of the Congo', 'Central African Republic', 'Congo', "Cote d'Ivoire", 'Cameroon', 'Cape Verde', 'Djibouti', 'Algeria', 'Egypt', 'Western Sahara', 'Eritrea', 'Ethiopia', 'Gabon', 'Ghana', 'Gambia', 'Guinea', 'Equatorial Guinea', 'Guinea-Bissau', 'Israel', 'British Indian Ocean Territory', 'Iraq', 'Iran', 'Jordan', 'Kenya', 'Comoros', 'Kuwait', 'Lebanon', 'Liberia', 'Lesotho', 'Libya', 'Morocco', 'Madagascar', 'Mali', 'Mauritania', 'Mauritius', 'Malawi', 'Mozambique', 'Namibia', 'Niger', 'Nigeria', 'Oman', 'Palestine', 'Qatar', 'Reunion', 'Rwanda', 'Saudi Arabia', 'Seychelles', 'Sudan', 'St Helena', 'Sierra Leone', 'Senegal', 'Somalia', 'South Sudan', 'Sao Tome and Principe', 'Syria', 'Swaziland', 'Chad', 'Togo', 'Tunisia', 'Turkey', 'Tanzania', 'Uganda', 'South Yemen', 'Yemen', 'Mayotte', 'South Africa', 'Zambia', 'Zimbabwe', 'Not stated', 'All countries'], dtype=object)
With this many values, writing each replacement by hand is impractical, so I will use the pandas “factorize” function to assign each unique string its own integer value:
data['CountryID'] = pd.factorize(data.Country)[0]
data['CitID'] = pd.factorize(data.Citizenship)[0]
Now, let’s see if everything is okay:
data['CountryID'].unique()
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252])
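To make the behaviour of `pd.factorize` concrete, here is a minimal sketch on a made-up column. The function returns two things: the integer codes and the unique labels, so the original names can be recovered later. The `toy` DataFrame and the `id_to_country` dictionary are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for the "Country" column (not the real dataset)
toy = pd.DataFrame({"Country": ["Fiji", "Japan", "Fiji", "Chile"]})

# codes: one integer per row; uniques: labels in order of first appearance
codes, uniques = pd.factorize(toy["Country"])
toy["CountryID"] = codes

# uniques[i] is the label that was assigned the integer i
id_to_country = dict(enumerate(uniques))

print(toy["CountryID"].tolist())
print(id_to_country)
```

Keeping the second return value is useful when you want to translate a prediction for a given ID back into a country name.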
Another problem is that we have some missing values. Let’s see how many there are and where exactly they are:
data.isnull().sum()
Measure         0
Country         0
Citizenship     0
Year            0
Value          72
CountryID       0
CitID           0
dtype: int64
Now, I will simply fill these missing values with the median of the “Value” column:
# Assign the result back to the column instead of using inplace=True
data["Value"] = data["Value"].fillna(data["Value"].median())
Now, let’s see if everything is fine so far:
data.isnull().sum()
Measure        0
Country        0
Citizenship    0
Year           0
Value          0
CountryID      0
CitID          0
dtype: int64
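The effect of median imputation is easy to check on a small made-up column. In this sketch the `toy` values are invented for illustration; the median ignores the missing entries and is not pulled upwards by the single large value the way a mean would be.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the "Value" column, with two missing entries
toy = pd.DataFrame({"Value": [10.0, np.nan, 30.0, 1000.0, np.nan]})

# median() skips NaN, so this is the median of 10, 30 and 1000
median_value = toy["Value"].median()
toy["Value"] = toy["Value"].fillna(median_value)

print(median_value)
print(toy["Value"].tolist())
```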
Split The Data into Train and Test sets
Now, I will drop the original string columns and split the data into a 70 per cent training set and a 30 per cent test set:
data.drop('Country', axis=1, inplace=True)
data.drop('Citizenship', axis=1, inplace=True)
from sklearn.model_selection import train_test_split
# Features: encoded country, measure, year and citizenship; target: migration value
X = data[['CountryID','Measure','Year','CitID']].to_numpy()
Y = data['Value'].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.3, random_state=9)
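As a quick sanity check of the 70/30 split, the sketch below runs `train_test_split` on a small synthetic array and prints the resulting shapes. The arrays `X_toy` and `y_toy` are made up for illustration and have nothing to do with the migration data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 10 rows with 4 feature columns, and one target per row
X_toy = np.arange(40).reshape(10, 4)
y_toy = np.arange(10)

# test_size=0.3 keeps 30 per cent of the rows for testing;
# random_state fixes the shuffle so the split is reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=9)

print(X_tr.shape, X_te.shape)
```

With 10 rows this gives 7 training rows and 3 test rows.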
Predict Migration
Now, let’s predict migration using a random forest regressor and visualize the results. The score method returns the coefficient of determination (R²) on the test set:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=70,max_features = 3,max_depth=5,n_jobs=-1)
rf.fit(X_train ,y_train)
rf.score(X_test, y_test)
0.73654599831394985
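For a regressor, `score` reports R², which has no units. A root mean squared error in the units of the target is often easier to interpret alongside it. The sketch below shows both on synthetic data; the arrays `X_syn` and `y_syn`, the `random_state` on the model and the simplified hyperparameters are assumptions for illustration, so the printed numbers say nothing about the migration dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem: the target depends on the first feature plus noise
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 10, size=(300, 4))
y_syn = 3.0 * X_syn[:, 0] + rng.normal(0, 1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=9)

model = RandomForestRegressor(n_estimators=70, max_depth=5, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

# score() is the R² of the predictions on the test set
r2 = model.score(X_te, y_te)
# RMSE is expressed in the same units as the target
rmse = np.sqrt(mean_squared_error(y_te, pred))

print(f"R2: {r2:.3f}, RMSE: {rmse:.3f}")
```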
X = data[['CountryID','Measure','Year','CitID']]
Y = data['Value']
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.3, random_state=9)
grouped = data.groupby(['Year']).aggregate({'Value' : 'sum'})
# Growth of migration to New Zealand by year
grouped.plot(kind='line');plt.axhline(0, color='g')
plt.show()

grouped.plot(kind='bar');plt.axhline(0, color='g')
plt.show()

import seaborn as sns
corr = data.corr()
sns.heatmap(corr,
xticklabels=corr.columns.values,
yticklabels=corr.columns.values)
plt.show()

I hope you liked this article on predicting the migration of humans between countries with Machine Learning. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.