Predict Migration with Machine Learning

In this article, I will take you through a real-world Machine Learning task: predicting the migration of humans between countries. Human migration is a type of human mobility in which a person's journey involves changing their domicile.

Predicting human migration as accurately as possible is important in city planning applications, international trade, the spread of infectious diseases, conservation planning, and public policymaking.

Predict Migration with Machine Learning

I will start this task to predict migration by importing all the necessary libraries:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics import mean_squared_error
from sklearn.naive_bayes import GaussianNB

The dataset I am using in this task can be downloaded from here. Let’s see what the data looks like:

data = pd.read_csv('migration_nz.csv')
data.head(10)

I’d like to turn your attention to the “Measure”, “Country” and “Citizenship” columns. To use them as model inputs, we need to convert all of these string values to integers.

But first, let’s see the unique values we have in the “Measure” column:

data['Measure'].unique()
array(['Arrivals', 'Departures', 'Net'], dtype=object)

Now we need to give each unique string value its own integer value. When there are only a few values, as here, we can spell out the mapping by hand:

# Assign the result back instead of using inplace=True on a selected column,
# which newer pandas versions no longer apply reliably.
data['Measure'] = data['Measure'].map({"Arrivals": 0, "Departures": 1, "Net": 2})

Now let’s check if everything has been correctly assigned:

data['Measure'].unique()

array([0, 1, 2])

The “Country” column is a different matter, with about 250 unique values (including regional groupings such as 'Oceania' and 'All countries'):

data['Country'].unique()
array(['Oceania', 'Antarctica', 'American Samoa', 'Australia',
       'Cocos Islands', 'Cook Islands', 'Christmas Island', 'Fiji',
       'Micronesia', 'Guam', 'Kiribati', 'Marshall Islands',
       'Northern Mariana Islands', 'New Caledonia', 'Norfolk Island',
       'Nauru', 'Niue', 'New Zealand', 'French Polynesia',
       'Papua New Guinea', 'Pitcairn Island', 'Palau', 'Solomon Islands',
       'French Southern Territories', 'Tokelau', 'Tonga', 'Tuvalu',
       'Vanuatu', 'Wallis and Futuna', 'Samoa', 'Asia', 'Afghanistan',
       'Armenia', 'Azerbaijan', 'Bangladesh', 'Brunei Darussalam',
       'Bhutan', 'China', 'Georgia', 'Hong Kong', 'Indonesia', 'India',
       'Japan', 'Kyrgyzstan', 'Cambodia', 'North Korea', 'South Korea',
       'Kazakhstan', 'Laos', 'Sri Lanka', 'Myanmar', 'Mongolia', 'Macau',
       'Maldives', 'Malaysia', 'Nepal', 'Philippines', 'Pakistan',
       'Singapore', 'Thailand', 'Tajikistan', 'Timor-Leste',
       'Turkmenistan', 'Taiwan', 'Uzbekistan', 'Vietnam', 'Europe',
       'Andorra', 'Albania', 'Austria', 'Bosnia and Herzegovina',
       'Belgium', 'Bulgaria', 'Belarus', 'Switzerland', 'Czechoslovakia',
       'Cyprus', 'Czechia', 'East Germany', 'Germany', 'Denmark',
       'Estonia', 'Spain', 'Finland', 'Faeroe Islands', 'France', 'UK',
       'Gibraltar', 'Greenland', 'Greece', 'Croatia', 'Hungary', 'Ireland',
       'Iceland', 'Italy', 'Kosovo', 'Liechtenstein', 'Lithuania',
       'Luxembourg', 'Latvia', 'Monaco', 'Moldova', 'Montenegro',
       'Macedonia', 'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal',
       'Romania', 'Serbia', 'Russia', 'Sweden', 'Slovenia', 'Slovakia',
       'San Marino', 'USSR', 'Ukraine', 'Vatican City',
       'Yugoslavia/Serbia and Montenegro', 'Americas',
       'Antigua and Barbuda', 'Anguilla', 'Netherlands Antilles',
       'Argentina', 'Aruba', 'Barbados', 'Bermuda', 'Bolivia', 'Brazil',
       'Bahamas', 'Belize', 'Canada', 'Chile', 'Colombia', 'Costa Rica',
       'Cuba', 'Curacao', 'Dominica', 'Dominican Republic', 'Ecuador',
       'Falkland Islands', 'Grenada', 'French Guiana', 'Guadeloupe',
       'South Georgia and the South Sandwich Islands', 'Guatemala',
       'Guyana', 'Honduras', 'Haiti', 'Jamaica', 'St Kitts and Nevis',
       'Cayman Islands', 'St Lucia', 'Martinique', 'Montserrat', 'Mexico',
       'Nicaragua', 'Panama', 'Peru', 'St Pierre and Miquelon',
       'Puerto Rico', 'Paraguay', 'Suriname', 'El Salvador', 'St Maarten',
       'Turks and Caicos', 'Trinidad and Tobago',
       'US Minor Outlying Islands', 'USA', 'Uruguay',
       'St Vincent and the Grenadines', 'Venezuela',
       'British Virgin Islands', 'US Virgin Islands',
       'Africa and the Middle East', 'UAE', 'Angola', 'Burkina Faso',
       'Bahrain', 'Burundi', 'Benin', 'Botswana',
       'Democratic Republic of the Congo', 'Central African Republic',
       'Congo', "Cote d'Ivoire", 'Cameroon', 'Cape Verde', 'Djibouti',
       'Algeria', 'Egypt', 'Western Sahara', 'Eritrea', 'Ethiopia',
       'Gabon', 'Ghana', 'Gambia', 'Guinea', 'Equatorial Guinea',
       'Guinea-Bissau', 'Israel', 'British Indian Ocean Territory', 'Iraq',
       'Iran', 'Jordan', 'Kenya', 'Comoros', 'Kuwait', 'Lebanon',
       'Liberia', 'Lesotho', 'Libya', 'Morocco', 'Madagascar', 'Mali',
       'Mauritania', 'Mauritius', 'Malawi', 'Mozambique', 'Namibia',
       'Niger', 'Nigeria', 'Oman', 'Palestine', 'Qatar', 'Reunion',
       'Rwanda', 'Saudi Arabia', 'Seychelles', 'Sudan', 'St Helena',
       'Sierra Leone', 'Senegal', 'Somalia', 'South Sudan',
       'Sao Tome and Principe', 'Syria', 'Swaziland', 'Chad', 'Togo',
       'Tunisia', 'Turkey', 'Tanzania', 'Uganda', 'South Yemen', 'Yemen',
       'Mayotte', 'South Africa', 'Zambia', 'Zimbabwe', 'Not stated',
       'All countries'], dtype=object)

With this many values, mapping by hand is impractical, so I let pandas assign each unique string its own integer with factorize, for both “Country” and “Citizenship”:

data['CountryID'] = pd.factorize(data.Country)[0]
data['CitID'] = pd.factorize(data.Citizenship)[0]

Now, let’s see if everything is okay:

data['CountryID'].unique()
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
       182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
       195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
       208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220,
       221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233,
       234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246,
       247, 248, 249, 250, 251, 252])
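
Because the model only ever sees the integer IDs, it helps to keep a way back to the original names. The following is a minimal sketch on a toy column (the values and the names `countries`, `id_to_name` and `decoded` are illustrative assumptions, not part of the dataset code above). It shows that `pd.factorize` returns both the integer codes and the unique labels, numbered in order of first appearance:

```python
import pandas as pd

# Toy column standing in for data['Country'] (illustrative values only).
countries = pd.Series(["Oceania", "Fiji", "Oceania", "Tonga", "Fiji"])

# factorize returns (codes, uniques): one integer per row, plus the
# unique labels in order of first appearance.
codes, uniques = pd.factorize(countries)

# Lookup table from integer ID back to the original label.
id_to_name = dict(enumerate(uniques))

# Decode all integer codes back into labels in one step.
decoded = list(uniques[codes])

print(list(codes))
print(id_to_name)
```

Keeping the second return value around is a design choice: the article's code discards it with `[0]`, which is fine for training but makes predictions harder to interpret later.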

Another problem is that we have some missing values. Let’s see how many there are and where exactly they are:

data.isnull().sum()
Measure         0
Country         0
Citizenship     0
Year            0
Value          72
CountryID       0
CitID           0
dtype: int64

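Now, I will simply fill these missing values with the median of the column:

# Assign the result back; fillna(..., inplace=True) on a selected column
# is deprecated in recent pandas versions.
data["Value"] = data["Value"].fillna(data["Value"].median())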

Now, let’s see if everything is fine so far:

data.isnull().sum()
Measure        0
Country        0
Citizenship    0
Year           0
Value          0
CountryID      0
CitID          0
dtype: int64
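
As a small illustration of why the median is a reasonable fill value, here is a sketch on made-up numbers (the series `values` is an assumption for demonstration, not taken from the dataset). The median ignores the missing entries and is not pulled upward by a single large value the way the mean is:

```python
import numpy as np
import pandas as pd

# Toy series with two missing entries and one large outlier.
values = pd.Series([10.0, np.nan, 30.0, 1000.0, np.nan])

# Both statistics skip NaN by default.
median = values.median()  # median of [10, 30, 1000]
mean = values.mean()      # mean of the same three numbers

# Fill the gaps with the median and confirm nothing is missing afterwards.
filled = values.fillna(median)

print(median, mean)
print(filled.isnull().sum())
```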

Split The Data into Train and Test sets

Now, I will split the data into a 70 per cent training set and a 30 per cent test set:

from sklearn.model_selection import train_test_split

# The string columns are no longer needed now that we have integer IDs
data.drop(['Country', 'Citizenship'], axis=1, inplace=True)

# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
X = data[['CountryID', 'Measure', 'Year', 'CitID']].to_numpy()
Y = data['Value'].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=9)
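
To make the effect of `test_size` and `random_state` concrete, here is a minimal sketch on a toy array of ten rows (the arrays are assumptions for illustration only). It checks that the split is 7 rows to 3 rows and that repeating the call with the same `random_state` yields the same test rows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten toy rows with two features each, and a matching target.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same arguments as in the article: 30 per cent test set, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=9)

# A second call with the same seed should reproduce the same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.3, random_state=9)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_test, X_test2))
```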

Predict Migration

Now, let’s predict migration using a random forest regressor and visualize the results:

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=70, max_features=3, max_depth=5, n_jobs=-1)
rf.fit(X_train, y_train)
rf.score(X_test, y_test)  # R^2 (coefficient of determination) on the test set

0.73654599831394985
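
The number returned by `score` is the R² of the predictions, which says nothing about the size of the errors in the units of “Value”. The sketch below uses synthetic data (the features, target formula and seeds are assumptions, not the migration dataset) to show that `score` matches `r2_score`, and how the `mean_squared_error` function imported at the start could be used to get a root mean squared error as well:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: four features, noisy non-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 4))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=9)

# Same hyperparameters as in the article; random_state added here
# only so that repeated runs of this sketch give the same forest.
rf = RandomForestRegressor(n_estimators=70, max_features=3,
                           max_depth=5, random_state=0)
rf.fit(X_train, y_train)

pred = rf.predict(X_test)
r2 = r2_score(y_test, pred)                        # same value as rf.score
rmse = np.sqrt(mean_squared_error(y_test, pred))   # error in target units

print(r2, rmse)
```

Note that the forest in the article is built without a fixed `random_state`, so the score it reports can vary slightly from run to run.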
To visualize the data, I first plot the total migration value per year:

X = data[['CountryID', 'Measure', 'Year', 'CitID']]
Y = data['Value']
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=9)

# Growth of migration to New Zealand by year
grouped = data.groupby(['Year']).aggregate({'Value': 'sum'})
grouped.plot(kind='line')
plt.axhline(0, color='g')
plt.show()  # sns.plt was removed from seaborn; call matplotlib directly

The same yearly totals as a bar chart:

grouped.plot(kind='bar')
plt.axhline(0, color='g')
plt.show()

Finally, a heatmap of the correlations between the columns:

corr = data.corr()
sns.heatmap(corr, xticklabels=corr.columns.values, yticklabels=corr.columns.values)
plt.show()

I hope you liked this article on predicting the migration of humans between countries with Machine Learning. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.

Aman Kharwal

I am a programmer from India, and I am here to guide you with Data Science, Machine Learning, Python, and C++ for free. I hope you will learn a lot in your journey towards Coding, Machine Learning and Artificial Intelligence with me.
