World Happiness Report with Data Science

The first World Happiness Report was released on April 1, 2012 as a foundational text for the UN High Level Meeting: Well-being and Happiness: Defining a New Economic Paradigm, drawing international attention.

The report outlined the state of world happiness, the causes of happiness and misery, and policy implications highlighted by case studies. The second World Happiness Report was issued in 2013, and the report has been published annually since then, with the exception of 2014.

The report primarily uses data from the Gallup World Poll. Each annual report is available to the public to download on the World Happiness Report website.

METHODS AND PHILOSOPHY

The rankings of national happiness are based on a Cantril ladder survey. Nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10 and the worst possible life being a 0.

They are then asked to rate their own current lives on that 0 to 10 scale. The report correlates the results with various life factors.

In the reports, experts in fields including economics, psychology, survey analysis, and national statistics describe how measurements of well-being can be used effectively to assess the progress of nations, among other topics.

Each report is organized by chapters that delve deeper into issues relating to happiness, including mental illness, the objective benefits of happiness, the importance of ethics, policy implications, and links with the Organisation for Economic Co-operation and Development’s (OECD) approach to measuring subjective well-being and other international and national efforts.

In this data science project, I will create a world happiness report with Python, using the 2019 data provided by the United Nations.

Let’s start by importing the libraries:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

Download the 2019 data set and save it as 2019.csv in your working directory.

Highlighting the maximum values of each attribute in the data set

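# Load the 2019 data and preview the first rows
df = pd.read_csv('2019.csv')
df.head()

def highlight_max(s):
    # Return one CSS string per cell: green for the column maximum, empty otherwise
    is_max = s == s.max()
    return ['background-color: limegreen' if v else '' for v in is_max]

# Style only the numeric columns; the "maximum" of the country names is not meaningful
numeric_cols = df.select_dtypes('number').columns.tolist()
df.style.apply(highlight_max, subset=numeric_cols)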

Checking out the shape of our data set

df.shape

Plotting pairwise relationships in the data set

# seaborn was already imported above; pairplot draws only the numeric columns
sns.pairplot(df)
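
The pair plot shows the relationships visually, and a numeric summary can sit alongside it. The sketch below is one possible way to rank the factors by their Pearson correlation with Score. The helper name score_correlations is made up for this illustration, and the small sample frame reuses four rows of the 2019 data (the values tabulated in the country comparison later in this post) so that the block runs without the CSV. With only four rows the numbers are not meaningful; pass the full df for a real reading.

```python
import pandas as pd

def score_correlations(frame):
    """Pearson correlation of every numeric column with 'Score', strongest first."""
    numeric = frame.select_dtypes('number')
    # 'Overall rank' is just Score re-expressed as a position, so drop it if present.
    numeric = numeric.drop(columns=['Overall rank'], errors='ignore')
    corr = numeric.corr()['Score'].drop('Score')
    return corr.sort_values(ascending=False)

# Four rows copied from the comparison table in this post (illustration only).
sample = pd.DataFrame({
    'Country or region': ['Canada', 'US', 'UK', 'India'],
    'Score': [7.278, 6.892, 7.054, 4.015],
    'GDP per capita': [1.365, 1.433, 1.333, 0.755],
    'Social support': [1.505, 1.457, 1.538, 0.765],
})

print(score_correlations(sample))
```

On the full data set, the call would simply be score_correlations(df).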

Top 10 countries for each attribute

fig, axes = plt.subplots(nrows=2, ncols=2,constrained_layout=True,figsize=(12,8))

sns.barplot(x='GDP per capita',y='Country or region',data=df.nlargest(10,'GDP per capita'),ax=axes[0,0],palette="Blues_d")

sns.barplot(x='Social support' ,y='Country or region',data=df.nlargest(10,'Social support'),ax=axes[0,1],palette="YlGn")

sns.barplot(x='Healthy life expectancy' ,y='Country or region',data=df.nlargest(10,'Healthy life expectancy'),ax=axes[1,0],palette='OrRd')

sns.barplot(x='Freedom to make life choices' ,y='Country or region',data=df.nlargest(10,'Freedom to make life choices'),ax=axes[1,1],palette='YlOrBr')

fig, axes = plt.subplots(nrows=1, ncols=2,constrained_layout=True,figsize=(10,4))

sns.barplot(x='Generosity' ,y='Country or region',data=df.nlargest(10,'Generosity'),ax=axes[0],palette='Spectral')
sns.barplot(x='Perceptions of corruption' ,y='Country or region',data=df.nlargest(10,'Perceptions of corruption'),ax=axes[1],palette='RdYlGn')

Next, I want to assign each country a category (High, Mid, or Low) according to its happiness score. To do that, we first have to find the bounds of each category.

print('max:', df['Score'].max())
print('min:', df['Score'].min())
# Split the full score range into three equal-width groups
score_range = df['Score'].max() - df['Score'].min()
grp = round(score_range / 3, 3)
print('group width:', grp)

#Output
max: 7.769
min: 2.853
group width: 1.639

low=df['Score'].min()+grp
mid=low+grp

print('upper bound of Low grp',low)
print('upper bound of Mid grp',mid)
print('upper bound of High grp','max:',df['Score'].max())

#Output
upper bound of Low grp 4.492
upper bound of Mid grp 6.131
upper bound of High grp max: 7.769

df.info()

#Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 156 entries, 0 to 155
Data columns (total 9 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Overall rank                  156 non-null    int64  
 1   Country or region             156 non-null    object 
 2   Score                         156 non-null    float64
 3   GDP per capita                156 non-null    float64
 4   Social support                156 non-null    float64
 5   Healthy life expectancy       156 non-null    float64
 6   Freedom to make life choices  156 non-null    float64
 7   Generosity                    156 non-null    float64
 8   Perceptions of corruption     156 non-null    float64
dtypes: float64(7), int64(1), object(1)
memory usage: 11.1+ KB

Finally, we add a new column, Category, to the data set and assign each country one of the levels High, Mid, or Low.

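cat = []
for score in df['Score']:
    # Each bound is the upper limit of its group, so a score equal to the bound belongs to that group
    if score <= low:
        cat.append('Low')
    elif score <= mid:
        cat.append('Mid')
    else:
        cat.append('High')

df['Category'] = cat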
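
As an alternative to the explicit loop, pandas can do the binning in one call. The sketch below uses pd.cut with the bin edges printed earlier (2.853, 4.492, 6.131, 7.769). The helper name categorize is hypothetical, and the sample scores are the four country scores used later in this post plus the Low bound itself, to show that each bin includes its upper edge.

```python
import pandas as pd

# Bin edges taken from the bounds printed earlier: min, upper bound of Low,
# upper bound of Mid, and max of the 2019 scores.
edges = [2.853, 4.492, 6.131, 7.769]
labels = ['Low', 'Mid', 'High']

def categorize(scores):
    """Label each score Low, Mid or High; every bin includes its upper edge."""
    return pd.cut(scores, bins=edges, labels=labels, include_lowest=True)

# Canada, US, UK, India, and a score sitting exactly on the Low bound.
scores = pd.Series([7.278, 6.892, 7.054, 4.015, 4.492])
print(list(categorize(scores)))
# → ['High', 'High', 'High', 'Low', 'Low']
```

On the full data set this would read df['Category'] = categorize(df['Score']). Scores outside the outermost edges come back as NaN, so the edges should be derived from the same data they are applied to.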

We will also style the data set with a green zone and a red zone. If a country belongs to the High category, it falls in the green zone; if it belongs to the Mid or Low category, it falls in the red zone.

color = (df.Category == 'High' ).map({True: 'background-color: limegreen',False:'background-color: red'})
df.style.apply(lambda s: color)

Since I am from India, I would like to see my country’s position on the list, and also check out some other countries where people from India usually settle for economic benefits. So let’s check them out:

df.loc[df['Country or region']=='India']

Let’s have a head-to-head comparison of a few of these countries to understand why they have such a good or bad rank worldwide, and to get some insight.

data={
    'Country or region':['Canada','US','UK','India'],
    'Score':[7.278,6.892,7.054,4.015],
    'GDP per capita':[1.365,1.433,1.333,0.755],
    'Social support':[1.505,1.457,1.538,0.765],
    'Healthy life expectancy':[1.039,0.874,0.996,0.588],
    'Freedom to make life choices':[0.584,0.454,0.45,0.498],
    'Generosity':[0.285,0.28,0.348,0.2],
    'Perceptions of corruption':[0.308,0.128,0.278,0.085]
}
d=pd.DataFrame(data)
d
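
Rather than typing the values by hand, the same rows could be pulled straight from the loaded data. The sketch below shows one way to do that with isin. The helper name select_countries is made up for this example, and the small frame stands in for df so that the block runs on its own. Note that the CSV most likely spells out 'United States' and 'United Kingdom' rather than 'US' and 'UK'; that is an assumption worth checking against df['Country or region'].

```python
import pandas as pd

def select_countries(frame, names, column='Country or region'):
    """Return the rows for the requested countries, in the order requested.

    Names that are not present in the frame are silently skipped.
    """
    found = frame[frame[column].isin(names)].set_index(column)
    ordered = [name for name in names if name in found.index]
    return found.loc[ordered].reset_index()

# Stand-in for df, using scores quoted in this post.
sample = pd.DataFrame({
    'Country or region': ['Canada', 'US', 'UK', 'India'],
    'Score': [7.278, 6.892, 7.054, 4.015],
})

# 'Atlantis' is deliberately absent to show that unknown names are skipped.
print(select_countries(sample, ['India', 'Canada', 'Atlantis']))
```

With the full data, d = select_countries(df, [...]) would give the comparison frame without retyping any numbers.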

Social Support vs GDP per capita vs Healthy life expectancy

# Plot the three factors as grouped bars so that no bar hides another
d.plot(x="Country or region", y=["Social support", "GDP per capita", "Healthy life expectancy"], kind="bar", color=["C3", "C1", "C2"])

plt.show()

Freedom to make life choices vs Generosity vs Corruption

# Grouped bars again, so that smaller values are not drawn over by larger ones
d.plot(x="Country or region", y=["Freedom to make life choices", "Generosity", "Perceptions of corruption"], kind="bar", color=["C3", "C1", "C2"])

plt.show()
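
The charts can be backed up with numbers. As a rough sketch, the snippet below computes, for each column, how far India sits below the average of the other three countries in the comparison table. The helper name gap_to_others is hypothetical, and the frame simply repeats the values from the table above.

```python
import pandas as pd

# Same values as the comparison table above.
d = pd.DataFrame({
    'Country or region': ['Canada', 'US', 'UK', 'India'],
    'Score': [7.278, 6.892, 7.054, 4.015],
    'GDP per capita': [1.365, 1.433, 1.333, 0.755],
    'Social support': [1.505, 1.457, 1.538, 0.765],
    'Healthy life expectancy': [1.039, 0.874, 0.996, 0.588],
    'Freedom to make life choices': [0.584, 0.454, 0.45, 0.498],
    'Generosity': [0.285, 0.28, 0.348, 0.2],
    'Perceptions of corruption': [0.308, 0.128, 0.278, 0.085],
})

def gap_to_others(frame, country, column='Country or region'):
    """Mean of the other countries minus the chosen country's value, per column."""
    indexed = frame.set_index(column)
    others_mean = indexed.drop(index=country).mean()
    # Positive values mean the chosen country is below the others' average.
    return (others_mean - indexed.loc[country]).sort_values(ascending=False)

print(gap_to_others(d, 'India').round(3))
```

Among the six factors in these four rows, the largest gaps are in social support and GDP per capita, while freedom to make life choices is roughly level.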

Geographic Visualization of Happiness Score

import plotly.graph_objs as go
from plotly.offline import iplot

data = dict(type = 'choropleth', 
           locations = df['Country or region'],
           locationmode = 'country names',
           colorscale='RdYlGn',
           z = df['Score'], 
           text = df['Country or region'],
           colorbar = {'title':'Happiness Score'})

layout = dict(title = 'Geographical Visualization of Happiness Score', 
              geo = dict(showframe = True, projection = {'type': 'azimuthal equal area'}))

choromap3 = go.Figure(data = [data], layout=layout)
iplot(choromap3)