Data Analysis on Suicides in India

This article walks through a data analysis in Python. Using the Suicides in India data set, we will try to understand the factors associated with suicide in India.

The Total column of the data set sums to 11,89,068 for 2012 alone (a sum that may count one case under several categories), so it is important to understand the patterns behind these cases and look for ways to mitigate them.

You can download the data set used in this task from here:

Let’s start by importing the libraries and reading the data:

import numpy as np #for math operations
import pandas as pd #for manipulating dataset
import matplotlib.pyplot as plt #for visualization
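import seaborn as sns #for statistical plots
%matplotlib inline
# Jupyter magic above: render plots inside the notebook
sns.set(rc={'figure.figsize':(11.7,8.27)}) #default figure size in inches
sns.set_palette("BrBG")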

# read dataset
df = pd.read_csv('suicides.csv')
df.tail()

Let’s look at the structure of the data set:

df.info()
#Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 237519 entries, 0 to 237518
Data columns (total 7 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   State      237519 non-null  object
 1   Year       237519 non-null  int64 
 2   Type_code  237519 non-null  object
 3   Type       237519 non-null  object
 4   Gender     237519 non-null  object
 5   Age_group  237519 non-null  object
 6   Total      237519 non-null  int64 
dtypes: int64(2), object(5)
memory usage: 12.7+ MB

Let’s check whether there are any missing values in the data set:

df.isna().sum()
#Output
State        0
Year         0
Type_code    0
Type         0
Gender       0
Age_group    0
Total        0
dtype: int64

How many people committed suicide from 2001-12?

print("Total cases from 2001-12: \n",df.groupby("Year")["Total"].sum())
df.groupby("Year")["Total"].sum().plot(kind="line")
#Output
Total cases from 2001-12: 
 Year
2001     976464
2002     993648
2003     997622
2004    1023137
2005    1025201
2006    1062991
2007    1103667
2008    1125082
2009    1144033
2010    1211322
2011    1219499
2012    1189068
Name: Total, dtype: int64
<matplotlib.axes._subplots.AxesSubplot at 0x7f5d69be7fd0>
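
To put numbers on the trend, the yearly sums printed above can be turned into year-over-year percentage changes. This is a minimal sketch that copies the printed totals into a new Series; the names `yearly`, `yoy_pct`, and `overall_pct` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Yearly sums of the Total column, copied from the output above
yearly = pd.Series(
    [976464, 993648, 997622, 1023137, 1025201, 1062991,
     1103667, 1125082, 1144033, 1211322, 1219499, 1189068],
    index=range(2001, 2013),
    name="Total",
)

# Percentage change relative to the previous year (NaN for 2001)
yoy_pct = yearly.pct_change() * 100

# Overall change between the first and last year in the data
overall_pct = (yearly.loc[2012] / yearly.loc[2001] - 1) * 100

print(yoy_pct.round(2))
print(f"2001 to 2012: {overall_pct:.1f}%")
```

In these printed totals, 2012 is the only year that is lower than the year before it.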

Which states are present in the data set?

df["State"].value_counts()
#Output
Maharashtra          6792
Karnataka            6792
Madhya Pradesh       6792
Rajasthan            6791
Andhra Pradesh       6791
Odisha               6791
Chhattisgarh         6790
Bihar                6790
Haryana              6790
Kerala               6788
Uttar Pradesh        6787
Gujarat              6786
Tamil Nadu           6786
Assam                6786
Jharkhand            6785
Tripura              6782
Delhi (Ut)           6782
West Bengal          6780
Punjab               6779
Himachal Pradesh     6774
Jammu & Kashmir      6761
Goa                  6759
Uttarakhand          6758
Sikkim               6742
Mizoram              6737
Meghalaya            6733
Puducherry           6730
Chandigarh           6717
A & N Islands        6712
Daman & Diu          6710
Arunachal Pradesh    6707
Nagaland             6705
D & N Haveli         6704
Manipur              6700
Lakshadweep          6674
Total (States)        312
Total (All India)     312
Total (Uts)           312
Name: State, dtype: int64

The rows labelled Total (States), Total (All India), and Total (Uts) are aggregates rather than individual states, so we remove them:

df = df[~df["State"].isin(["Total (States)", "Total (Uts)", "Total (All India)"])]
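
To see why the aggregate rows matter, here is a minimal sketch on made-up numbers. It assumes, as their labels suggest, that the Total (...) rows repeat counts already present in the individual state rows; the `toy` frame and its values are hypothetical.

```python
import pandas as pd

# Toy rows (made-up numbers) mimicking the State and Total columns
toy = pd.DataFrame({
    "State": ["Goa", "Sikkim", "Total (States)", "Delhi (Ut)",
              "Total (Uts)", "Total (All India)"],
    "Total": [10, 5, 15, 7, 7, 22],
})

aggregate_rows = ["Total (States)", "Total (Uts)", "Total (All India)"]

# Keep only rows whose State is not one of the aggregate labels
states_only = toy[~toy["State"].isin(aggregate_rows)]

print(toy["Total"].sum())          # sum including aggregate rows
print(states_only["Total"].sum())  # sum over individual states and UTs
```

Under that assumption, any sum taken before the filter includes the aggregate rows as well, so it is worth checking where in the notebook each sum is computed.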

Which gender tends to commit more suicide?

It looks like males commit more suicides than females in India.

filter_gender = pd.DataFrame(df.groupby("Gender")["Total"].sum()).reset_index()
sns.catplot(x="Gender", y="Total", kind="bar", data=filter_gender);

In which states do people tend to commit more suicide?

The visualization shows that the top 3 states by number of suicide cases are:

  1. Maharashtra
  2. West Bengal
  3. Andhra Pradesh

filter_state = pd.DataFrame(df.groupby(["State"])["Total"].sum()).reset_index()
sns.barplot(y = 'State', x = 'Total',data = filter_state, edgecolor = 'w')
plt.show()
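
The ranking can also be read directly from the aggregated table instead of from the plot. The sketch below uses hypothetical state names and totals (`filter_state_demo` is an illustrative stand-in for `filter_state`) to show `nlargest` and `sort_values`.

```python
import pandas as pd

# Hypothetical per-state totals, for illustration only
filter_state_demo = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E"],
    "Total": [120, 450, 300, 80, 510],
})

# Three states with the largest totals, in descending order
top3 = filter_state_demo.nlargest(3, "Total")

# Full table sorted so the largest state comes first
sorted_states = filter_state_demo.sort_values("Total", ascending=False)

print(top3)
```

Passing a table sorted this way to `sns.barplot` should draw the bars in the same order, which makes the top states easier to spot.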

Let’s create a word cloud

States drawn in a larger font have a higher number of suicide cases.

from wordcloud import WordCloud
# Map each state to its total number of cases
count = {state: int(total) for state, total in zip(filter_state["State"], filter_state["Total"])}

wordcloud = WordCloud(width=1280,height=720,relative_scaling=1,background_color='white',normalize_plurals=False).generate_from_frequencies(count)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

How has the number of cases changed over time?

From the previous bar chart, we know that males commit more suicides than females, but not how fast the number of cases is growing for each gender.

This plot shows a steeper positive slope for males than for females, which suggests that the number of male cases may keep growing faster if the trend continues.

grouped_year = df.groupby(["Year","Gender"])["Total"].sum()
grouped_year = pd.DataFrame(grouped_year).reset_index()
# grouped_year
sns.lmplot(x="Year", y="Total", hue="Gender", data=grouped_year,height=8.27, aspect=11.7/8.27);
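
The slope comparison can be quantified with a straight-line fit per gender, which is the kind of fit `lmplot` draws. This is a minimal sketch using `numpy.polyfit` on hypothetical yearly totals; the `demo` frame and its numbers are made up and are not the real figures.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly totals per gender (not the real figures)
demo = pd.DataFrame({
    "Year":   [2001, 2002, 2003, 2004] * 2,
    "Gender": ["Male"] * 4 + ["Female"] * 4,
    "Total":  [600, 630, 660, 690, 400, 410, 420, 430],
})

# Fit Total = slope * years_since_start + intercept for each gender
slopes = {}
for gender, group in demo.groupby("Gender"):
    # Shifting the years to start at 0 keeps the fit well conditioned
    years = group["Year"] - group["Year"].min()
    slope, intercept = np.polyfit(years, group["Total"], deg=1)
    slopes[gender] = slope

print(slopes)
```

Each slope is the fitted change in cases per year, so the two genders can be compared with a single number each.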

Number of cases based on Type_code

Note: I interpret the Causes type code as “other causes”; the data set does not define it clearly.

filter_type_code = pd.DataFrame(df.groupby(["Type_code","Year"])["Total"].sum()).reset_index()
filter_type_code
sns.catplot(x="Type_code", y="Total",hue="Year", kind="bar", data=filter_type_code,height=8.27, aspect=11.7/8.27);
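
The same grouping can be viewed as a table with one row per Type_code and one column per Year. The sketch below uses `pivot_table` on made-up rows that borrow the data set's column names; the values are hypothetical.

```python
import pandas as pd

# Made-up rows using the data set's column names
toy = pd.DataFrame({
    "Type_code": ["Causes", "Causes", "Social_Status", "Social_Status"],
    "Year": [2001, 2002, 2001, 2002],
    "Total": [100, 110, 100, 110],
})

# One row per Type_code, one column per Year, cells hold summed totals
table = toy.pivot_table(index="Type_code", columns="Year",
                        values="Total", aggfunc="sum")
print(table)
```

If the real table shows similar totals in every row for a given year, that would suggest each case is recorded once per Type_code. That is an assumption to verify against the data, not something the source states.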

Which social status accounts for more suicides?

It appears that married people account for the majority of suicide cases.

One possible explanation is conflict within the marriage, which may make people more prone to suicide.

filter_social_status = pd.DataFrame(df[df["Type_code"]=="Social_Status"].groupby(["Type","Gender"])["Total"].sum()).reset_index()
sns.catplot(x="Type", y="Total",hue="Gender", kind="bar", data=filter_social_status,height=8.27, aspect=11.7/8.27);

What was the education status of people who committed suicide?

It appears that people with a lower level of education commit more suicides.

People in the Diploma and Graduate categories account for the fewest suicides.

filter_education = pd.DataFrame(df[df["Type_code"]=="Education_Status"].groupby(["Type","Gender"])["Total"].sum()).reset_index()
g = sns.catplot(x="Type", y="Total",hue="Gender", kind="bar", data=filter_education,height=8.27, aspect=11.7/8.27);
g.set_xticklabels(rotation=90)

What was the profession of the people who committed suicide?

Farmers and housewives commit more suicides than other groups.

This makes sense because many Indian farmers are in debt and their livelihood depends on crop yield. If the yield is poor, they cannot clear their debt, and in the worst case they might commit suicide.

Global warming, monsoon delays, drought, etc. can lead to a poor yield.

Housewives might have issues in their marriage, which might be a reason for such a high number of cases.

Domestic violence, dowry, gender discrimination, etc. might be some of the reasons for housewives to commit suicide.

filter_profession = pd.DataFrame(df[df["Type_code"]=="Professional_Profile"].groupby(["Type","Gender"])["Total"].sum()).reset_index()
g = sns.catplot(x="Type", y="Total",hue="Gender", kind="bar", data=filter_profession,height=8.27, aspect=11.7/8.27);
g.set_xticklabels(rotation=90)
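
The social status, education, and profession sections repeat the same filter-then-group step. One way to factor it out is a small helper; `totals_by_type` and the `Share_pct` column are hypothetical names introduced here, and the `toy` rows are made up.

```python
import pandas as pd

def totals_by_type(data, type_code):
    """Sum Total per Type and Gender for one Type_code (hypothetical helper)."""
    subset = data[data["Type_code"] == type_code]
    out = subset.groupby(["Type", "Gender"], as_index=False)["Total"].sum()
    # Share of each row in the overall total for this Type_code, in percent
    out["Share_pct"] = out["Total"] / out["Total"].sum() * 100
    return out

# Made-up rows with the same column names as the data set
toy = pd.DataFrame({
    "Type_code": ["Social_Status"] * 4 + ["Education_Status"] * 2,
    "Type": ["Married", "Married", "Never Married", "Married",
             "Primary", "Graduate"],
    "Gender": ["Male", "Female", "Male", "Male", "Male", "Female"],
    "Total": [30, 20, 10, 40, 5, 3],
})

social = totals_by_type(toy, "Social_Status")
print(social)
```

The returned frame can be passed to `sns.catplot` as `data`, and the percentage column makes statements such as "the majority of cases" easy to check.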

Which age groups account for more suicides?

The visualization below shows that young people (ages 15-29) and middle-aged people (ages 30-44) account for the largest number of suicides.

It can be due to several reasons like:

  • unemployment
  • academic stress
  • bad friend circle
  • farmers (since they have to be young and strong enough to do farming)
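  • addictions

# The 0-100+ age group covers all ages and overlaps the other groups, so drop it
# Sum Total per age group first; a seaborn bar plot of raw rows would show the per-row mean
filter_age = pd.DataFrame(df[df["Age_group"]!="0-100+"].groupby("Age_group")["Total"].sum()).reset_index()
sns.catplot(x="Age_group", y="Total", kind="bar", data=filter_age,height=8.27, aspect=11.7/8.27);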
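
Seaborn bar plots show the mean of y per category by default, so whether rows are summed before plotting changes what the bars mean. A minimal sketch on made-up rows (the `rows` frame is hypothetical) shows how the two aggregations can rank age groups differently.

```python
import pandas as pd

# Made-up rows: the 15-29 group has more rows but a smaller value per row
rows = pd.DataFrame({
    "Age_group": ["15-29", "15-29", "15-29", "30-44"],
    "Total": [10, 10, 10, 20],
})

per_row_mean = rows.groupby("Age_group")["Total"].mean()
group_sum = rows.groupby("Age_group")["Total"].sum()

print(per_row_mean)  # what a bar plot of the raw rows would show
print(group_sum)     # total cases per age group
```

A claim about the number of cases per age group is best read from the summed version.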

According to the data set:

  • Males commit more suicides than females in India.
  • The highest numbers of suicide cases occur in Maharashtra, West Bengal, and Andhra Pradesh.
  • Males may continue to commit more suicides than females in the future if this trend continues.
  • People who commit suicide are mostly:
    • Married
    • Farmers and housewives
    • Young (15-29) or middle-aged (30-44)

I hope this article based on Data Analysis will help you.

Aman Kharwal