
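This article is a data analysis project in Python. Using the Suicides in India data set, we will try to understand the different reasons why people in India committed suicide.
The Total column adds up to 11,89,068 (1,189,068) for 2012 alone. That sum overstates the number of people, because the data set also contains aggregate rows and records each case under several breakdowns (causes, means adopted, education, profession and social status). Even so, the scale makes it important to understand why people commit suicide and how such deaths might be prevented.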
You can download the data set used in this task from here:
Let’s start by importing the libraries and reading the data:

```python
import numpy as np               # for math operations
import pandas as pd              # for manipulating the data set
import matplotlib.pyplot as plt  # for visualization
import seaborn as sns

# Jupyter magic: render plots inside the notebook (remove when running as a script)
%matplotlib inline
sns.set(rc={'figure.figsize': (11.7, 8.27)})
sns.set_palette("BrBG")

# read the data set
df = pd.read_csv('suicides.csv')
df.tail()
```

Let’s get an overview of the data set:

```python
df.info()
```

Output:

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 237519 entries, 0 to 237518
Data columns (total 7 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   State      237519 non-null  object
 1   Year       237519 non-null  int64
 2   Type_code  237519 non-null  object
 3   Type       237519 non-null  object
 4   Gender     237519 non-null  object
 5   Age_group  237519 non-null  object
 6   Total      237519 non-null  int64
dtypes: int64(2), object(5)
memory usage: 12.7+ MB
```
Let’s check whether there are any missing values in the data set:

```python
df.isna().sum()
```

Output:

```
State        0
Year         0
Type_code    0
Type         0
Gender       0
Age_group    0
Total        0
dtype: int64
```
How many suicide cases were recorded each year from 2001 to 2012?
```python
# Sum the Total column for each year, then print and plot it
yearly_totals = df.groupby("Year")["Total"].sum()
print("Total cases from 2001-12:\n", yearly_totals)
yearly_totals.plot(kind="line")
```

Output:

```
Total cases from 2001-12:
 Year
2001     976464
2002     993648
2003     997622
2004    1023137
2005    1025201
2006    1062991
2007    1103667
2008    1125082
2009    1144033
2010    1211322
2011    1219499
2012    1189068
Name: Total, dtype: int64
```
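Because the file contains aggregate rows and repeats each case under every Type_code, the yearly sums above count many cases more than once. A minimal sketch of a more conservative yearly count, assuming that each Type_code breaks down the same set of cases so that summing a single one avoids the repetition. The helper `yearly_cases`, the choice of `"Causes"` as the default, and the toy numbers are illustrative, not part of the original analysis:

```python
import pandas as pd

# Aggregate rows that appear in the State column of this data set
AGGREGATE_ROWS = ["Total (States)", "Total (All India)", "Total (Uts)"]


def yearly_cases(df: pd.DataFrame, type_code: str = "Causes") -> pd.Series:
    """Hypothetical helper: sum Total per year for a single Type_code,
    after dropping the aggregate State rows.

    Assumption: every Type_code classifies the same set of cases, so
    counting only one of them avoids counting a case several times.
    """
    rows = df[~df["State"].isin(AGGREGATE_ROWS) & (df["Type_code"] == type_code)]
    return rows.groupby("Year")["Total"].sum()


# Made-up data: 10 cases in 2001, recorded once by cause and once by
# education status, plus one aggregate row repeating the same 10 cases
toy = pd.DataFrame({
    "State": ["Goa", "Goa", "Total (All India)"],
    "Year": [2001, 2001, 2001],
    "Type_code": ["Causes", "Education_Status", "Causes"],
    "Total": [10, 10, 10],
})
print(toy.groupby("Year")["Total"].sum().loc[2001])  # naive sum: 30
print(yearly_cases(toy).loc[2001])                   # 10
```

On the real data this would be called as `yearly_cases(df)`; whichever Type_code you pick, the result should be far smaller than the naive yearly sums.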

Which states are present in the data set?
```python
df["State"].value_counts()
```

Output:

```
Maharashtra          6792
Karnataka            6792
Madhya Pradesh       6792
Rajasthan            6791
Andhra Pradesh       6791
Odisha               6791
Chhattisgarh         6790
Bihar                6790
Haryana              6790
Kerala               6788
Uttar Pradesh        6787
Gujarat              6786
Tamil Nadu           6786
Assam                6786
Jharkhand            6785
Tripura              6782
Delhi (Ut)           6782
West Bengal          6780
Punjab               6779
Himachal Pradesh     6774
Jammu & Kashmir      6761
Goa                  6759
Uttarakhand          6758
Sikkim               6742
Mizoram              6737
Meghalaya            6733
Puducherry           6730
Chandigarh           6717
A & N Islands        6712
Daman & Diu          6710
Arunachal Pradesh    6707
Nagaland             6705
D & N Haveli         6704
Manipur              6700
Lakshadweep          6674
Total (States)        312
Total (All India)     312
Total (Uts)           312
Name: State, dtype: int64
```
The last three entries are aggregate rows rather than states or union territories, so remove the rows whose State is Total (States), Total (All India) or Total (Uts):
```python
# Drop the aggregate rows, which are not individual states or UTs
aggregate_rows = ["Total (States)", "Total (All India)", "Total (Uts)"]
df = df[~df["State"].isin(aggregate_rows)]
```
Which gender accounts for more suicides?
Males account for more suicide cases than females in India.
```python
# Total cases per gender, one row per gender
filter_gender = df.groupby("Gender")["Total"].sum().reset_index()
sns.catplot(x="Gender", y="Total", kind="bar", data=filter_gender);
```
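The bar chart compares absolute counts. A short sketch that expresses each gender's share as a percentage instead, assuming a frame with Gender and Total columns like `filter_gender`; the helper `gender_share` and the toy numbers are made up for illustration:

```python
import pandas as pd


def gender_share(totals: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: percentage of all cases contributed by each Gender."""
    by_gender = totals.groupby("Gender")["Total"].sum()
    return (100 * by_gender / by_gender.sum()).round(1)


# Made-up numbers, only to show the calculation
toy = pd.DataFrame({"Gender": ["Male", "Female", "Male"], "Total": [40, 35, 25]})
print(gender_share(toy))  # Female 35.0, Male 65.0
```

On the real data, `gender_share(filter_gender)` gives the split behind the bar chart. Percentages are unaffected by the repeated counting across Type_codes as long as every breakdown repeats the cases equally.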

Which states record the most suicide cases?
The visualization shows that the three states with the most suicide cases are:
- Maharashtra
- West Bengal
- Andhra Pradesh
```python
# Total cases per state, one row per state
filter_state = df.groupby("State")["Total"].sum().reset_index()
sns.barplot(y="State", x="Total", data=filter_state, edgecolor="w")
plt.show()
```
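Reading the top three off a long bar chart is error-prone; pandas' `Series.nlargest` returns them directly. A minimal sketch on made-up data, where `top_states` is a hypothetical helper; on the real data it would be called as `top_states(df)` after the aggregate rows are removed:

```python
import pandas as pd


def top_states(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Hypothetical helper: the n states with the largest summed Total."""
    return df.groupby("State")["Total"].sum().nlargest(n)


# Made-up rows: Goa 35, Kerala 20, Bihar 12, Sikkim 1 in total
toy = pd.DataFrame({
    "State": ["Goa", "Kerala", "Goa", "Bihar", "Sikkim"],
    "Total": [5, 20, 30, 12, 1],
})
print(top_states(toy))  # Goa 35, Kerala 20, Bihar 12
```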

Let’s create a word cloud.
States with more suicide cases appear in a larger font.
```python
from wordcloud import WordCloud

# Map each state to its total number of cases
# (avoids calling int() on a one-element Series, which newer pandas deprecates)
count = filter_state.set_index("State")["Total"].astype(int).to_dict()

wordcloud = WordCloud(width=1280, height=720, relative_scaling=1,
                      background_color='white',
                      normalize_plurals=False).generate_from_frequencies(count)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
```

How has the number of cases changed over time?
The earlier bar chart showed that males account for more suicides than females, but not how quickly the number of cases is growing for each gender.
The regression plot shows a steeper positive slope for males than for females, which suggests that, if the trend continues, male cases will keep growing faster than female cases.
```python
# Total cases per (Year, Gender) pair, with a linear fit per gender
grouped_year = df.groupby(["Year", "Gender"])["Total"].sum().reset_index()
sns.lmplot(x="Year", y="Total", hue="Gender", data=grouped_year,
           height=8.27, aspect=11.7/8.27);
```
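The lmplot compares the slopes visually. A sketch that puts a number on each slope with `np.polyfit`, a degree-1 least-squares fit, which is the same kind of linear fit `lmplot` draws by default. The helper `yearly_slope` and the toy numbers are hypothetical; on the real data it would take the `grouped_year` frame:

```python
import numpy as np
import pandas as pd


def yearly_slope(grouped: pd.DataFrame, gender: str) -> float:
    """Hypothetical helper: least-squares slope of Total against Year
    for one gender, in cases per year."""
    rows = grouped[grouped["Gender"] == gender]
    slope, _intercept = np.polyfit(rows["Year"], rows["Total"], deg=1)
    return float(slope)


# Made-up yearly totals: males grow by 20 per year, females by 5
toy = pd.DataFrame({
    "Year": [2001, 2002, 2003, 2001, 2002, 2003],
    "Gender": ["Male"] * 3 + ["Female"] * 3,
    "Total": [100, 120, 140, 80, 85, 90],
})
for g in ["Male", "Female"]:
    print(g, round(yearly_slope(toy, g), 1))
```

A larger slope for one gender means its yearly count is rising faster; it says nothing about causes.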

Number of cases by Type_code and year
Note: I interpret the Causes type code as other causes (the data set does not document this clearly).
```python
# Total cases per (Type_code, Year) pair
filter_type_code = df.groupby(["Type_code", "Year"])["Total"].sum().reset_index()
sns.catplot(x="Type_code", y="Total", hue="Year", kind="bar", data=filter_type_code,
            height=8.27, aspect=11.7/8.27);
```

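Which social status accounts for more suicides?
Married people appear to account for the majority of suicide cases.
One possible explanation is that marital problems can cause conflict between partners, which may make them more prone to suicide. Raw counts do not account for group size, though: married people are also a large share of the adult population, so this alone does not show that marriage raises the risk.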
```python
# Total cases per social status and gender
social_status = df[df["Type_code"] == "Social_Status"]
filter_social_status = social_status.groupby(["Type", "Gender"])["Total"].sum().reset_index()
sns.catplot(x="Type", y="Total", hue="Gender", kind="bar", data=filter_social_status,
            height=8.27, aspect=11.7/8.27);
```
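Whether married people really form a majority (more than half) of the cases can be checked by converting the summed totals into percentages. A sketch, assuming a frame shaped like `filter_social_status` (Type, Gender, Total); the helper `status_share`, the labels and the numbers below are made up:

```python
import pandas as pd


def status_share(totals: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: percentage of all Social_Status cases in each Type,
    largest first (both genders combined)."""
    by_type = totals.groupby("Type")["Total"].sum()
    return (100 * by_type / by_type.sum()).sort_values(ascending=False)


# Made-up totals per (Type, Gender)
toy = pd.DataFrame({
    "Type": ["Married", "Married", "Never Married", "Widowed"],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Total": [50, 30, 15, 5],
})
share = status_share(toy)
print(share)
print("Majority?", share.iloc[0] > 50)
```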

What was the education status of people who committed suicide?
People with less formal education appear to account for more suicide cases.
People with a diploma or a graduate degree account for the fewest cases.
```python
# Total cases per education status and gender
education = df[df["Type_code"] == "Education_Status"]
filter_education = education.groupby(["Type", "Gender"])["Total"].sum().reset_index()
g = sns.catplot(x="Type", y="Total", hue="Gender", kind="bar", data=filter_education,
                height=8.27, aspect=11.7/8.27)
g.set_xticklabels(rotation=90);
```

What was the profession of the people who committed suicide?
Farmers and housewives account for more suicide cases than other professions.
A likely reason for farmers is that many Indian farmers are in debt and depend on the yield of their crops. If the yield is poor, they cannot repay their debt, and in the worst case this may push them towards suicide.
Climate change, a delayed monsoon, drought and similar factors can all lead to poor yields.
For housewives, problems within the marriage might be one reason for such a high number of cases.
Domestic violence, dowry harassment and gender discrimination might be among the contributing factors.
```python
# Total cases per profession and gender
profession = df[df["Type_code"] == "Professional_Profile"]
filter_profession = profession.groupby(["Type", "Gender"])["Total"].sum().reset_index()
g = sns.catplot(x="Type", y="Total", hue="Gender", kind="bar", data=filter_profession,
                height=8.27, aspect=11.7/8.27)
g.set_xticklabels(rotation=90);
```
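To confirm which professions lead for each gender without reading rotated axis labels, the rows can be sorted by Total and the first few kept per gender. A minimal sketch assuming a frame shaped like the Type/Gender/Total table plotted above; the helper `top_types_per_gender` and the toy labels and numbers are hypothetical:

```python
import pandas as pd


def top_types_per_gender(totals: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Hypothetical helper: keep the n largest Types for each Gender."""
    return (
        totals.sort_values("Total", ascending=False)
        .groupby("Gender")
        .head(n)  # first n rows of each gender, in descending Total order
        .sort_values(["Gender", "Total"], ascending=[True, False])
        .reset_index(drop=True)
    )


# Made-up totals per (Type, Gender)
toy = pd.DataFrame({
    "Type": ["Farming", "Service", "Student", "House Wife", "Service", "Student"],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Total": [90, 40, 30, 70, 10, 20],
})
print(top_types_per_gender(toy))
```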

Which age groups account for the most suicides?
The visualization below shows that young people (aged 15-29) and middle-aged people (aged 30-44) account for the most suicides.
Possible reasons include:
- unemployment
- academic stress
- negative peer influence
- farming distress (farming is physically demanding, so many farmers fall in these age groups)
- addiction
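```python
# Age group 0-100+ encapsulates all the remaining age groups, so drop it
filter_age = df[df["Age_group"] != "0-100+"]
# Sum the cases per age group first; a bar plot of the raw rows would show
# the mean Total per row, not the number of cases
filter_age = filter_age.groupby("Age_group")["Total"].sum().reset_index()
sns.catplot(x="Age_group", y="Total", kind="bar", data=filter_age,
            height=8.27, aspect=11.7/8.27);
```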
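A caveat worth knowing when plotting this data: seaborn's `barplot`, and `catplot(kind="bar")`, draw the mean of `y` within each category by default, not the sum. Plotting raw rows therefore shows the average Total per row rather than the number of cases, and the ranking can flip. The difference is easy to see with pandas on made-up rows; summing first with `groupby`, or passing `estimator=sum` to seaborn, gives totals:

```python
import pandas as pd

# Made-up rows: one age group has many small rows, the other a few large ones
toy = pd.DataFrame({
    "Age_group": ["15-29"] * 6 + ["60+"] * 2,
    "Total": [10, 10, 10, 10, 10, 10, 25, 25],
})

# What a default seaborn bar height would show: the mean per row
mean_per_group = toy.groupby("Age_group")["Total"].mean()
# What the analysis needs: the number of cases per age group
sum_per_group = toy.groupby("Age_group")["Total"].sum()

print(mean_per_group)  # 15-29: 10.0, 60+: 25.0
print(sum_per_group)   # 15-29: 60,   60+: 50
```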

According to the data set:
- Males account for more suicides than females in India.
- The highest numbers of suicide cases are in Maharashtra, West Bengal and Andhra Pradesh.
- If the current trend continues, male cases may keep growing faster than female cases.
- People who commit suicide are mostly:
  - married
  - farmers and housewives
  - young (aged 15-29) or middle-aged (aged 30-44)
I hope you found this data analysis article helpful.