Data Science for Finance

Finance is one of the most important sectors in the world, and financial decisions contribute a great deal to a country's GDP. Today, data science is used in finance for:

  • Fraud Detection
  • Analyzing Customer Churn
  • Algorithmic trading
  • Risk Analysis and many more.

In this article we will use data science to analyse the market risk of mutual fund investments, using Nifty 50 data.

Let’s start by importing the libraries and reading the data set. The code below expects it as a file named nifty.csv:

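# Core libraries for data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# Jupyter/Colab magic: render matplotlib figures inline
%matplotlib inline
from pandas_profiling import ProfileReport
# Colab only: upload nifty.csv from the local machine
from google.colab import files
uploaded = files.upload()
data = pd.read_csv('nifty.csv')
# Parse the dates and split them into year, month and day columns
data['Date'] = pd.to_datetime(data['Date'])
data['year'] = data['Date'].dt.year
data['month'] = data['Date'].dt.month
data['day'] = data['Date'].dt.day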

Report of Nifty 50 data

Nifty graph in the last 2 decades:

plt.figure(figsize=(10,7))
plt.plot(data['Date'],data['Close'])
plt.xlabel('Years')
plt.ylabel('Closing values')
plt.title('Closing values vs Years')
plt.show()

Which years had the five highest peaks? 2015, 2017, 2018, 2019 and 2020.

peaks = data.loc[:, ['year','High']]
peaks['max_high'] = peaks.groupby('year')['High'].transform('max')
peaks.drop('High', axis=1, inplace=True)
peaks = peaks.drop_duplicates()
peaks = peaks.sort_values('max_high', ascending=False)
peaks = peaks.head()

fig = plt.figure(figsize=(10,7))
plt.pie(peaks['max_high'], labels=peaks['year'], autopct='%1.1f%%', shadow=True)
centre_circle = plt.Circle((0,0),0.45,color='black', fc='white',linewidth=1.25)
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.axis('equal')
plt.show()

2016 is missing from the donut plot of the top five years, which suggests that prices were comparatively low in 2016.
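
The per-year peak ranking behind the donut plot can also be sketched more compactly with a grouped maximum followed by `nlargest`. This is only an illustrative alternative, not the code used above: the `sample` frame and the `top_yearly_peaks` helper are hypothetical, and the values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the Nifty data: only the two columns used here.
sample = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016, 2017, 2017, 2018],
    "High": [9100.0, 8900.0, 8950.0, 8700.0, 10500.0, 10200.0, 11700.0],
})


def top_yearly_peaks(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n largest per-year maxima of 'High', indexed by year."""
    return frame.groupby("year")["High"].max().nlargest(n)


top3 = top_yearly_peaks(sample, n=3)
print(top3)
```

With the real data, `top_yearly_peaks(data)` should give the same five years as the `peaks` frame, assuming `data` has the `year` and `High` columns created earlier.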

Distribution of the highest values

sns.kdeplot(data=data['High'], fill=True)  # fill replaces the deprecated shade argument
plt.title('Distribution of highest values')
plt.show()

Distribution of transaction volume

sns.kdeplot(data=data['Volume'], fill=True)  # fill replaces the deprecated shade argument
plt.title('Transaction volume')
plt.show()

Volume of transactions per year

# Keep only the odd-numbered months to keep the animated chart readable
odd_months = [1,3,5,7,9,11]
perc = data.loc[:,["year","month",'Volume']]
# Mean volume for each (month, year) pair
perc['new_volume'] = perc.groupby([perc.month,perc.year])['Volume'].transform('mean')
perc.drop('Volume', axis=1, inplace=True)
perc = perc[perc.year<2020]
perc = perc.drop_duplicates()
perc = perc.loc[perc['month'].isin(odd_months)]
perc = perc.sort_values("year")

fig=px.bar(perc,x='month', y="new_volume", animation_frame="year", 
           animation_group="month", color="month", hover_name="month", range_y=[perc['new_volume'].min(), perc['new_volume'].max()])
fig.update_layout(showlegend=False)
fig.show()

Relation of Volume to stock prices

sns.scatterplot(data=data, x='Volume', y='Close')
plt.title('Relation of volume to stock prices')
plt.show()
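
The scatter plot shows the relation visually. One hedged way to put a number on it is the Pearson correlation coefficient, which measures only linear association. The sketch below uses a small made-up frame, so the values are hypothetical and say nothing about the Nifty data itself.

```python
import pandas as pd

# Hypothetical values, only to make the sketch runnable.
sample = pd.DataFrame({
    "Volume": [100, 200, 300, 400, 500],
    "Close": [10.0, 12.0, 11.0, 15.0, 16.0],
})

# Pearson correlation: +1 is a perfect increasing linear relation,
# -1 a perfect decreasing one, and values near 0 mean no linear relation.
pearson = sample["Volume"].corr(sample["Close"])
print(f"Pearson correlation between Volume and Close: {pearson:.2f}")
```

On the real data the same call would be `data['Volume'].corr(data['Close'])`, assuming both columns are numeric.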

What is turnover?

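Turnover is the net sales generated by a business, while profit is the residual earnings of a business after all expenses have been charged against net sales.

Thus, turnover and profit are essentially the beginning and ending points of the income statement: the top-line revenues and the bottom-line results.

For market data such as this index data set, however, the Turnover column most likely records the total value of shares traded in the period rather than a company's net sales.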

Distribution of Turnover

sns.kdeplot(data=data['Turnover'], fill=True)  # fill replaces the deprecated shade argument
plt.title('Turnover Distribution')
plt.show()

Relation of turnover to Volume

sns.scatterplot(data=data, x='Turnover', y='Volume')
plt.title('Relation of Turnover to Volume')
plt.show()

Monthly Turnover per year

turn = data.loc[:,['year','month','Turnover']]
turn['monthly_turnover'] = turn.groupby([turn.year,turn.month])['Turnover'].transform('mean')
turn.drop('Turnover', axis=1, inplace=True)
turn = turn.drop_duplicates()
fig = px.scatter(turn, x="month", y="monthly_turnover", animation_frame="year", animation_group="month", color="month", hover_name="month", size_max=1000,
                 range_y=[turn['monthly_turnover'].min(), turn['monthly_turnover'].max()])
fig.update_traces(marker=dict(size=12,
                              line=dict(width=2,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))
fig.show()

What is the price-to-earnings ratio?

The price-to-earnings ratio (P/E ratio) measures the share price relative to the annual net income earned by the firm per share. The P/E ratio reflects current investor demand for a company's shares.

A high PE ratio generally indicates increased demand because investors anticipate earnings growth in the future.
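
As a small worked example of the definition above, the P/E ratio is the share price divided by the annual earnings per share. The `pe_ratio` helper and the figures are hypothetical and only illustrate the arithmetic.

```python
def pe_ratio(price_per_share: float, earnings_per_share: float) -> float:
    """Price-to-earnings ratio: share price divided by annual earnings per share."""
    if earnings_per_share <= 0:
        # With zero or negative earnings the ratio is not meaningful.
        raise ValueError("P/E is not meaningful for zero or negative earnings")
    return price_per_share / earnings_per_share


# Hypothetical figures: a share trading at 500 with annual EPS of 25.
example_pe = pe_ratio(500.0, 25.0)
print(example_pe)  # 20.0
```

A P/E of 20 means investors pay 20 units of price for each unit of annual earnings.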

Distribution of the P/E ratio

sns.kdeplot(data=data['P/E'], fill=True)  # fill replaces the deprecated shade argument
plt.title('P/E Distribution')
plt.show()

Relation of P/E ratio to Close values

sns.scatterplot(data=data, x='P/E', y='Close')
plt.title('Relation of P/E to Close')
plt.show()

Since the P/E ratio is based on annual earnings, let’s check the yearly graph.

df = data.loc[:,['year','P/E']]
df['meanPE'] = df.groupby('year')['P/E'].transform('mean')
df.drop('P/E',axis=1, inplace=True)
df = df.drop_duplicates().sort_values('year')


plt.figure(figsize=(10,7))
plt.plot(df['year'],df['meanPE'])
plt.xlabel('Years')
plt.ylabel('P/E values')
plt.title('P/E values vs Years')
plt.show()

A sharp rise is visible from 2005 to 2007, which suggests that investors anticipated earnings growth. It is followed by a steep fall from 2008. I am assuming this fall is due to the State Bank of Saurashtra scam of 2008, although it also coincides with the 2008 global financial crisis.

A similar fall can be seen in 2011. This may be due to the Belekeri port scam of 2011.
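
Rises and falls like these can be located numerically by looking at the year-over-year change of the yearly mean. The sketch below is a minimal illustration with hypothetical P/E values. With the real data, the `meanPE` column of `df` indexed by `year` would take the place of `yearly_pe`.

```python
import pandas as pd

# Hypothetical yearly mean P/E values, not taken from the Nifty data.
yearly_pe = pd.Series(
    [15.0, 18.0, 21.0, 17.0, 20.0],
    index=pd.Index([2005, 2006, 2007, 2008, 2009], name="year"),
    name="meanPE",
)

# Percentage change of each year's mean relative to the previous year.
# The first year has no predecessor, so its change is NaN.
yoy_change = yearly_pe.pct_change() * 100

# Years in which the mean P/E fell compared with the previous year.
falling_years = yoy_change[yoy_change < 0].index.tolist()

print(yoy_change.round(1))
print(falling_years)
```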

What is P/B?

The price-to-book (P/B) ratio compares a share's market price with its book value per share. It has been favored by value investors for decades and is widely used by market analysts.

Traditionally, any value under 1.0 is considered a good P/B value, indicating a potentially undervalued stock. However, value investors often consider stocks with a P/B value under 3.0.
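
The two thresholds mentioned above can be sketched as a small helper. The function names and the sample figures are hypothetical, and the labels simply restate the rule of thumb from the text rather than any investment advice.

```python
def pb_ratio(price_per_share: float, book_value_per_share: float) -> float:
    """Price-to-book ratio: share price divided by book value per share."""
    if book_value_per_share <= 0:
        raise ValueError("P/B is not meaningful for zero or negative book value")
    return price_per_share / book_value_per_share


def describe_pb(pb: float) -> str:
    """Map a P/B value onto the rule-of-thumb bands described in the text."""
    if pb < 1.0:
        return "under 1.0: traditionally seen as potentially undervalued"
    if pb < 3.0:
        return "under 3.0: still often considered by value investors"
    return "3.0 or above: outside the usual value range"


# Hypothetical figures: price 90 against a book value of 100 per share.
print(describe_pb(pb_ratio(90.0, 100.0)))
# Hypothetical figures: price 250 against a book value of 100 per share.
print(describe_pb(pb_ratio(250.0, 100.0)))
```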

Distribution of P/B

sns.kdeplot(data=data['P/B'], fill=True)  # fill replaces the deprecated shade argument
plt.title('P/B Distribution')
plt.show()

Relation of P/B to Close values

sns.scatterplot(data=data, x='P/B', y='Close', hue='year')
plt.title('Relation of P/B to Close')
plt.show()

It can be noticed that the highest closing prices occur for a P/B ratio between 3 and 4.

So a P/B value higher than 4 doesn’t indicate better prices.

Yearly graph of P/B

df = data.loc[:,['year','P/B']]
df['meanPB'] = df.groupby('year')['P/B'].transform('mean')
df.drop('P/B',axis=1, inplace=True)
df = df.drop_duplicates().sort_values('year')


plt.figure(figsize=(10,7))
plt.plot(df['year'],df['meanPB'])
plt.xlabel('Years')
plt.ylabel('P/B values')
plt.title('P/B values vs Years')
plt.show()

The graph rises sharply from 2003 and falls sharply from 2006.

I don’t know what this signifies yet, but I think the years 2005-2009 need to be analyzed separately, because they show some really interesting statistics, whether because of the scams or something else.
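
One possible starting point for such a separate analysis is to filter the rows for 2005-2009 and summarise the ratios per year. This is a sketch on made-up rows: the `sample` frame is hypothetical and stands in for `data`, which is assumed to have the `year`, `P/B` and `P/E` columns.

```python
import pandas as pd

# Hypothetical rows standing in for the full data set.
sample = pd.DataFrame({
    "year": [2004, 2005, 2005, 2007, 2009, 2010],
    "P/B": [3.0, 3.5, 4.5, 5.5, 3.2, 3.6],
    "P/E": [14.0, 15.0, 17.0, 22.0, 18.0, 21.0],
})

# Series.between is inclusive on both ends by default,
# so 2005 and 2009 are both kept.
window = sample[sample["year"].between(2005, 2009)]

# Per-year mean, minimum and maximum of both ratios inside the window.
summary = window.groupby("year")[["P/B", "P/E"]].agg(["mean", "min", "max"])
print(summary)
```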

Why does the dividend yield decrease?

During recessions or otherwise uncertain times, dividend-paying stocks can rapidly decrease in value because there is a risk that future dividends will be reduced. If a company announces that it’s lowering its dividend, the stock price will react immediately.
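
To make the two effects concrete: dividend yield is the annual dividend per share divided by the share price. The sketch below uses hypothetical figures to show that a falling price alone pushes the yield up, while a dividend cut pulls it down. The `dividend_yield` helper is illustrative only.

```python
def dividend_yield(annual_dividend_per_share: float, price_per_share: float) -> float:
    """Dividend yield in percent: annual dividend per share over share price."""
    if price_per_share <= 0:
        raise ValueError("price_per_share must be positive")
    return 100.0 * annual_dividend_per_share / price_per_share


# Hypothetical starting point: dividend of 10 on a share priced at 250.
baseline = dividend_yield(10.0, 250.0)

# The price falls to 200 while the dividend is unchanged: the yield rises.
after_price_fall = dividend_yield(10.0, 200.0)

# The company then halves its dividend: the yield drops below the baseline.
after_dividend_cut = dividend_yield(5.0, 200.0)

print(baseline, after_price_fall, after_dividend_cut)  # 4.0 5.0 2.5
```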

P/E – P/B – Div Yield vs Years

df = data.loc[:,['year','P/B','P/E','Div Yield']]
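# Source columns are listed in the same order as the target names (assignment is positional),
# and the yearly mean is used so the values match the mean* column names
df[['meanPE','meanPB','meandiv']] = df.groupby('year')[['P/E','P/B','Div Yield']].transform('mean')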
df.drop(['P/B','P/E','Div Yield'],axis=1, inplace=True)
df = df.drop_duplicates().sort_values('year')


plt.figure(figsize=(10,7))
plt.plot(df['year'],df['meandiv'], label='meandiv')
plt.plot(df['year'],df['meanPB'], label='meanPB')
plt.plot(df['year'],df['meanPE'], label='meanPE')
plt.xlabel('Years')
plt.legend()
plt.show()

Based on these observations, made by analyzing the data year by year and month by month, I feel that the market may have been hit very hard by the pandemic, but that it will soon recover once the environment stabilizes.

In India, people's sentiment towards the stock market plays a huge role in how the graph moves. Let’s hope the pandemic period gets over soon.

Aman Kharwal