Cricket is an important part of the culture in India, and Indian Premier League (IPL) matches are among the most important events in the country. In this article, I will introduce you to a data science project on IPL analysis with Python.
Data Science Project on IPL Analysis
The IPL is a major professional cricket league in India, contested by eight teams that each represent a different Indian city. This IPL analysis task focuses on the performance of the eight competing teams.
Team performance is visualized with the Plotly library in Python to make the results easier to interpret. Visual analysis of performance data can help in selecting players for future matches, and it provides additional information about both player and team profiles.
IPL Analysis with Python
Now let’s start the task of IPL analysis with Python by importing the necessary Python libraries and the dataset:
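A minimal sketch of the loading step, assuming the data comes as two CSV files: a match-level table and a ball-by-ball table. The function name load_ipl_data, the column names and the sample rows are assumptions for illustration, and in-memory text stands in for the real files:

```python
import io

import pandas as pd


def load_ipl_data(matches_source, deliveries_source):
    """Read the match-level and ball-by-ball tables into DataFrames.

    Both arguments can be file paths or file-like objects.
    """
    matches = pd.read_csv(matches_source)
    deliveries = pd.read_csv(deliveries_source)
    return matches, deliveries


# Hypothetical rows standing in for the real CSV files.
matches_csv = io.StringIO(
    "id,season,team1,team2,winner,venue\n"
    "1,2008,Mumbai Indians,Chennai Super Kings,Mumbai Indians,Wankhede Stadium\n"
    "2,2008,Delhi Daredevils,Deccan Chargers,Delhi Daredevils,Feroz Shah Kotla\n"
)
deliveries_csv = io.StringIO(
    "match_id,inning,batting_team,batsman_runs,total_runs\n"
    "1,1,Mumbai Indians,4,4\n"
    "1,1,Mumbai Indians,0,1\n"
    "2,1,Delhi Daredevils,6,6\n"
)

matches, deliveries = load_ipl_data(matches_csv, deliveries_csv)
print(matches.shape, deliveries.shape)
```

With real data, the two arguments would simply be the paths to the CSV files.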
Delhi Daredevils changed their name to Delhi Capitals in December 2018, and Sunrisers Hyderabad replaced Deccan Chargers in 2012 and debuted in 2013. In this IPL analysis task, I treat each of these pairs as the same team. Now let’s start with some data preparation:
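One way to treat the renamed and replaced franchises as the same team is to map the old names onto the current ones. This is a sketch under assumptions: the column names team1, team2 and winner, the TEAM_ALIASES mapping and the sample rows are all illustrative.

```python
import pandas as pd

# Hypothetical sample; the column names are assumptions.
matches = pd.DataFrame({
    "season": [2012, 2013, 2018, 2019],
    "team1": ["Deccan Chargers", "Sunrisers Hyderabad",
              "Delhi Daredevils", "Delhi Capitals"],
    "team2": ["Mumbai Indians"] * 4,
    "winner": ["Deccan Chargers", "Mumbai Indians",
               "Delhi Daredevils", "Mumbai Indians"],
})

# Old franchise name -> name used throughout the analysis.
TEAM_ALIASES = {
    "Delhi Daredevils": "Delhi Capitals",
    "Deccan Chargers": "Sunrisers Hyderabad",
}

# Apply the mapping to every column that holds a team name.
team_columns = ["team1", "team2", "winner"]
matches[team_columns] = matches[team_columns].replace(TEAM_ALIASES)

print(matches)
```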
Let’s start by looking at the number of matches played in every season of the IPL:
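A sketch of the per-season count, assuming a match-level table with one row per match and the columns id and season (both names and the sample values are assumptions):

```python
import pandas as pd

# Hypothetical sample: one row per match.
matches = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "season": [2008, 2008, 2009, 2009, 2009, 2010],
})

# Count the matches in each season and return a tidy two-column table.
matches_per_season = (
    matches.groupby("season")["id"]
    .count()
    .rename("matches")
    .reset_index()
)

print(matches_per_season)
```

The resulting table has one row per season, which is a convenient shape for a Plotly bar chart with season on the x-axis and matches on the y-axis.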
The 2013 season has the most matches. The number of teams is one reason for the higher match counts in these years: there were 10 teams in 2011 and 9 teams in 2012 and 2013.
Matches Played Vs Wins
Now let’s have a look at the number of matches played by each team and the number of matches won by them:
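A sketch of how such a table could be built, assuming the match-level columns team1, team2 and winner (assumed names) and a handful of made-up rows. The output columns are named 'Total Matches' and 'wins' to match the matches_played table used in the winning percentage calculation:

```python
import pandas as pd

# Hypothetical sample of four matches.
matches = pd.DataFrame({
    "team1": ["MI", "MI", "CSK", "RCB"],
    "team2": ["CSK", "RCB", "RCB", "MI"],
    "winner": ["MI", "RCB", "CSK", "MI"],
})

# A team plays a match whenever it appears as team1 or team2.
appearances = pd.concat([matches["team1"], matches["team2"]]).value_counts()

# value_counts ignores missing winners, so no-result matches add no win.
wins = matches["winner"].value_counts()

# Align both counts on the team name; teams without a win get 0.
matches_played = (
    pd.DataFrame({"Total Matches": appearances, "wins": wins})
    .fillna(0)
    .astype(int)
)

print(matches_played)
```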
Now let’s analyze the winning percentage of all IPL teams:
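So MI, SRH and RCB appear as the top three teams by winning percentage. Let’s look at the winning percentage of these three teams (note that head(3) returns the first three rows in their existing order, so this assumes matches_played is already sorted by winning percentage):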
win_percentage = round(matches_played['wins'] / matches_played['Total Matches'], 3) * 100
win_percentage.head(3)

Team
MI     59.1
SRH    53.3
RCB    50.8
dtype: float64
The next step in IPL analysis is to have a look at the venues where the most matches have been played:
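A sketch of the venue count, assuming the match-level table has a venue column (an assumed name) and using made-up rows:

```python
import pandas as pd

# Hypothetical sample: one row per match.
matches = pd.DataFrame({
    "venue": [
        "Eden Gardens", "Wankhede Stadium", "Eden Gardens",
        "M Chinnaswamy Stadium", "Eden Gardens", "Wankhede Stadium",
    ],
})

# value_counts sorts in descending order, so head(10) keeps the
# ten busiest venues (fewer if the data has fewer venues).
venue_counts = matches["venue"].value_counts().head(10)

print(venue_counts)
```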
So Eden Gardens, M Chinnaswamy, Wankhede and Feroz Shah Kotla are the stadiums with the most matches because most of each season’s eliminators, playoffs and finals were held there. Now let’s have a look at the most preferred decision taken by teams after winning the toss:
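A sketch of the toss decision summary, assuming a toss_decision column with the values "bat" and "field" (the column name, its values and the sample rows are assumptions):

```python
import pandas as pd

# Hypothetical sample: one row per match.
matches = pd.DataFrame({
    "season": [2008, 2008, 2008, 2009, 2009],
    "toss_decision": ["field", "bat", "field", "field", "field"],
})

# How often each decision was taken after winning the toss.
toss_counts = matches["toss_decision"].value_counts()

# The same counts expressed as a percentage of all matches.
toss_share = (toss_counts / toss_counts.sum() * 100).round(1)

print(toss_counts)
print(toss_share)
```

The toss_share values are the kind of input a pie chart of toss decisions would take.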
Runs Per Season
In this section of IPL analysis with Python, we will analyze the runs per season. Let’s start by looking at the average and total runs of all the seasons:
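A sketch of one way to compute total and average runs per season, assuming a ball-by-ball table with match_id and total_runs and a match-level table with id and season. All column names and sample values are assumptions, and the average is taken per match:

```python
import pandas as pd

# Hypothetical samples.
matches = pd.DataFrame({"id": [1, 2, 3], "season": [2008, 2008, 2009]})
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "total_runs": [4, 1, 6, 2, 0, 6, 4, 1],
})

# Step 1: add up every delivery to get the runs scored in each match.
runs_per_match = (
    deliveries.groupby("match_id")["total_runs"]
    .sum()
    .rename("runs")
    .reset_index()
)

# Step 2: attach the season of each match.
season_runs = matches.merge(runs_per_match, left_on="id", right_on="match_id")

# Step 3: total runs and average runs per match for every season.
runs_per_season = (
    season_runs.groupby("season")["runs"]
    .agg(["sum", "mean"])
    .rename(columns={"sum": "total_runs", "mean": "average_runs_per_match"})
)

print(runs_per_season)
```

Each column of runs_per_season can then be drawn against the season index as its own line or bar chart.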
Now let’s have a look at the distribution of runs over the years, split into three categories: 6s, 4s and remaining runs:
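A sketch of the three-way split, assuming ball-by-ball columns batsman_runs and total_runs (assumed names, made-up values). It treats every delivery with batsman_runs equal to 6 or 4 as a boundary, which is a simplification because a four can occasionally be run rather than hit to the rope:

```python
import pandas as pd

# Hypothetical samples.
matches = pd.DataFrame({"id": [1, 2], "season": [2008, 2009]})
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 1, 2, 2, 2],
    "batsman_runs": [6, 4, 1, 2, 6, 6, 4],
    "total_runs": [6, 4, 1, 3, 6, 6, 5],
})

# Attach the season to every delivery.
balls = deliveries.merge(
    matches[["id", "season"]], left_on="match_id", right_on="id"
)

# Keep the batsman's runs only where they equal 6 (or 4), else 0.
balls["six_runs"] = balls["batsman_runs"].where(balls["batsman_runs"] == 6, 0)
balls["four_runs"] = balls["batsman_runs"].where(balls["batsman_runs"] == 4, 0)

# Everything else, including extras, counts as remaining runs.
balls["other_runs"] = (
    balls["total_runs"] - balls["six_runs"] - balls["four_runs"]
)

run_split = balls.groupby("season")[
    ["six_runs", "four_runs", "other_runs"]
].sum()

print(run_split)
```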
We can see just a slight increase in runs from boundaries over the years. Finally, we will look at the highest runs scored by teams over the years:
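A sketch of finding the highest team total in each season, assuming ball-by-ball columns match_id, inning, batting_team and total_runs (assumed names, made-up values). Here "highest runs" is interpreted as the largest single-innings total, which is an assumption about what the chart shows:

```python
import pandas as pd

# Hypothetical samples.
matches = pd.DataFrame({"id": [1, 2, 3], "season": [2008, 2008, 2009]})
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "inning": [1, 1, 2, 2, 1, 2, 1, 1, 2],
    "batting_team": ["MI", "MI", "CSK", "CSK", "RCB", "KKR",
                     "MI", "MI", "RCB"],
    "total_runs": [4, 6, 1, 2, 6, 4, 6, 6, 3],
})

# Total scored by each team in each innings.
innings_totals = deliveries.groupby(
    ["match_id", "inning", "batting_team"], as_index=False
)["total_runs"].sum()

# Attach the season, then pick the row with the largest total per season.
innings_totals = innings_totals.merge(
    matches, left_on="match_id", right_on="id"
)
best_rows = innings_totals.loc[
    innings_totals.groupby("season")["total_runs"].idxmax()
]
highest_per_season = best_rows[
    ["season", "batting_team", "total_runs"]
].reset_index(drop=True)

print(highest_per_season)
```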
I hope you liked this article on a data science project on IPL analysis with Python. Feel free to ask your valuable questions in the comments section below.
Hi. Could you please provide the code for the Total and Average runs per Season graphs? I am referring to the Runs Per Season section.
Please update on my query regarding code for the Total and Average runs per Seasons graph.
Sure, I will come up with a new tutorial soon to cover everything in more detail
Actually, I need the code for the two graphs you generated above for “Total and Average runs per Seasons”.
I hope that clarifies my query. Please advise at your earliest convenience. Thanks in advance.