# Virat Kohli Performance Analysis using Python

Analyzing a player’s performance is one of the use cases of data science in sports analytics. Virat Kohli is one of the most famous cricketers in the world, so analyzing his batting performance over the years makes a great data science project. If you want to learn how to analyze Virat Kohli’s performance, this article is for you. In this article, I will take you through the task of Virat Kohli performance analysis using Python.

## Virat Kohli Performance Analysis (Case Study)

Virat Kohli is one of the most famous cricketers in the world. Here you are given a dataset of all the ODI matches played by Virat Kohli from 18 August 2008 to 22 January 2017. You are required to analyze the performance of Virat Kohli in ODI matches.

Below is the complete information about all the columns in the dataset:

1. Runs: Runs in the match
2. BF: Balls faced in the match
3. 4s: Number of fours hit in the match
4. 6s: Number of sixes hit in the match
5. SR: Strike rate in the match (runs scored per 100 balls faced)
6. Pos: Batting position in the match
7. Dismissal: How Virat Kohli got out in the match
8. Inns: Whether India batted in the 1st or 2nd innings
9. Opposition: India’s opponent in the match
10. Ground: Venue of the match
11. Start Date: Date of the match
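
The SR column follows the standard strike-rate formula: runs scored per 100 balls faced. As a quick sanity check, here is a minimal sketch that recomputes it for the first five rows of the dataset. The values are copied by hand from the `data.head()` output in the next section, not loaded from the file:

```python
import pandas as pd

# First five rows of the dataset, copied from the data.head() output in this article
sample = pd.DataFrame({
    "Runs": [12, 37, 25, 54, 31],
    "BF": [22, 67, 38, 66, 46],
    "SR": [54.54, 55.22, 65.78, 81.81, 67.39],
})

# Strike rate = runs scored per 100 balls faced
computed_sr = 100 * sample["Runs"] / sample["BF"]

# The stored values agree with the formula to within 0.01; they appear to be
# cut off (not rounded) at two decimals, e.g. 65.78 for 65.789...
max_diff = (computed_sr - sample["SR"]).abs().max()
print(f"Largest difference from the stored SR: {max_diff:.4f}")
```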

## Virat Kohli Performance Analysis using Python

Now let’s start with the task of Virat Kohli performance analysis using Python. I will start this task by importing the necessary Python libraries and the dataset:

```python
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

# Load the data; "virat_kohli.csv" is a placeholder name, replace it with the path to your copy of the dataset
data = pd.read_csv("virat_kohli.csv")
print(data.head())
```

```
   Runs  BF  4s  6s     SR  Pos Dismissal  Inns   Opposition         Ground  \
0    12  22   1   0  54.54  2.0       lbw     1  v Sri Lanka       Dambulla
1    37  67   6   0  55.22  2.0    caught     2  v Sri Lanka       Dambulla
2    25  38   4   0  65.78  1.0   run out     1  v Sri Lanka  Colombo (RPS)
3    54  66   7   0  81.81  1.0    bowled     1  v Sri Lanka  Colombo (RPS)
4    31  46   3   1  67.39  1.0       lbw     2  v Sri Lanka  Colombo (RPS)

  Start Date
0  18-Aug-08
1  20-Aug-08
2  24-Aug-08
3  27-Aug-08
4  29-Aug-08
```

Before moving forward, let’s check whether the dataset contains any null values:

```python
print(data.isnull().sum())
```

```
Runs          0
BF            0
4s            0
6s            0
SR            0
Pos           0
Dismissal     0
Inns          0
Opposition    0
Ground        0
Start Date    0
dtype: int64
```
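
The Start Date column is stored as text such as `18-Aug-08`. Converting it to real dates makes it easy to confirm the period the data covers, sort chronologically, or plot runs against time. A minimal sketch, assuming the day-month-two-digit-year format seen in the output above, on a few dates copied from this article:

```python
import pandas as pd

# A few Start Date values copied from the outputs in this article
start_dates = pd.Series(["18-Aug-08", "20-Aug-08", "8-Dec-11", "5-Jul-13", "19-Jan-17"])

# %d = day of month, %b = abbreviated month name, %y = two-digit year
parsed = pd.to_datetime(start_dates, format="%d-%b-%y")
print("First match:", parsed.min().date())
print("Last match: ", parsed.max().date())
```

On the full dataset, the same call would be `data["Start Date"] = pd.to_datetime(data["Start Date"], format="%d-%b-%y")`.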

The dataset covers the matches Virat Kohli played between 18 August 2008 and 22 January 2017. Let’s start with the total runs he scored in this period:

```python
# Total runs between 18-Aug-08 and 22-Jan-17
print(data["Runs"].sum())
```

```
6184
```

Now let’s look at Virat Kohli’s mean runs per innings over the same period:

```python
# Mean runs per innings between 18-Aug-08 and 22-Jan-17
print(data["Runs"].mean())
```

```
46.84848484848485
```
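
The figure above divides total runs by every innings played. The conventional cricket batting average instead divides total runs by the number of times the batter was out, so not-out innings do not count against him. Here is a minimal sketch, assuming not-out innings are recorded as `"not out"` in the Dismissal column (as in the rows printed later in this article). `batting_average` is a hypothetical helper, not a pandas function, and the demo uses only the nine high-strike-rate innings printed later:

```python
import pandas as pd

def batting_average(df: pd.DataFrame) -> float:
    """Total runs divided by the number of dismissals.

    Assumes not-out innings are labeled "not out" in the Dismissal column.
    """
    dismissals = (df["Dismissal"] != "not out").sum()
    if dismissals == 0:
        return float("nan")  # undefined if the batter was never out
    return df["Runs"].sum() / dismissals

# The nine innings with a strike rate of 120 or more, copied from this article
sample = pd.DataFrame({
    "Runs": [27, 100, 23, 43, 102, 100, 115, 78, 8],
    "Dismissal": ["bowled", "not out", "not out", "caught", "caught",
                  "not out", "not out", "caught", "caught"],
})

print(f"Mean runs per innings: {sample['Runs'].mean():.2f}")   # 596 runs / 9 innings
print(f"Batting average:       {batting_average(sample):.2f}")  # 596 runs / 5 dismissals
```

On the full dataset, call `batting_average(data)`; if the file records not-outs under any other label, the filter needs adjusting.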

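Note that this is the mean runs per innings (6,184 runs over 132 innings). The conventional batting average in cricket divides total runs by the number of dismissals, so not-out innings push it above this figure. In ODIs, a batting average of 35 to 37 is considered good, so Virat Kohli comfortably exceeds that mark even on this more conservative measure. Now let’s look at the trend of runs scored by Virat Kohli from 18 August 2008 to 22 January 2017: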

```python
matches = data.index
figure = px.line(data, x=matches, y="Runs",
                 title='Runs Scored by Virat Kohli Between 18-Aug-08 - 22-Jan-17')
figure.show()
```
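
To put a number on how often those peaks occur, here is a small sketch that counts centuries (100 or more) and innings in the nineties. `count_big_scores` is a hypothetical helper, and the demo Series holds only the 14 innings printed in this article (the five `head()` rows plus the nine high-strike-rate rows), so its counts say nothing about the full career:

```python
import pandas as pd

def count_big_scores(runs: pd.Series) -> dict:
    """Count centuries (100+) and near misses in the nineties (90-99)."""
    return {
        "centuries": int((runs >= 100).sum()),
        "nineties": int(runs.between(90, 99).sum()),  # both ends inclusive
    }

# Runs from the 14 innings printed in this article
runs = pd.Series([12, 37, 25, 54, 31, 27, 100, 23, 43, 102, 100, 115, 78, 8])
print(count_big_scores(runs))
```

On the full dataset, use `count_big_scores(data["Runs"])`.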

In many of his innings, Virat Kohli scored a century or came close to one, which is a good sign of consistency. Now let’s see the batting positions at which Virat Kohli has batted:

```python
# Batting Positions
data["Pos"] = data["Pos"].map({3.0: "Batting At 3", 4.0: "Batting At 4", 2.0: "Batting At 2",
                               1.0: "Batting At 1", 7.0: "Batting At 7", 5.0: "Batting At 5",
                               6.0: "batting At 6"})

Pos = data["Pos"].value_counts()
label = Pos.index
counts = Pos.values
colors = ['gold', 'lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Number of Matches At Different Batting Positions')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()
```
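
The pie chart labels each slice with a raw count. `value_counts(normalize=True)` gives the share of innings at each position directly. A sketch on the batting positions of the 14 innings printed in this article; on the full data, `data["Pos"]` works the same way with either the numeric or the mapped labels:

```python
import pandas as pd

# Batting positions of the 14 innings printed in this article
pos = pd.Series([2.0, 2.0, 1.0, 1.0, 1.0, 7.0, 4.0, 6.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0])

# normalize=True returns proportions of the total instead of raw counts
share = pos.value_counts(normalize=True)
print((share * 100).round(1))
```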

Virat Kohli batted at number 3 in more than 68% of his innings. Now let’s look at the total runs he scored at each batting position:

```python
# Total runs at each batting position; go.Pie would also sum repeated labels,
# but grouping first makes the aggregation explicit
runs_by_pos = data.groupby("Pos")["Runs"].sum()
label = runs_by_pos.index
counts = runs_by_pos.values
colors = ['gold', 'lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Runs By Virat Kohli At Different Batting Positions')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()
```

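More than 72% of Virat Kohli’s total runs came while batting at number 3, a larger share than the 68% of innings he played there. In other words, he scored more per innings at number 3 than his overall mean, so batting at number 3 seems to suit him best.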
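
Comparing the share of runs with the share of innings only hints at how productive each position is; runs per innings at each position makes it explicit. A sketch with `groupby().agg()` on the 14 innings printed in this article (replace `sample` with `data` to run it on the full dataset):

```python
import pandas as pd

# Position and runs for the 14 innings printed in this article
sample = pd.DataFrame({
    "Pos": [2.0, 2.0, 1.0, 1.0, 1.0, 7.0, 4.0, 6.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
    "Runs": [12, 37, 25, 54, 31, 27, 100, 23, 43, 102, 100, 115, 78, 8],
})

# Innings count, total runs and mean runs per innings at each position
by_pos = sample.groupby("Pos")["Runs"].agg(["count", "sum", "mean"])
by_pos["share_of_runs"] = by_pos["sum"] / by_pos["sum"].sum()
print(by_pos.sort_values("sum", ascending=False))
```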

Now let’s have a look at the number of centuries scored by Virat Kohli while batting in the first innings and second innings:

```python
centuries = data.query("Runs >= 100")
figure = px.bar(centuries, x=centuries["Inns"], y=centuries["Runs"],
                color=centuries["Runs"],
                title="Centuries By Virat Kohli in First Innings Vs. Second Innings")
figure.show()
```
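
Because `px.bar` stacks rows that share an x value, each bar above shows the total runs scored in centuries, not the number of centuries. Counting the rows gives the number directly. A sketch on the 14 innings printed in this article, which contain only four centuries, so run it on `data` for the real split:

```python
import pandas as pd

# Runs and innings (1 = batting first, 2 = batting second) for the 14 printed innings
sample = pd.DataFrame({
    "Runs": [12, 37, 25, 54, 31, 27, 100, 23, 43, 102, 100, 115, 78, 8],
    "Inns": [1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1],
})

centuries = sample.query("Runs >= 100")

# Number of centuries in each innings
print(centuries["Inns"].value_counts().sort_index())
```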

Most of his centuries came while batting in the second innings, which suggests that Virat Kohli thrives when chasing a target. Now let’s look at how Virat Kohli has been dismissed most often:

```python
# Dismissals of Virat Kohli
dismissal = data["Dismissal"].value_counts()
label = dismissal.index
counts = dismissal.values
colors = ['gold', 'lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Dismissals of Virat Kohli')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()
```
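
The pie above also includes "not out" innings, which are not dismissals. Dropping them first gives the share of each mode of dismissal among the times he was actually out. A sketch on the 14 innings printed in this article:

```python
import pandas as pd

# Dismissal column for the 14 innings printed in this article
dismissal = pd.Series(["lbw", "caught", "run out", "bowled", "lbw",
                       "bowled", "not out", "not out", "caught", "caught",
                       "not out", "not out", "caught", "caught"])

# "not out" is not a dismissal, so drop it before computing proportions
outs = dismissal[dismissal != "not out"]
share = outs.value_counts(normalize=True)
print(share)
```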

Most often, Virat Kohli is out caught, either by a fielder or by the wicketkeeper. Now let’s see which teams Virat Kohli scored most of his runs against:

```python
figure = px.bar(data, x=data["Opposition"], y=data["Runs"], color=data["Runs"],
                title="Most Runs Against Teams")
figure.show()
```
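
The stacked bars make exact totals hard to read off the chart. A `groupby` sum, sorted in descending order, lists the runs against each team. The sketch below runs on the 14 innings printed in this article, which over-represent his high-strike-rate innings, so its ranking will not match the chart; pass `data` instead for the real totals:

```python
import pandas as pd

# Opposition and runs for the 14 innings printed in this article
sample = pd.DataFrame({
    "Opposition": ["v Sri Lanka"] * 6 + ["v Bangladesh", "v West Indies", "v England",
                   "v West Indies", "v Australia", "v Australia", "v New Zealand", "v England"],
    "Runs": [12, 37, 25, 54, 31, 27, 100, 23, 43, 102, 100, 115, 78, 8],
})

# Total runs against each team, highest first
runs_by_team = sample.groupby("Opposition")["Runs"].sum().sort_values(ascending=False)
print(runs_by_team)
```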

According to the figure above, Virat Kohli has scored heavily against Sri Lanka, Australia, New Zealand, West Indies, and England, with the most runs coming against Sri Lanka. Now let’s see which teams Virat Kohli scored most of his centuries against:

```python
figure = px.bar(centuries, x=centuries["Opposition"], y=centuries["Runs"],
                color=centuries["Runs"],
                title="Most Centuries Against Teams")
figure.show()
```
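
As with the innings chart, the tallest bar here marks the most runs scored in centuries against a team, which usually but not always means the most centuries. Counting rows settles it. A sketch on the four centuries that appear among the innings printed in this article:

```python
import pandas as pd

# Opposition and runs for the 14 innings printed in this article
sample = pd.DataFrame({
    "Opposition": ["v Sri Lanka"] * 6 + ["v Bangladesh", "v West Indies", "v England",
                   "v West Indies", "v Australia", "v Australia", "v New Zealand", "v England"],
    "Runs": [12, 37, 25, 54, 31, 27, 100, 23, 43, 102, 100, 115, 78, 8],
})

# Number of centuries against each team
centuries_by_team = sample.query("Runs >= 100")["Opposition"].value_counts()
print(centuries_by_team)
```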

So most of Virat Kohli’s centuries came against Australia. Now let’s analyze his strike rate. To do this, I will create a new dataset of all the matches in which his strike rate was 120 or more:

```python
strike_rate = data.query("SR >= 120")
print(strike_rate)
```

```
     Runs  BF  4s  6s      SR           Pos Dismissal  Inns     Opposition  \
8      27  19   4   0  142.10  Batting At 7    bowled     1    v Sri Lanka
32    100  83   8   2  120.48  Batting At 4   not out     1   v Bangladesh
56     23  11   3   0  209.09  batting At 6   not out     1  v West Indies
76     43  34   4   1  126.47  Batting At 3    caught     1      v England
78    102  83  13   2  122.89  Batting At 3    caught     1  v West Indies
83    100  52   8   7  192.30  Batting At 3   not out     2    v Australia
85    115  66  18   1  174.24  Batting At 3   not out     2    v Australia
93     78  65   7   2  120.00  Batting At 3    caught     2  v New Zealand
130     8   5   2   0  160.00  Batting At 3    caught     1      v England

            Ground Start Date
8           Rajkot  15-Dec-09
32           Dhaka  19-Feb-11
56          Indore   8-Dec-11
76      Birmingham  23-Jun-13
78   Port of Spain   5-Jul-13
83          Jaipur  16-Oct-13
85          Nagpur  30-Oct-13
93        Hamilton  22-Jan-14
130        Cuttack  19-Jan-17
```

Now let’s see whether Virat Kohli plays with high strike rates in the first innings or second innings:

```python
figure = px.bar(strike_rate, x=strike_rate["Inns"],
                y=strike_rate["SR"],
                color=strike_rate["SR"],
                title="Virat Kohli's High Strike Rates in First Innings Vs. Second Innings")
figure.show()
```
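
The bars above stack strike-rate values, and a raw count of high-strike-rate innings also depends on how many innings he played batting first versus second. Dividing by the number of innings on each side gives a rate. `high_sr_rate` is a hypothetical helper sketched here on the 14 innings printed in this article. That sample contains all nine high-strike-rate innings but only five others, so its rates are heavily inflated; run it on `data` for meaningful figures:

```python
import pandas as pd

def high_sr_rate(df: pd.DataFrame, threshold: float = 120.0) -> pd.Series:
    """Fraction of innings with SR >= threshold, split by Inns (1 or 2)."""
    is_high = df["SR"] >= threshold
    return is_high.groupby(df["Inns"]).mean()

# Strike rate and innings for the 14 innings printed in this article
sample = pd.DataFrame({
    "SR": [54.54, 55.22, 65.78, 81.81, 67.39, 142.10, 120.48,
           209.09, 126.47, 122.89, 192.30, 174.24, 120.00, 160.00],
    "Inns": [1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 1],
})

# Raw counts of high-strike-rate innings, then the rate per innings played
print(sample.loc[sample["SR"] >= 120, "Inns"].value_counts().sort_index())
print(high_sr_rate(sample))
```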

According to the figure above, six of the nine innings with a strike rate of 120 or more came in the first innings, which suggests Virat Kohli tends to play more aggressively when batting first. Now let’s look at the relationship between the runs Virat Kohli scores and the fours he hits in each innings:

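```python
# trendline="ols" fits an ordinary least-squares line and needs the statsmodels package installed
figure = px.scatter(data_frame=data, x="Runs",
                    y="4s", size="SR", trendline="ols",
                    title="Relationship Between Runs Scored and Fours")
figure.show()
```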
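
To read the strength of this relationship as numbers rather than by eye, here is a sketch that fits the same kind of straight line with NumPy’s `polyfit` and computes the Pearson correlation with `corrcoef`. It uses the 14 innings printed in this article; on the full data, pass `data["Runs"]` and `data["4s"]` instead:

```python
import numpy as np

# Runs and fours for the 14 innings printed in this article
runs = np.array([12, 37, 25, 54, 31, 27, 100, 23, 43, 102, 100, 115, 78, 8])
fours = np.array([1, 6, 4, 7, 3, 4, 8, 3, 4, 13, 8, 18, 7, 2])

# Degree-1 least-squares fit: fours = slope * runs + intercept
slope, intercept = np.polyfit(runs, fours, 1)

# Pearson correlation: +1 would mean the points lie exactly on a rising straight line
r = np.corrcoef(runs, fours)[0, 1]
print(f"slope = {slope:.3f} fours per run, intercept = {intercept:.2f}, r = {r:.2f}")
```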

There is a clear, roughly linear positive relationship: the more runs Virat Kohli scores in an innings, the more fours he hits, which shows how heavily he relies on fours. Let’s see whether there is a similar relationship with sixes:

```python
figure = px.scatter(data_frame=data, x="Runs",
                    y="6s", size="SR", trendline="ols",
                    title="Relationship Between Runs Scored and Sixes")
figure.show()
```
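
Putting the two correlations side by side makes the comparison concrete. A sketch on the 14 innings printed in this article; in this small, non-random sample the sixes correlation is pulled up by a single innings with seven sixes (100 off 52 balls against Australia), so compute both on the full `data` before drawing conclusions:

```python
import numpy as np

# Runs, fours and sixes for the 14 innings printed in this article
runs = np.array([12, 37, 25, 54, 31, 27, 100, 23, 43, 102, 100, 115, 78, 8])
fours = np.array([1, 6, 4, 7, 3, 4, 8, 3, 4, 13, 8, 18, 7, 2])
sixes = np.array([0, 0, 0, 0, 1, 0, 2, 0, 1, 2, 7, 1, 2, 0])

# Pearson correlation of runs with each kind of boundary
r_fours = np.corrcoef(runs, fours)[0, 1]
r_sixes = np.corrcoef(runs, sixes)[0, 1]
print(f"r(runs, fours) = {r_fours:.2f}")
print(f"r(runs, sixes) = {r_sixes:.2f}")
```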

There is no strong linear relationship here, which suggests that Virat Kohli relies on fours much more than on sixes. So this is how you can analyze the performance of Virat Kohli or any other cricketer in the world.

### Summary

In this article, we used pandas and Plotly to summarize Virat Kohli’s ODI runs, batting positions, centuries, dismissals, opponents, and strike rates; the same steps work for any other player’s data. Analyzing a player’s performance is one of the use cases of data science in sports analytics. I hope you liked this article on Virat Kohli performance analysis using Python. Feel free to ask your questions in the comments section below.

##### Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.
