Ads Click Through Rate Prediction using Python

Ads Click Through Rate (CTR) is the ratio of the number of users who clicked on your ad to the number of users who viewed it. For example, if 5 out of 100 users click on an ad while watching a YouTube video, the CTR of that ad is 5%. Analyzing the click-through rate helps companies find the best ad for their target audience. If you want to learn how to analyze and predict the ads click-through rate with Machine Learning, this article will take you through the task using Python.
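The ratio described above can be written as a small helper. This is only a minimal sketch: the function name `ctr_percent` is an illustrative choice and not part of any library.

```python
def ctr_percent(clicks: int, impressions: int) -> float:
    """Return the click-through rate as a percentage of impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 100 * clicks / impressions


# The YouTube example from the text: 5 clicks out of 100 views.
print(ctr_percent(5, 100))  # 5.0
```

Multiplying before dividing keeps the arithmetic exact for whole-number percentages like this one.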

Ads Click-Through Rate Prediction

Ads click-through rate prediction means predicting whether a user will click on an ad. For this task, we need to train a Machine Learning model that learns the relationship between the characteristics of users and whether they click on ads.

I found an ideal dataset for this task containing data about the user and whether the user clicked on the ad. You can download the dataset from here.

In the section below, I will take you through the task of Ads Click Through Rate Prediction with Machine Learning using Python.

Ads Click-Through Rate Prediction using Python

Let’s start the task of ads click-through rate prediction by importing the necessary Python libraries and the dataset:

import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import numpy as np
pio.templates.default = "plotly_white"

data = pd.read_csv("ad_10000records.csv")
print(data.head())
   Daily Time Spent on Site   Age  Area Income  Daily Internet Usage  \
0                     62.26  32.0     69481.85                172.83   
1                     41.73  31.0     61840.26                207.17   
2                     44.40  30.0     57877.15                172.83   
3                     59.88  28.0     56180.93                207.17   
4                     49.21  30.0     54324.73                201.58   

                         Ad Topic Line             City  Gender  \
0      Decentralized real-time circuit         Lisafort    Male   
1       Optional full-range projection  West Angelabury    Male   
2  Total 5thgeneration standardization        Reyesfurt  Female   
3          Balanced empowering success      New Michael  Female   
4  Total 5thgeneration standardization     West Richard  Female   

                        Country            Timestamp  Clicked on Ad  
0  Svalbard & Jan Mayen Islands  2016-06-09 21:43:05              0  
1                     Singapore  2016-01-16 17:56:05              0  
2                    Guadeloupe  2016-06-29 10:50:45              0  
3                        Zambia  2016-06-21 14:32:32              0  
4                         Qatar  2016-07-21 10:54:35              1  

The “Clicked on Ad” column contains 0 and 1 values, where 0 means not clicked and 1 means clicked. I’ll transform these values into “No” and “Yes” respectively:

data["Clicked on Ad"] = data["Clicked on Ad"].map({0: "No", 
                               1: "Yes"})

Click Through Rate Analysis

Now let’s analyze the click-through rate based on the time spent by the users on the website:

fig = px.box(data, 
             x="Daily Time Spent on Site",  
             color="Clicked on Ad", 
             title="Click Through Rate based on Time Spent on Site", 
             color_discrete_map={'Yes':'blue',
                                 'No':'red'})
fig.update_traces(quartilemethod="exclusive")
fig.show()
Click Through Rate based on Time Spent on Site

From the above graph, we can see that the users who spend more time on the website click more on ads. Now let’s analyze the click-through rate based on the daily internet usage of the user:

fig = px.box(data, 
             x="Daily Internet Usage",  
             color="Clicked on Ad", 
             title="Click Through Rate based on Daily Internet Usage", 
             color_discrete_map={'Yes':'blue',
                                 'No':'red'})
fig.update_traces(quartilemethod="exclusive")
fig.show()
Click Through Rate based on Daily Internet Usage

From the above graph, we can see that the users with high internet usage click less on ads compared to the users with low internet usage. Now let’s analyze the click-through rate based on the age of the users:

fig = px.box(data, 
             x="Age",  
             color="Clicked on Ad", 
             title="Click Through Rate based on Age", 
             color_discrete_map={'Yes':'blue',
                                 'No':'red'})
fig.update_traces(quartilemethod="exclusive")
fig.show()
CTR based on age

From the above graph, we can see that users around 40 years old click on ads more often than users around 27-36 years old. Now let’s analyze the click-through rate based on the income of the users:

fig = px.box(data, 
             x="Area Income",  
             color="Clicked on Ad", 
             title="Click Through Rate based on Income", 
             color_discrete_map={'Yes':'blue',
                                 'No':'red'})
fig.update_traces(quartilemethod="exclusive")
fig.show()
CTR based on Income

There’s not much difference, but people from high-income areas click less on ads.
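The box plots above can be complemented with a numeric summary per class. The sketch below uses a small made-up DataFrame named `sample` (its values are illustrative and not taken from the dataset); with the real data, the same `groupby` call could be applied to `data` instead.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only (not from the dataset).
sample = pd.DataFrame({
    "Daily Time Spent on Site": [62.0, 41.0, 44.0, 59.0, 49.0, 70.0],
    "Age": [32, 31, 30, 28, 40, 45],
    "Clicked on Ad": ["No", "No", "No", "Yes", "Yes", "Yes"],
})

# Median of each numeric feature within each "Clicked on Ad" class.
summary = (
    sample.groupby("Clicked on Ad")[["Daily Time Spent on Site", "Age"]]
    .median()
)
print(summary)
```

Medians are used here because they correspond to the center line of each box in a box plot.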

Calculating CTR of Ads

Now let’s calculate the overall Ads click-through rate. Here we need the ratio of users who clicked on the ad to users who viewed the ad (impressions). So let’s see the distribution of users:

data["Clicked on Ad"].value_counts()
No     5083
Yes    4917
Name: Clicked on Ad, dtype: int64

So 4917 out of 10000 users clicked on the ads. Let’s calculate the CTR:

click_through_rate = 4917 / 10000 * 100
print(click_through_rate)
49.17

So the CTR is 49.17%.
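Rather than typing the counts by hand, the same figure can be computed directly from the label column. The sketch below uses a short made-up Series named `clicked` as a stand-in for `data["Clicked on Ad"]`; the five values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for data["Clicked on Ad"] after the Yes/No mapping.
clicked = pd.Series(["Yes", "No", "No", "Yes", "No"], name="Clicked on Ad")

# Number of "Yes" rows divided by the total number of rows, as a percentage.
ctr = 100 * (clicked == "Yes").sum() / len(clicked)
print(ctr)  # 40.0
```

Computing the rate from the column keeps the result correct if the dataset is filtered or updated.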

Click Through Rate Prediction Model

Now let’s move on to training a Machine Learning model to predict click-through rate. I’ll start by encoding the Gender column as numbers, selecting the features, and dividing the data into training and testing sets:

data["Gender"] = data["Gender"].map({"Male": 1, 
                               "Female": 0})

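# Features: the four numeric columns plus the encoded Gender column
x = data[["Daily Time Spent on Site", "Age", "Area Income",
          "Daily Internet Usage", "Gender"]]
# Target: the "Clicked on Ad" label ("Yes" / "No")
y = data["Clicked on Ad"]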

from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(x,y,
                                           test_size=0.2,
                                           random_state=4)

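Now let’s train the model using the random forest classification algorithm:

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
# Fit on the training split only, so the test set stays unseen
model.fit(xtrain, ytrain)

Now let’s have a look at the accuracy of the model on the test set:

from sklearn.metrics import accuracy_score
# Predict labels for the held-out test rows before scoring
y_pred = model.predict(xtest)
print(accuracy_score(ytest,y_pred))
0.9615

The score shown is the one reported for the original run. The exact value varies between runs because the random forest is not seeded, and a model fit on the full dataset (test rows included) would report an optimistically high accuracy, so always fit on the training split before scoring on the test split.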
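Accuracy is a single number, so it can hide which kind of mistake the model makes. A confusion matrix breaks the test predictions down by true and predicted class. The sketch below is self-contained and runs on synthetic data from `make_classification`, used here only as a stand-in for the ad dataset; the names `X_demo`, `y_demo`, and `clf` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the ad data: 5 numeric features, binary target.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=5, n_informative=3,
    n_redundant=0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=4)

clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)      # fit on the training split only
pred = clf.predict(X_test)     # predict on rows the model has not seen

acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)  # rows: true class, columns: predicted
print(acc)
print(cm)
```

On the ad data, the same two metric calls could be applied to `ytest` and `y_pred`.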

Now let’s test the model by making predictions:

print("Ads Click Through Rate Prediction : ")
a = float(input("Daily Time Spent on Site: "))
b = float(input("Age: "))
c = float(input("Area Income: "))
d = float(input("Daily Internet Usage: "))
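e = int(input("Gender (Male = 1, Female = 0) : "))

# Use the training column names so the input matches what the model saw
features = pd.DataFrame([[a, b, c, d, e]], columns=x.columns)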
print("Will the user click on ad = ", model.predict(features))
Ads Click Through Rate Prediction : 
Daily Time Spent on Site: 62.26
Age: 28
Area Income: 61840.26
Daily Internet Usage: 207.17
Gender (Male = 1, Female = 0) : 0
Will the user click on ad =  ['No']
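A hard “Yes” or “No” label does not say how confident the model is. scikit-learn classifiers such as `RandomForestClassifier` also expose `predict_proba`, which returns one probability per class. The sketch below runs on randomly generated data as a stand-in for the ad dataset; the names `X_demo`, `y_demo`, and `clf`, and the rule used to create the labels, are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 users, 5 numeric features, "Yes"/"No" labels.
X_demo = rng.normal(size=(200, 5))
y_demo = np.where(X_demo[:, 0] + X_demo[:, 1] > 0, "Yes", "No")

clf = RandomForestClassifier(random_state=0).fit(X_demo, y_demo)

# Probabilities for the first user; columns follow the order of clf.classes_.
proba = clf.predict_proba(X_demo[:1])
for label, p in zip(clf.classes_, proba[0]):
    print(label, round(float(p), 3))
```

With the article's model, the equivalent call would be `model.predict_proba(features)`, and the “Yes” column can be read as the estimated click probability for that user.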

Summary

So this is how you can use Machine Learning for the task of Ads CTR prediction using Python. Ads CTR prediction means predicting whether the user will click on the ad. In this task, we train a Machine Learning model to learn the relationship between the characteristics of users and whether they click on ads. I hope you liked this article on Ads Click Through Rate prediction with Machine Learning using Python. Feel free to ask valuable questions in the comments section below.

Aman Kharwal