User Engagement Analysis is a data-driven approach to assess and understand user involvement, interaction, and satisfaction with a product, service, or platform. It involves analyzing various metrics and behavioural patterns to gain insights into user behaviour and preferences. It aids businesses in making informed decisions to enhance user experience, optimize marketing strategies, and improve overall product or service performance. If you want to learn how to perform User Engagement Analysis, this article is for you. In this article, I will take you through the task of User Engagement Analysis using Python.
User Engagement Analysis: Overview
User Engagement Analysis helps businesses understand how people interact with their products or services, allowing them to make improvements that make users happier and more likely to stick around. It helps businesses create better UI/UX for their customers and ultimately achieve their goals.
User Engagement Analysis helps various types of businesses, including e-commerce, social media, mobile apps, and online platforms. For example, an e-commerce company can use it to understand how customers navigate their website, what products they like, and how long they stay on each page. It helps the company optimize their website design, personalize product recommendations, and improve marketing strategies to increase customer satisfaction and loyalty.
For User Engagement Analysis, businesses need data that captures how users interact with their product, service, or platform. It includes information like the number of times users visit a website or app, the actions they take (such as clicks or purchases), how long they stay on a page or within a session, or any feedback they provide. I found an ideal dataset for the task of User Engagement Analysis. You can download the dataset from here.
User Engagement Analysis using Python
Now let’s get started with the task of User Engagement Analysis by importing the necessary Python libraries and the dataset:
import pandas as pd
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = "plotly_white"

data = pd.read_csv("bounce rate.csv")
print(data.head())

      Client ID  Sessions Avg. Session Duration Bounce Rate
0  5.778476e+08       367              00:01:35      87.19%
1  1.583822e+09       260              00:01:04      29.62%
2  1.030699e+09       237              00:00:02      99.16%
3  1.025030e+09       226              00:02:22      25.66%
4  1.469968e+09       216              00:01:23      46.76%
Let’s have a look at the null values before moving forward:
print(data.isnull().sum())
Client ID                0
Sessions                 0
Avg. Session Duration    0
Bounce Rate              0
dtype: int64
Now let’s have a look at the column insights:
print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 999 entries, 0 to 998
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Client ID              999 non-null    float64
 1   Sessions               999 non-null    int64
 2   Avg. Session Duration  999 non-null    object
 3   Bounce Rate            999 non-null    object
dtypes: float64(1), int64(1), object(2)
memory usage: 31.3+ KB
None
The Avg. Session Duration and Bounce Rate columns are not numerical. We need to convert them into appropriate data types for this task. Here’s how we can prepare our data:
# Drop the first character of each "HH:MM:SS" string (a leading zero in this dataset)
data['Avg. Session Duration'] = data['Avg. Session Duration'].str[1:]
# Parse the strings as time deltas, then express them in minutes
data['Avg. Session Duration'] = pd.to_timedelta(data['Avg. Session Duration'])
data['Avg. Session Duration'] = data['Avg. Session Duration'] / pd.Timedelta(minutes=1)
# Strip the percent sign and convert the bounce rate to float
data['Bounce Rate'] = data['Bounce Rate'].str.rstrip('%').astype('float')
print(data)

        Client ID  Sessions  Avg. Session Duration  Bounce Rate
0    5.778476e+08       367               1.583333        87.19
1    1.583822e+09       260               1.066667        29.62
2    1.030699e+09       237               0.033333        99.16
3    1.025030e+09       226               2.366667        25.66
4    1.469968e+09       216               1.383333        46.76
..            ...       ...                    ...          ...
994  1.049263e+09        17               7.733333        41.18
995  1.145806e+09        17               5.616667        47.06
996  1.153811e+09        17               0.200000        94.12
997  1.182133e+09        17               1.216667        88.24
998  1.184187e+09        17               2.566667        64.71

[999 rows x 4 columns]
In the above code, we removed the first character from each value in the “Avg. Session Duration” column, which in this dataset is a leading zero in the hours part of the HH:MM:SS string. Then, we converted the values to pandas time deltas and divided them by one minute, which gives the average session duration as a number of minutes. Similarly, we removed the percentage sign from each value in the “Bounce Rate” column and converted the values to floats, so the bounce rate is stored as a percentage between 0 and 100 (for example, 87.19) rather than as a fraction.
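To make the conversion concrete, here is a minimal standard-library sketch of what those steps compute for a single value. The helper names `duration_to_minutes` and `percent_to_float` are hypothetical and only for illustration; the pandas calls in the article do the same work on whole columns.

```python
def duration_to_minutes(text: str) -> float:
    """Convert an 'HH:MM:SS' string to a number of minutes."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 60 + minutes + seconds / 60


def percent_to_float(text: str) -> float:
    """Convert a string such as '87.19%' to the number 87.19."""
    return float(text.rstrip("%"))


# The first row of the dataset: '00:01:35' and '87.19%'
print(round(duration_to_minutes("00:01:35"), 6))  # 1.583333
print(percent_to_float("87.19%"))                 # 87.19
```

The two printed values match the first row of the converted DataFrame shown above.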
Now let’s have a look at the descriptive statistics of the data:
print(data.describe())
          Client ID    Sessions  Avg. Session Duration  Bounce Rate
count  9.990000e+02  999.000000             999.000000   999.000000
mean   1.036401e+09   32.259259               3.636520    65.307978
std    6.151503e+08   24.658588               4.040562    22.997270
min    1.849182e+05   17.000000               0.000000     4.880000
25%    4.801824e+08   21.000000               0.891667    47.370000
50%    1.029507e+09   25.000000               2.466667    66.670000
75%    1.587982e+09   35.000000               4.816667    85.190000
max    2.063338e+09  367.000000              30.666667   100.000000
Now let’s have a look at the correlation matrix before moving forward:
# Exclude 'Client ID' column from the dataset
data_without_id = data.drop('Client ID', axis=1)

# Calculate the correlation matrix
correlation_matrix = data_without_id.corr()

# Visualize the correlation matrix
correlation_fig = px.imshow(correlation_matrix,
                            labels=dict(x='Features', y='Features', color='Correlation'))
correlation_fig.update_layout(title='Correlation Matrix')
correlation_fig.show()

Analyzing Bounce Rates
Let’s analyze the bounce rate of users to understand user engagement. Bounce rate refers to the percentage of users who visit a website or webpage but leave without taking any further action or navigating to other pages within the same site. In simple terms, it measures the rate at which visitors bounce away from a website.
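As a quick worked example of that definition, here is a minimal sketch that computes a bounce rate from page-view counts. The list `pages_per_session` is made-up illustration data, not part of the dataset, and treating a single-page session as a bounce is a simplifying assumption.

```python
# Hypothetical page-view counts, one entry per session (not from the dataset)
pages_per_session = [1, 3, 1, 1, 5, 2, 1, 1]

# Assumption: a session "bounces" when the visitor views a single page and leaves
bounced_sessions = sum(1 for pages in pages_per_session if pages == 1)
bounce_rate = 100 * bounced_sessions / len(pages_per_session)

print(f"Bounce rate: {bounce_rate:.2f}%")  # Bounce rate: 62.50%
```

Here 5 of the 8 sessions end after one page, so the bounce rate is 62.5%.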
Bounce rate is a helpful metric in user engagement analysis because it provides insights into user behaviour and the effectiveness of a website or webpage in capturing and retaining user interest. A high bounce rate often indicates that users are not finding what they expected or desired on the site. It can suggest issues such as poor user experience, irrelevant content, slow page load times, or misleading marketing campaigns. Here’s how to analyze User Engagement by analyzing the bounce rates of the users:
# Define the thresholds for high, medium, and low bounce rates
high_bounce_rate_threshold = 70
low_bounce_rate_threshold = 30

# Segment the clients based on bounce rates
# Note: with right=False the bins are [0, 30), [30, 70) and [70, 100), so a
# bounce rate of exactly 100 falls outside the last bin and is labeled NaN
data['Bounce Rate Segment'] = pd.cut(data['Bounce Rate'],
                                     bins=[0, low_bounce_rate_threshold, high_bounce_rate_threshold, 100],
                                     labels=['Low', 'Medium', 'High'], right=False)

# Count the number of clients in each segment
segment_counts = data['Bounce Rate Segment'].value_counts().sort_index()

# Visualize the segments
segment_fig = px.bar(segment_counts,
                     labels={'index': 'Bounce Rate Segment', 'value': 'Number of Clients'},
                     title='Segmentation of Clients based on Bounce Rates')
segment_fig.show()

We created bounce rate segments and analyzed the number of users in each segment in the above code. Now let’s have a look at the average session duration of the users in each bounce rate segment:
# Calculate the average session duration for each segment
segment_avg_duration = data.groupby('Bounce Rate Segment')['Avg. Session Duration'].mean()

# Create a bar chart to compare user engagement
engagement_fig = go.Figure(data=go.Bar(
    x=segment_avg_duration.index,
    y=segment_avg_duration,
    text=segment_avg_duration.round(2),
    textposition='auto',
    marker=dict(color=['#2ECC40', '#FFDC00', '#FF4136'])
))

engagement_fig.update_layout(
    title='Comparison of User Engagement by Bounce Rate Segment',
    xaxis=dict(title='Bounce Rate Segment'),
    yaxis=dict(title='Average Session Duration (minutes)'),
)

engagement_fig.show()

So we can see that users with low bounce rates have an average session duration of about 9.05 minutes on the website, while users with high bounce rates have an average session duration of only 1.43 minutes.
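One detail of the segmentation is worth checking: the `pd.cut` call uses `right=False`, which makes every bin closed on the left and open on the right, and the descriptive statistics show a maximum bounce rate of exactly 100. The sketch below uses a handful of made-up bounce rates to show how that edge value is treated, and one possible way to include it by using an infinite upper edge. The alternative binning is my suggestion, not the method used in this article, so the averages reported above may differ if you apply it.

```python
import pandas as pd

# Hypothetical bounce rates, including the two edge values 0 and 100
rates = pd.Series([0.0, 29.99, 30.0, 69.99, 70.0, 100.0])
labels = ["Low", "Medium", "High"]

# Half-open bins [0, 30), [30, 70), [70, 100): 100 falls outside the last bin
half_open = pd.cut(rates, bins=[0, 30, 70, 100], labels=labels, right=False)

# Same lower edges, but an infinite upper edge so that 100 lands in "High"
open_ended = pd.cut(rates, bins=[0, 30, 70, float("inf")], labels=labels, right=False)

print(pd.DataFrame({"rate": rates, "half_open": half_open, "open_ended": open_ended}))
```

In the first column of labels the value 100 is missing (NaN), while in the second it is labeled High.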
Now let’s have a look at the top 10 loyal users according to the number of sessions and average session duration:
# Calculate the total session duration for each client
data['Total Session Duration'] = data['Sessions'] * data['Avg. Session Duration']

# Sort the DataFrame by the total session duration in descending order
df_sorted = data.sort_values('Total Session Duration', ascending=False)

# the top 10 most loyal users
print(df_sorted.head(10))

Now let’s have a look at the relationship between the average session duration and the bounce rates:
# Create a scatter plot to analyze the relationship between bounce rate and avg session duration
scatter_fig = px.scatter(data, x='Bounce Rate', y='Avg. Session Duration',
                         title='Relationship between Bounce Rate and Avg. Session Duration',
                         trendline='ols')

scatter_fig.update_layout(
    xaxis=dict(title='Bounce Rate'),
    yaxis=dict(title='Avg. Session Duration')
)

scatter_fig.show()

So there is a negative linear relationship between the average session duration and the bounce rate, which is what we would expect here. It means that users with longer average sessions tend to have lower bounce rates.
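The trendline shows the direction of the relationship; a correlation coefficient puts a number on its strength. Here is a minimal sketch using pandas’ `Series.corr`, which computes the Pearson correlation by default. The `sample` values are made up for illustration; running the same call on the `data` DataFrame gives the coefficient for the real dataset, which is the same number that appears in the correlation matrix computed earlier.

```python
import pandas as pd

# Hypothetical values for illustration only (not taken from the dataset)
sample = pd.DataFrame({
    "Bounce Rate": [90.0, 75.0, 60.0, 45.0, 30.0],
    "Avg. Session Duration": [0.5, 1.5, 2.0, 4.0, 6.5],
})

# Pearson correlation: values near -1 mean a strong negative linear relationship
r = sample["Bounce Rate"].corr(sample["Avg. Session Duration"])
print(f"Pearson r = {r:.2f}")
```

A coefficient close to 0 would mean the two columns are barely related linearly, even if the trendline slopes downward.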
Analyzing User Retention
Now let’s analyze user engagement by calculating the number of users the platform has retained so far. Retained users are those individuals who continue to use or engage with a product, service, or platform over a specific period of time. They are users who return and remain active or loyal to the offering after their initial interaction or sign-up.
Retained users demonstrate ongoing engagement, repeated usage, or continued interactions with the offering, indicating a level of satisfaction or value derived from the product or service. Businesses often focus on retaining users to drive growth, improve customer loyalty, and achieve sustainable success in the market.
Now here’s how we can create retention segments based on the number of sessions:
# Define the retention segments based on number of sessions
def get_retention_segment(row):
    if row['Sessions'] >= 32:  # 32 is roughly the mean of sessions (32.26)
        return 'Frequent Users'
    else:
        return 'Occasional Users'

# Create a new column for retention segments
data['Retention Segment'] = data.apply(get_retention_segment, axis=1)

# Print the updated DataFrame
print(data)

        Client ID  Sessions  Avg. Session Duration  Bounce Rate  \
0    5.778476e+08       367               1.583333        87.19
1    1.583822e+09       260               1.066667        29.62
2    1.030699e+09       237               0.033333        99.16
3    1.025030e+09       226               2.366667        25.66
4    1.469968e+09       216               1.383333        46.76
..            ...       ...                    ...          ...
994  1.049263e+09        17               7.733333        41.18
995  1.145806e+09        17               5.616667        47.06
996  1.153811e+09        17               0.200000        94.12
997  1.182133e+09        17               1.216667        88.24
998  1.184187e+09        17               2.566667        64.71

    Bounce Rate Segment  Total Session Duration Retention Segment
0                  High              581.083333    Frequent Users
1                   Low              277.333333    Frequent Users
2                  High                7.900000    Frequent Users
3                   Low              534.866667    Frequent Users
4                Medium              298.800000    Frequent Users
..                  ...                     ...               ...
994              Medium              131.466667  Occasional Users
995              Medium               95.483333  Occasional Users
996                High                3.400000  Occasional Users
997                High               20.683333  Occasional Users
998              Medium               43.633333  Occasional Users

[999 rows x 7 columns]
The above function takes a row of data as input. It assigns a retention segment based on the number of sessions for each row. If the number of sessions is greater than or equal to 32 (the mean number of sessions, about 32.26, rounded down), the function returns ‘Frequent Users’. Otherwise, it returns ‘Occasional Users’.
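If you prefer to avoid a row-by-row `apply`, the same rule can be written as a single column-wise expression. This is an optional alternative sketch using `numpy.where`, shown on a small made-up DataFrame named `sample`; the threshold and labels are the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical session counts for illustration (not from the dataset)
sample = pd.DataFrame({"Sessions": [367, 32, 31, 17]})

# Same rule as get_retention_segment, applied to the whole column at once
sample["Retention Segment"] = np.where(
    sample["Sessions"] >= 32, "Frequent Users", "Occasional Users"
)

print(sample)
```

Both versions produce the same labels; the column-wise form is usually faster on large DataFrames because it avoids calling a Python function once per row.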
Now let’s have a look at the average bounce rate by retention segment:
# Calculate the average bounce rate for each retention segment
segment_bounce_rates = data.groupby('Retention Segment')['Bounce Rate'].mean().reset_index()

# Create a bar chart to visualize the average bounce rates by retention segment
bar_fig = px.bar(segment_bounce_rates, x='Retention Segment', y='Bounce Rate',
                 title='Average Bounce Rate by Retention Segment',
                 labels={'Retention Segment': 'Retention Segment', 'Bounce Rate': 'Average Bounce Rate'})
bar_fig.show()

So, there’s not much difference between the average bounce rates of frequent and occasional users. Now let’s have a look at the percentage of retained users:
# Count the number of users in each retention segment
segment_counts = data['Retention Segment'].value_counts()

# Define the pastel colors
colors = ['#FFB6C1', '#87CEFA']

# Create a pie chart using Plotly
fig = px.pie(segment_counts,
             values=segment_counts.values,
             names=segment_counts.index,
             color=segment_counts.index,
             color_discrete_sequence=colors,
             title='User Retention Rate')

# Update layout and show the chart
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(showlegend=False)

fig.show()

In the data of 999 users, the platform retained 29.7% of users (297 users), counting as retained those who fall into the frequent-user segment. This retention rate is not bad at all. So, this is how you can perform User Engagement Analysis using Python.
Summary
User Engagement Analysis helps businesses understand how people interact with their products or services, allowing them to make improvements that make users happier and more likely to stick around. It helps businesses create better UI/UX for their customers and ultimately achieve their goals. I hope you liked this article on User Engagement Analysis using Python. Feel free to ask valuable questions in the comments section below.