App user segmentation is the task of grouping users based on how they engage with the app. It helps find retained users, find the user segment for a marketing campaign, and solve many other business problems where you need to search for users based on similar characteristics. So if you want to know how to segment app users, this article is for you. In this article, I will walk you through the task of app user segmentation with Machine Learning using Python.
App User Segmentation
In app user segmentation, we group users based on how they engage with the app, so we need data that describes that engagement.
I found a dataset that records how two groups of users engaged with an app: those who still use it daily and those who have uninstalled it. This data can be used to group users and identify the retained and churned users. You can download the dataset from here.
In the section below, I’ll take you through the task of App User Segmentation with Machine Learning using the Python programming language.
App User Segmentation using Python
Now let’s start the task of App User Segmentation by importing the necessary Python libraries and the dataset:
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import pandas as pd
pio.templates.default = "plotly_white"

data = pd.read_csv("userbehaviour.csv")
print(data.head())

   userid  Average Screen Time  Average Spent on App (INR)  Left Review  \
0    1001                 17.0                       634.0            1
1    1002                  0.0                        54.0            0
2    1003                 37.0                       207.0            0
3    1004                 32.0                       445.0            1
4    1005                 45.0                       427.0            1

   Ratings  New Password Request  Last Visited Minutes       Status
0        9                     7                  2990    Installed
1        4                     8                 24008  Uninstalled
2        8                     5                   971    Installed
3        6                     2                   799    Installed
4        5                     6                  3668    Installed
Let’s start by looking at the highest, lowest, and average screen time of all the users:
print(f'Average Screen Time = {data["Average Screen Time"].mean()}')
print(f'Highest Screen Time = {data["Average Screen Time"].max()}')
print(f'Lowest Screen Time = {data["Average Screen Time"].min()}')

Average Screen Time = 24.39039039039039
Highest Screen Time = 50.0
Lowest Screen Time = 0.0
Now let’s have a look at the highest, lowest, and the average amount spent by all the users:
print(f'Average Spend of the Users = {data["Average Spent on App (INR)"].mean()}')
print(f'Highest Spend of the Users = {data["Average Spent on App (INR)"].max()}')
print(f'Lowest Spend of the Users = {data["Average Spent on App (INR)"].min()}')

Average Spend of the Users = 424.4154154154154
Highest Spend of the Users = 998.0
Lowest Spend of the Users = 0.0
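The same statistics can also be computed for both columns in one call. The sketch below uses `DataFrame.agg` on a small illustrative frame built from the five rows printed by `data.head()`; the name `sample` is an assumption, and on the real data you would call `agg` on `data` instead.

```python
import pandas as pd

# Illustrative subset: the five rows printed by data.head() in this article.
sample = pd.DataFrame({
    "Average Screen Time": [17.0, 0.0, 37.0, 32.0, 45.0],
    "Average Spent on App (INR)": [634.0, 54.0, 207.0, 445.0, 427.0],
})

# One row per statistic, one column per feature.
summary = sample.agg(["mean", "max", "min"])
print(summary)
```

Because the result is itself a DataFrame, individual values can be read with `summary.loc["max", "Average Screen Time"]`.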
Now let’s have a look at the relationship between the spending capacity and screen time of the active users and the users who have uninstalled the app:
# trendline="ols" requires the statsmodels package to be installed
figure = px.scatter(data_frame = data,
                    x="Average Screen Time",
                    y="Average Spent on App (INR)",
                    size="Average Spent on App (INR)",
                    color= "Status",
                    title = "Relationship Between Spending Capacity and Screentime",
                    trendline="ols")
figure.show()

The plot shows a clear pattern. Users who uninstalled the app had an average screen time of less than 5 minutes a day and an average spend of less than 100 INR. We can also see a linear relationship between average screen time and average spending among the users still using the app.
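A reading taken from a scatter plot can be checked numerically by averaging each column per install status. The sketch below is a minimal example on an illustrative frame holding the first ten rows of the dataset (the name `sample` is an assumption). It contains only one uninstalled user, so the numbers only demonstrate the technique; run the same `groupby` on the full `data` frame for a meaningful comparison.

```python
import pandas as pd

# Illustrative subset: the first ten rows of userbehaviour.csv.
sample = pd.DataFrame({
    "Status": ["Installed", "Uninstalled", "Installed", "Installed", "Installed",
               "Installed", "Installed", "Installed", "Installed", "Installed"],
    "Average Screen Time": [17.0, 0.0, 37.0, 32.0, 45.0, 28.0, 49.0, 8.0, 28.0, 28.0],
    "Average Spent on App (INR)": [634.0, 54.0, 207.0, 445.0, 427.0,
                                   599.0, 887.0, 31.0, 741.0, 524.0],
})

# Mean screen time and mean spend for each install status.
columns = ["Average Screen Time", "Average Spent on App (INR)"]
by_status = sample.groupby("Status")[columns].mean()
print(by_status)
```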
Now let’s have a look at the relationship between the ratings given by users and the average screen time:
figure = px.scatter(data_frame = data,
                    x="Average Screen Time",
                    y="Ratings",
                    size="Ratings",
                    color= "Status",
                    title = "Relationship Between Ratings and Screentime",
                    trendline="ols")
figure.show()

So we can see that users who uninstalled the app rated it 5 or lower. Their screen time is also very low compared to users who gave higher ratings. This suggests that users who spend little time in the app tend to rate it low and uninstall it at some point.
App User Segmentation to Find Retained and Lost Users
Now let’s move forward to App User segmentation to find the users that the app retained and lost forever. I will be using the K-means clustering algorithm in Machine Learning for this task:
from sklearn.cluster import KMeans

clustering_data = data[["Average Screen Time", "Left Review",
                        "Ratings", "Last Visited Minutes",
                        "Average Spent on App (INR)",
                        "New Password Request"]]

# Note: the features are passed to K-means unscaled, so columns with a
# large range, such as Last Visited Minutes, dominate the distances.
kmeans = KMeans(n_clusters=3)
clusters = kmeans.fit_predict(clustering_data)
data["Segments"] = clusters

print(data.head(10))

   userid  Average Screen Time  Average Spent on App (INR)  Left Review  \
0    1001                 17.0                       634.0            1
1    1002                  0.0                        54.0            0
2    1003                 37.0                       207.0            0
3    1004                 32.0                       445.0            1
4    1005                 45.0                       427.0            1
5    1006                 28.0                       599.0            0
6    1007                 49.0                       887.0            1
7    1008                  8.0                        31.0            0
8    1009                 28.0                       741.0            1
9    1010                 28.0                       524.0            1

   Ratings  New Password Request  Last Visited Minutes       Status  Segments
0        9                     7                  2990    Installed         0
1        4                     8                 24008  Uninstalled         2
2        8                     5                   971    Installed         0
3        6                     2                   799    Installed         0
4        5                     6                  3668    Installed         0
5        9                     4                  2878    Installed         0
6        9                     6                  4481    Installed         0
7        2                     1                  1715    Installed         0
8        8                     2                   801    Installed         0
9        8                     4                  4621    Installed         0
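K-means groups points by Euclidean distance, and `MinMaxScaler` only rescales data when `fit_transform` (or `fit` followed by `transform`) is called on it. The sketch below shows one way to scale every feature to the range 0 to 1 before clustering. It runs on an illustrative frame built from the ten rows printed above; the name `sample` and the `n_init` and `random_state` values are assumptions added for repeatability. Clustering scaled features on the full dataset would likely produce different segments and counts from the ones shown in this article.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Illustrative subset: the ten rows printed by data.head(10) in this article.
sample = pd.DataFrame({
    "Average Screen Time": [17.0, 0.0, 37.0, 32.0, 45.0, 28.0, 49.0, 8.0, 28.0, 28.0],
    "Left Review": [1, 0, 0, 1, 1, 0, 1, 0, 1, 1],
    "Ratings": [9, 4, 8, 6, 5, 9, 9, 2, 8, 8],
    "Last Visited Minutes": [2990, 24008, 971, 799, 3668, 2878, 4481, 1715, 801, 4621],
    "Average Spent on App (INR)": [634.0, 54.0, 207.0, 445.0, 427.0,
                                   599.0, 887.0, 31.0, 741.0, 524.0],
    "New Password Request": [7, 8, 5, 2, 6, 4, 6, 1, 2, 4],
})
feature_names = list(sample.columns)

# fit_transform learns each column's minimum and maximum, then rescales
# every column to [0, 1] so that no single feature dominates the distance.
scaled = MinMaxScaler().fit_transform(sample[feature_names])

# n_init and random_state are assumptions added for repeatable results.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
sample["Segments"] = kmeans.fit_predict(scaled)

print(pd.DataFrame(scaled, columns=feature_names).round(2))
print(sample["Segments"].value_counts())
```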
Now let’s have a look at the number of users in each segment:
print(data["Segments"].value_counts())
0    910
1     45
2     44
Name: Segments, dtype: int64
Now let’s rename the segments for a better understanding:
data["Segments"] = data["Segments"].map({0: "Retained", 1: "Churn", 2: "Needs Attention"})
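The numeric labels that K-means assigns are arbitrary and can change between runs, so a hard-coded mapping may attach the wrong name to a cluster. One possible safeguard, sketched below, is to derive the mapping from the data by ranking clusters on their mean `Last Visited Minutes`. The frame `demo` holds hypothetical values chosen only for illustration, and the rule that a longer time since the last visit means a user is closer to churn is an assumption, not something stated by the dataset.

```python
import pandas as pd

# Hypothetical cluster output: numeric labels plus one feature column.
demo = pd.DataFrame({
    "Last Visited Minutes": [2990, 971, 799, 24008, 23500, 40000, 41000],
    "Segments": [0, 0, 0, 2, 2, 1, 1],
})

# Mean recency per cluster, sorted from most to least recently active.
order = (demo.groupby("Segments")["Last Visited Minutes"]
             .mean()
             .sort_values()
             .index)

# Assumed naming rule: the longer since the last visit, the closer to churn.
names = ["Retained", "Needs Attention", "Churn"]
label_map = {int(label): name for label, name in zip(order, names)}

demo["Segment Name"] = demo["Segments"].map(label_map)
print(label_map)
print(demo)
```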
Now let’s visualize the segments:
PLOT = go.Figure()
for i in list(data["Segments"].unique()):
    # One trace per segment so each gets its own color and legend entry
    PLOT.add_trace(go.Scatter(x = data[data["Segments"] == i]['Last Visited Minutes'],
                              y = data[data["Segments"] == i]['Average Spent on App (INR)'],
                              mode = 'markers', marker_size = 6, marker_line_width = 1,
                              name = str(i)))
PLOT.update_traces(hovertemplate='Last Visited Minutes: %{x} <br>Average Spent on App (INR): %{y}')

PLOT.update_layout(width = 800, height = 800, autosize = True, showlegend = True,
                   yaxis_title = 'Average Spent on App (INR)',
                   xaxis_title = 'Last Visited Minutes')
PLOT.show()

The blue segment (Retained) shows the users the app has retained over time. The red segment (Needs Attention) shows users who have just uninstalled the app or are likely to uninstall it soon. The green segment (Churn) shows the users the app has lost.
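A quick way to sanity-check the segment names is to cross-tabulate them against the known `Status` column. The sketch below applies `pd.crosstab` to an illustrative frame holding the status and segment of the ten users printed earlier (the name `sample` is an assumption); on the real data you would pass `data["Segments"]` and `data["Status"]`.

```python
import pandas as pd

# Illustrative subset: status and segment name of the first ten users.
sample = pd.DataFrame({
    "Status": ["Installed", "Uninstalled", "Installed", "Installed", "Installed",
               "Installed", "Installed", "Installed", "Installed", "Installed"],
    "Segments": ["Retained", "Needs Attention", "Retained", "Retained", "Retained",
                 "Retained", "Retained", "Retained", "Retained", "Retained"],
})

# Rows are segments, columns are install statuses, cells are user counts.
table = pd.crosstab(sample["Segments"], sample["Status"])
print(table)
```

If a segment named Retained contained many uninstalled users, that would be a sign the mapping or the clustering needs another look.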
Summary
So this is how you can segment users based on how they engage with the app. App user segmentation helps businesses find retained users, find the user segment for a marketing campaign, and solve many other business problems where you need to search for users based on similar characteristics. I hope you liked this article on App User Segmentation with Machine Learning using Python. Feel free to ask questions in the comments section below.