There is a lot of competition among the brands in the smartwatch industry. Smartwatches are preferred by people who like to take care of their fitness. Analyzing the data collected on your fitness is one of the use cases of Data Science in healthcare. So if you want to learn how to analyze smartwatch fitness data, this article is for you. In this article, I will take you through the task of Smartwatch Data Analysis using Python.
Smartwatch Data Analysis using Python
The dataset I am using for Smartwatch data analysis is publicly available on Kaggle. This dataset was initially collected from 30 female users of the Fitbit smartwatch. You can download the dataset from here.
Now I will start the task of Smartwatch Data Analysis by importing the necessary Python libraries and the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

data = pd.read_csv("dailyActivity_merged.csv")
print(data.head())

           Id ActivityDate  TotalSteps  TotalDistance  TrackerDistance  \
0  1503960366    4/12/2016       13162           8.50             8.50
1  1503960366    4/13/2016       10735           6.97             6.97
2  1503960366    4/14/2016       10460           6.74             6.74
3  1503960366    4/15/2016        9762           6.28             6.28
4  1503960366    4/16/2016       12669           8.16             8.16

   LoggedActivitiesDistance  VeryActiveDistance  ModeratelyActiveDistance  \
0                       0.0                1.88                      0.55
1                       0.0                1.57                      0.69
2                       0.0                2.44                      0.40
3                       0.0                2.14                      1.26
4                       0.0                2.71                      0.41

   LightActiveDistance  SedentaryActiveDistance  VeryActiveMinutes  \
0                 6.06                      0.0                 25
1                 4.71                      0.0                 21
2                 3.91                      0.0                 30
3                 2.83                      0.0                 29
4                 5.04                      0.0                 36

   FairlyActiveMinutes  LightlyActiveMinutes  SedentaryMinutes  Calories
0                   13                   328               728      1985
1                   19                   217               776      1797
2                   11                   181              1218      1776
3                   34                   209               726      1745
4                   10                   221               773      1863
Before moving forward, let’s have a look at whether this dataset has any null values or not:
print(data.isnull().sum())
Id                          0
ActivityDate                0
TotalSteps                  0
TotalDistance               0
TrackerDistance             0
LoggedActivitiesDistance    0
VeryActiveDistance          0
ModeratelyActiveDistance    0
LightActiveDistance         0
SedentaryActiveDistance     0
VeryActiveMinutes           0
FairlyActiveMinutes         0
LightlyActiveMinutes        0
SedentaryMinutes            0
Calories                    0
dtype: int64
So the dataset does not have any null values. Let’s have a look at the information about columns in the dataset:
print(data.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 940 entries, 0 to 939
Data columns (total 15 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Id                        940 non-null    int64
 1   ActivityDate              940 non-null    object
 2   TotalSteps                940 non-null    int64
 3   TotalDistance             940 non-null    float64
 4   TrackerDistance           940 non-null    float64
 5   LoggedActivitiesDistance  940 non-null    float64
 6   VeryActiveDistance        940 non-null    float64
 7   ModeratelyActiveDistance  940 non-null    float64
 8   LightActiveDistance       940 non-null    float64
 9   SedentaryActiveDistance   940 non-null    float64
 10  VeryActiveMinutes         940 non-null    int64
 11  FairlyActiveMinutes       940 non-null    int64
 12  LightlyActiveMinutes      940 non-null    int64
 13  SedentaryMinutes          940 non-null    int64
 14  Calories                  940 non-null    int64
dtypes: float64(7), int64(7), object(1)
memory usage: 110.3+ KB
None
The ActivityDate column, which holds the date of each record, has the object data type. We may need to use dates in our analysis, so let’s convert this column to datetime:
# Changing datatype of ActivityDate
data["ActivityDate"] = pd.to_datetime(data["ActivityDate"], format="%m/%d/%Y")
print(data.info())
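As a small illustration of why the conversion matters, here is a minimal sketch on three dates copied from the output above. The `sample` and `recent` names are only for this example and are not part of the dataset. Once the column is datetime, you can filter by date and read parts such as the month directly:

```python
import pandas as pd

# Three dates in the same M/D/YYYY layout as the ActivityDate column
sample = pd.DataFrame({"ActivityDate": ["4/12/2016", "4/13/2016", "4/14/2016"]})

# Same conversion as in the article, applied to the small sample
sample["ActivityDate"] = pd.to_datetime(sample["ActivityDate"], format="%m/%d/%Y")

# Date-based filtering and the .dt accessor now work on the column
recent = sample[sample["ActivityDate"] >= "2016-04-13"]
months = sample["ActivityDate"].dt.month.tolist()

print(sample.dtypes)
print(len(recent), months)
```

With the original object (string) column, the comparison would be a plain text comparison and the `.dt` accessor would not be available.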
Look at all the columns; you will see information about very active, fairly active, lightly active, and sedentary minutes in the dataset. Let’s combine all these columns as total minutes before moving forward:
data["TotalMinutes"] = (
    data["VeryActiveMinutes"]
    + data["FairlyActiveMinutes"]
    + data["LightlyActiveMinutes"]
    + data["SedentaryMinutes"]
)
print(data["TotalMinutes"].sample(5))

742    1440
858    1440
683    1054
272    1440
268    1440
Name: TotalMinutes, dtype: int64
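Many of the totals are exactly 1440, which is the number of minutes in a day (24 × 60). Here is a minimal sketch that rebuilds the total for the five rows shown in the first output and flags which of them cover a full day. The `rows`, `MINUTES_PER_DAY`, and `FullDay` names are my own for this example. My assumption is that the shorter days are ones where part of the day was simply not logged in these four columns:

```python
import pandas as pd

# Minute columns of the first five records, copied from data.head()
rows = pd.DataFrame({
    "VeryActiveMinutes": [25, 21, 30, 29, 36],
    "FairlyActiveMinutes": [13, 19, 11, 34, 10],
    "LightlyActiveMinutes": [328, 217, 181, 209, 221],
    "SedentaryMinutes": [728, 776, 1218, 726, 773],
})

minute_cols = [
    "VeryActiveMinutes",
    "FairlyActiveMinutes",
    "LightlyActiveMinutes",
    "SedentaryMinutes",
]

# Row-wise sum, equivalent to adding the four columns by hand
rows["TotalMinutes"] = rows[minute_cols].sum(axis=1)

MINUTES_PER_DAY = 24 * 60  # 1440
rows["FullDay"] = rows["TotalMinutes"] == MINUTES_PER_DAY

print(rows[["TotalMinutes", "FullDay"]])
```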
Now let’s have a look at the descriptive statistics of the dataset:
print(data.describe())
                 Id    TotalSteps  TotalDistance  TrackerDistance  \
count  9.400000e+02    940.000000     940.000000       940.000000
mean   4.855407e+09   7637.910638       5.489702         5.475351
std    2.424805e+09   5087.150742       3.924606         3.907276
min    1.503960e+09      0.000000       0.000000         0.000000
25%    2.320127e+09   3789.750000       2.620000         2.620000
50%    4.445115e+09   7405.500000       5.245000         5.245000
75%    6.962181e+09  10727.000000       7.712500         7.710000
max    8.877689e+09  36019.000000      28.030001        28.030001

       LoggedActivitiesDistance  VeryActiveDistance  ModeratelyActiveDistance  \
count                940.000000          940.000000                940.000000
mean                   0.108171            1.502681                  0.567543
std                    0.619897            2.658941                  0.883580
min                    0.000000            0.000000                  0.000000
25%                    0.000000            0.000000                  0.000000
50%                    0.000000            0.210000                  0.240000
75%                    0.000000            2.052500                  0.800000
max                    4.942142           21.920000                  6.480000

       LightActiveDistance  SedentaryActiveDistance  VeryActiveMinutes  \
count           940.000000               940.000000         940.000000
mean              3.340819                 0.001606          21.164894
std               2.040655                 0.007346          32.844803
min               0.000000                 0.000000           0.000000
25%               1.945000                 0.000000           0.000000
50%               3.365000                 0.000000           4.000000
75%               4.782500                 0.000000          32.000000
max              10.710000                 0.110000         210.000000

       FairlyActiveMinutes  LightlyActiveMinutes  SedentaryMinutes  \
count           940.000000            940.000000        940.000000
mean             13.564894            192.812766        991.210638
std              19.987404            109.174700        301.267437
min               0.000000              0.000000          0.000000
25%               0.000000            127.000000        729.750000
50%               6.000000            199.000000       1057.500000
75%              19.000000            264.000000       1229.500000
max             143.000000            518.000000       1440.000000

          Calories  TotalMinutes
count   940.000000    940.000000
mean   2303.609574   1218.753191
std     718.166862    265.931767
min       0.000000      2.000000
25%    1828.500000    989.750000
50%    2134.000000   1440.000000
75%    2793.250000   1440.000000
max    4900.000000   1440.000000
Let’s Analyze the Smartwatch Data⌚️
The dataset has a “Calories” column; it contains the data about the number of calories burned in a day. Let’s have a look at the relationship between calories burned and the total steps walked in a day:
# Note: trendline="ols" requires the statsmodels package to be installed
figure = px.scatter(data_frame=data,
                    x="Calories",
                    y="TotalSteps",
                    size="VeryActiveMinutes",
                    trendline="ols",
                    title="Relationship between Calories & Total Steps")
figure.show()
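To put a number on what the scatter plot shows, you can also compute the Pearson correlation between the two columns. Below is a minimal sketch on the five records printed earlier. The `head_rows` and `r` names are only for this example, and five rows from a single user are far too few to generalize from, so treat it purely as an illustration of the call:

```python
import pandas as pd

# Calories and TotalSteps of the first five records, copied from data.head()
head_rows = pd.DataFrame({
    "Calories": [1985, 1797, 1776, 1745, 1863],
    "TotalSteps": [13162, 10735, 10460, 9762, 12669],
})

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r = head_rows["Calories"].corr(head_rows["TotalSteps"])
print(round(r, 2))
```

On the full dataset the same call would be data["Calories"].corr(data["TotalSteps"]).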

You can see that there is a linear relationship between the total number of steps and the number of calories burned in a day. Now let’s look at the average total number of active minutes in a day:
label = ["Very Active Minutes", "Fairly Active Minutes",
         "Lightly Active Minutes", "Inactive Minutes"]
counts = data[["VeryActiveMinutes", "FairlyActiveMinutes",
               "LightlyActiveMinutes", "SedentaryMinutes"]].mean()
colors = ['gold', 'lightgreen', "pink", "blue"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Total Active Minutes')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()

Observations:
- Inactive (sedentary) minutes make up 81.3% of the tracked minutes in a day
- Lightly active minutes make up 15.8%
- On average, only 21 minutes (1.74%) were very active
- Fairly active minutes account for 1.11% (13 minutes)
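These percentages follow directly from the column means in the descriptive statistics. As a quick check, here is a minimal sketch of the arithmetic. The `means` dictionary holds the four mean values copied from the describe() output, and the variable names are my own:

```python
# Mean minutes per day, copied from the describe() output above
means = {
    "Very active": 21.164894,
    "Fairly active": 13.564894,
    "Lightly active": 192.812766,
    "Sedentary": 991.210638,
}

# The four means add up to the mean of TotalMinutes (about 1218.75)
total = sum(means.values())

# Share of each category as a percentage of the tracked minutes
shares = {name: round(100 * value / total, 2) for name, value in means.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```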
We converted the ActivityDate column to the datetime data type above. Let’s use it to find the weekday of each record and add it to the dataset as a new column named “Day”:
data["Day"] = data["ActivityDate"].dt.day_name()
print(data["Day"].head())

0      Tuesday
1    Wednesday
2     Thursday
3       Friday
4     Saturday
Name: Day, dtype: object
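Weekday names are plain strings, so they sort alphabetically by default. If you want charts or tables in calendar order, one option is an ordered categorical. This is a minimal sketch on a few dates from April 2016. The `weekday_order` list and the other names are my own for this example, and the Monday-first order is just one possible convention:

```python
import pandas as pd

# A few dates in the same layout as ActivityDate
dates = pd.to_datetime(
    pd.Series(["4/12/2016", "4/13/2016", "4/16/2016", "4/17/2016", "4/18/2016"]),
    format="%m/%d/%Y",
)
days = dates.dt.day_name()

# Calendar order to use instead of alphabetical order
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
ordered = pd.Series(pd.Categorical(days, categories=weekday_order, ordered=True))

# Sorting now follows the weekday order, not the alphabet
sorted_days = ordered.sort_values().tolist()

print(days.tolist())
print(sorted_days)
```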
Now let’s have a look at the very active, fairly active, and lightly active minutes on each day of the week:
fig = go.Figure()
fig.add_trace(go.Bar(
    x=data["Day"],
    y=data["VeryActiveMinutes"],
    name='Very Active',
    marker_color='purple'
))
fig.add_trace(go.Bar(
    x=data["Day"],
    y=data["FairlyActiveMinutes"],
    name='Fairly Active',
    marker_color='green'
))
fig.add_trace(go.Bar(
    x=data["Day"],
    y=data["LightlyActiveMinutes"],
    name='Lightly Active',
    marker_color='pink'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()

Now let’s have a look at the number of inactive minutes on each day of the week:
# Sum the inactive minutes per weekday so that labels and values
# both have one entry per day (7 labels, 7 values)
inactive = data.groupby("Day")["SedentaryMinutes"].sum()
label = inactive.index
counts = inactive.values
colors = ['gold', 'lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Inactive Minutes Daily')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()

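In my run, Thursday came out as the most inactive day for the individuals in the dataset. Treat this ranking with care: it depends on how the minutes are aggregated per weekday and on how many records each weekday has. Now let’s have a look at the number of calories burned on each day of the week: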
# Sum the calories per weekday so that labels and values
# both have one entry per day (7 labels, 7 values)
calories = data.groupby("Day")["Calories"].sum()
label = calories.index
counts = calories.values
colors = ['gold', 'lightgreen', "pink", "blue", "skyblue", "cyan", "orange"]

fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(title_text='Calories Burned Daily')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=30,
                  marker=dict(colors=colors, line=dict(color='black', width=3)))
fig.show()

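In my run, the highest number of calories was burned on Tuesdays, which suggests Tuesday is one of the most active days for the individuals in the dataset. A weekday with more records will also show a larger total, so compare per-day averages before drawing a firm conclusion.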
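When comparing weekdays, it helps to look at the record count, the total, and the average side by side, because a weekday with more records can lead on totals while trailing on averages. Here is a minimal sketch with made-up numbers. The `sample` frame is hypothetical and not taken from the dataset:

```python
import pandas as pd

# Hypothetical records: Tuesday has three rows, Thursday only two
sample = pd.DataFrame({
    "Day": ["Tuesday", "Tuesday", "Tuesday", "Thursday", "Thursday"],
    "Calories": [2000, 2100, 1900, 2500, 2700],
})

# One row per weekday with the record count, total, and average
per_day = sample.groupby("Day")["Calories"].agg(["count", "sum", "mean"])
print(per_day)

# The "top" day differs depending on the aggregate you pick
top_by_sum = per_day["sum"].idxmax()
top_by_mean = per_day["mean"].idxmax()
print(top_by_sum, top_by_mean)
```

On the real data, the same aggregation would be data.groupby("Day")["Calories"].agg(["count", "sum", "mean"]).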
So this is how you can analyze smartwatch data using the Python programming language. There is a lot more you can do with this dataset. You can also use it for predicting the number of calories burned in a day.
Summary
So this is how you can analyze the fitness data collected by a smartwatch using Python. Analyzing such data is one of the use cases of Data Science in healthcare. I hope you liked this article on Smartwatch data analysis using Python. Feel free to ask valuable questions in the comments section below.
Hi Aman,
I tried the last two code snippets with plotly express, but it’s showing a ValueError.
“ValueError: All arguments should have the same length. The length of argument `values` is 940, whereas the length of previously-processed arguments ['names'] is 7”
Below is my code for the second-to-last pie chart:
label = df['Days'].value_counts().index
counts = df['SedentaryMinutes']
figs = px.pie(df, values=counts, names=label, title='Number of Inactive Minutes in each day of week')
figs.show()
Your code is correct, check if the spelling of the column is Days or Day.