Data Visualization Guide using Python

Data Visualization is the graphical representation of data to help individuals, including Data Scientists and decision-makers, comprehend, analyze, and draw insights from complex datasets. It involves using various visual elements such as charts, graphs, plots, and maps to present data in a comprehensible and meaningful manner. If you want to learn how to visualize data step by step, this article is for you. In this article, I’ll take you through a Data Visualization guide using Python.

I will be using the plotly library in Python in this Data Visualization guide. To cover most Data Visualization concepts, we need a dataset containing numerical, categorical, and textual features.

I found an ideal dataset for this Data Visualization guide. You can download the dataset from here.

Now, let’s get started with this Data Visualization Guide by importing the necessary Python libraries and the dataset:

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"

data = pd.read_csv("Instagram data.csv", encoding='latin-1')
print(data.head())
   Impressions  From Home  From Hashtags  From Explore  From Other  Saves  \
0         3920       2586           1028           619          56     98   
1         5394       2727           1838          1174          78    194   
2         4021       2085           1188             0         533     41   
3         4528       2700            621           932          73    172   
4         2518       1704            255           279          37     96   

   Comments  Shares  Likes  Profile Visits  Follows  \
0         9       5    162              35        2   
1         7      14    224              48       10   
2        11       1    131              62       12   
3        10       7    213              23        8   
4         5       4    123               8        0   

                                             Caption  \
0  Here are some of the most important data visua...   
1  Here are some of the best data science project...   
2  Learn how to train a machine learning model an...   
3  Here’s how you can write a Python program to d...   
4  Plotting annotations while visualizing your da...   

                                            Hashtags  
0  #finance #money #business #investing #investme...  
1  #healthcare #health #covid #data #datascience ...  
2  #data #datascience #dataanalysis #dataanalytic...  
3  #python #pythonprogramming #pythonprojects #py...  
4  #datavisualization #datascience #data #dataana...  

Before creating visualizations, explore the dataset to understand its structure and characteristics:

# Display basic statistics
print(data.describe())
        Impressions     From Home  From Hashtags  From Explore   From Other  \
count    119.000000    119.000000     119.000000    119.000000   119.000000   
mean    5703.991597   2475.789916    1887.512605   1078.100840   171.092437   
std     4843.780105   1489.386348    1884.361443   2613.026132   289.431031   
min     1941.000000   1133.000000     116.000000      0.000000     9.000000   
25%     3467.000000   1945.000000     726.000000    157.500000    38.000000   
50%     4289.000000   2207.000000    1278.000000    326.000000    74.000000   
75%     6138.000000   2602.500000    2363.500000    689.500000   196.000000   
max    36919.000000  13473.000000   11817.000000  17414.000000  2547.000000   

             Saves    Comments      Shares       Likes  Profile Visits  \
count   119.000000  119.000000  119.000000  119.000000      119.000000   
mean    153.310924    6.663866    9.361345  173.781513       50.621849   
std     156.317731    3.544576   10.089205   82.378947       87.088402   
min      22.000000    0.000000    0.000000   72.000000        4.000000   
25%      65.000000    4.000000    3.000000  121.500000       15.000000   
50%     109.000000    6.000000    6.000000  151.000000       23.000000   
75%     169.000000    8.000000   13.500000  204.000000       42.000000   
max    1095.000000   19.000000   75.000000  549.000000      611.000000   

          Follows  
count  119.000000  
mean    20.756303  
std     40.921580  
min      0.000000  
25%      4.000000  
50%      8.000000  
75%     18.000000  
max    260.000000  

# Check for missing values
print(data.isnull().sum())
Impressions       0
From Home         0
From Hashtags     0
From Explore      0
From Other        0
Saves             0
Comments          0
Shares            0
Likes             0
Profile Visits    0
Follows           0
Caption           0
Hashtags          0
dtype: int64
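
Because the dataset mixes numeric columns with text columns (Caption and Hashtags), it can help to separate the two groups before plotting. The following is a minimal sketch using pandas' `select_dtypes` on a toy frame: the numbers are the first rows printed above, while the captions are placeholders of my own.

```python
import pandas as pd

# Toy frame mimicking the dataset's layout; the captions are placeholders
toy = pd.DataFrame({
    "Likes": [162, 224, 131],
    "Comments": [9, 7, 11],
    "Caption": ["first post", "second post", "third post"],
})

# Numeric columns suit charts such as histograms, scatter plots and heatmaps
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything else (here, free text) needs preprocessing before it can be plotted
text_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(text_cols)
```

On the full dataset, the same two calls on `data` would give the list of columns to use for numeric charts and the list to treat as text.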

Creating Data Visualizations

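Now, let’s start creating data visualizations step by step. I’ll start with a bar chart of the Impressions for each post. The dataset has no date column, so the x-axis is the row index, which the code below labels as Days on the assumption that the rows are in chronological order: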

fig = px.bar(data, y='Impressions', title='Impressions Over Time')
fig.update_xaxes(title_text='Days')
fig.update_yaxes(title_text='Impressions')
fig.show()
[Figure: bar chart of Impressions]

Now, let’s generate a scatter plot to explore the relationship between Likes and Comments:

fig = px.scatter(data, x='Likes', y='Comments', title='Likes vs. Comments')
fig.update_xaxes(title_text='Likes')
fig.update_yaxes(title_text='Comments')
fig.show()
[Figure: scatter plot of Likes vs. Comments]
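
A scatter plot shows the relationship visually; a correlation coefficient puts a number on it. Here is a minimal sketch that uses only the five rows printed earlier. That is far too few for a real conclusion, so treat the result as illustrative:

```python
import pandas as pd

# First five rows of Likes and Comments, as printed by data.head() above
sample = pd.DataFrame({
    "Likes": [162, 224, 131, 213, 123],
    "Comments": [9, 7, 11, 10, 5],
})

# Pearson correlation coefficient: +1 is a perfect positive linear
# relationship, 0 is no linear relationship, -1 is a perfect negative one
r = sample["Likes"].corr(sample["Comments"])
print(f"Pearson r = {r:.2f}")
```

Running `data["Likes"].corr(data["Comments"])` on the full dataset gives the value that corresponds to the scatter plot.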

Now, let’s create a line chart to visualize the change in Profile Visits over time:

fig = px.line(data, y='Profile Visits', title='Profile Visits Over Time')
fig.update_xaxes(title_text='Days')
fig.update_yaxes(title_text='Profile Visits')
fig.show()
[Figure: line chart of Profile Visits]

Now, let’s create a histogram to visualize the distribution of Likes:

fig = px.histogram(data, x='Likes', title='Histogram of Likes')
fig.update_xaxes(title_text='Likes')
fig.update_yaxes(title_text='Frequency')
fig.show()
[Figure: histogram of Likes]

Now, let’s generate a heatmap to visualize the correlation between numerical variables:

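# numeric_only=True skips the text columns (Caption, Hashtags),
# which would otherwise raise an error in recent pandas versions
correlation_matrix = data.corr(numeric_only=True)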
fig = px.imshow(correlation_matrix, color_continuous_scale='Viridis', title='Correlation Heatmap')
fig.update_xaxes(title_text='Features')
fig.update_yaxes(title_text='Features')
fig.show()
[Figure: correlation heatmap]
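
A heatmap is good for spotting patterns; to read off exact values, it can help to sort one column of the correlation matrix. The sketch below reuses the first five rows printed earlier, which is too few rows for meaningful statistics, so it only illustrates the mechanics. It passes `numeric_only=True` so that text columns are skipped; the `Caption` values are placeholders:

```python
import pandas as pd

sample = pd.DataFrame({
    "Impressions": [3920, 5394, 4021, 4528, 2518],
    "Likes": [162, 224, 131, 213, 123],
    "Saves": [98, 194, 41, 172, 96],
    "Caption": ["a", "b", "c", "d", "e"],  # placeholder text column
})

# numeric_only=True (pandas 1.5 and later) leaves out non-numeric columns
corr = sample.corr(numeric_only=True)

# Correlation of every other numeric feature with Impressions, strongest first
ranked = corr["Impressions"].drop("Impressions").sort_values(ascending=False)
print(ranked)
```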

Now, let’s generate a pie chart to visualize reach from different sources:

reach_sources = ['From Home', 'From Hashtags', 'From Explore', 'From Other']
reach_counts = [data[source].sum() for source in reach_sources]

colors = ['#FFB6C1', '#87CEFA', '#90EE90', '#FFDAB9']

# The names and values are plain lists, so no data_frame argument is needed
fig = px.pie(names=reach_sources,
             values=reach_counts,
             title='Reach from Different Sources',
             color_discrete_sequence=colors)
fig.show()
[Figure: pie chart of reach by source]
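
Each slice of the pie is one source's total divided by the grand total. A minimal sketch of that arithmetic with pandas, using made-up totals rather than the real sums:

```python
import pandas as pd

# On the real data these totals would come from data[source].sum();
# the numbers here are made up for illustration
reach = pd.Series(
    {"From Home": 500, "From Hashtags": 300, "From Explore": 150, "From Other": 50}
)

# Share of each source as a percentage of the grand total
share = (reach / reach.sum() * 100).round(1)
print(share)
```

Printing the shares alongside the chart makes it easier to quote exact percentages.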

Now, let’s visualize the relationship between profile visits and follows using a trend line:

fig = px.scatter(data, 
                 x='Profile Visits', 
                 y='Follows', 
                 trendline = 'ols',
                 title='Profile Visits vs. Follows')
fig.show()
[Figure: scatter plot of Profile Visits vs. Follows with trend line]
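
Note that `trendline='ols'` relies on the statsmodels package, so it needs to be installed alongside Plotly. The line itself is an ordinary least squares fit. For readers who want the slope and intercept as numbers, here is a sketch of the same kind of fit with NumPy on made-up points:

```python
import numpy as np

# Made-up points that lie exactly on follows = 0.5 * visits + 1
visits = np.array([10, 20, 30, 40, 50], dtype=float)
follows = 0.5 * visits + 1

# A degree-1 polynomial fit is a least-squares straight line
slope, intercept = np.polyfit(visits, follows, deg=1)
print(f"follows ≈ {slope:.2f} * visits + {intercept:.2f}")
```

On the real data, passing `data['Profile Visits']` and `data['Follows']` to `np.polyfit` should give a line comparable to the one Plotly draws.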

Now, let’s create a combined visualization (known as subplots):

from plotly.subplots import make_subplots

# Create subplots
fig = make_subplots(rows=2, cols=2, subplot_titles=('Impressions Over Time', 'Likes vs. Comments', 'Profile Visits Over Time'))

# Add bar chart
fig.add_trace(px.bar(data, y='Impressions').data[0], row=1, col=1)

# Add scatter plot
fig.add_trace(px.scatter(data, x='Likes', y='Comments').data[0], row=1, col=2)

# Add line chart
fig.add_trace(px.line(data, y='Profile Visits').data[0], row=2, col=1)

# Update axis labels and titles
fig.update_xaxes(title_text='Days', row=1, col=1)
fig.update_xaxes(title_text='Likes', row=1, col=2)
fig.update_xaxes(title_text='Days', row=2, col=1)
fig.update_yaxes(title_text='Impressions', row=1, col=1)
fig.update_yaxes(title_text='Comments', row=1, col=2)
fig.update_yaxes(title_text='Profile Visits', row=2, col=1)
fig.update_layout(title='Combined Visualizations')
fig.show()
[Figure: subplots]

Now, let’s analyze the textual data. I’ll create a bar chart to visualize the distribution of hashtags in the dataset:

# Create a list to store all hashtags
all_hashtags = []

# Split each row of the 'Hashtags' column on whitespace and collect the tags
for row in data['Hashtags']:
    all_hashtags.extend(str(row).split())

# Create a pandas DataFrame to store the hashtag distribution
hashtag_distribution = pd.Series(all_hashtags).value_counts().reset_index()
hashtag_distribution.columns = ['Hashtag', 'Count']

fig = px.bar(hashtag_distribution, x='Hashtag', y='Count', title='Distribution of Hashtags')
fig.show()
[Figure: bar plot of hashtag counts]
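
With many distinct hashtags the bar chart becomes crowded, so it is often more readable to plot only the most frequent ones. A sketch using pandas' `str.split` and `explode` on made-up hashtag strings:

```python
import pandas as pd

# Made-up hashtag strings in the same space-separated format as the dataset
sample = pd.DataFrame(
    {"Hashtags": ["#data #python", "#python #ai", "#python #data #plotly"]}
)

# Split each row on whitespace, then give every hashtag its own row
tags = sample["Hashtags"].str.split().explode()

# Count occurrences and keep the two most frequent hashtags
top = tags.value_counts().head(2)
print(top)
```

On the real data, something like `head(20)` before plotting would keep the x-axis legible.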

So, this is how you can create data visualizations. Remember that you can use any tool (such as Tableau or Power BI) or any library (such as Matplotlib, Seaborn, or Plotly) to create visualizations; the overall process stays the same.

Summary

Data Visualization is the graphical representation of data to help individuals, including Data Scientists and decision-makers, comprehend, analyze, and draw insights from complex datasets. It involves using various visual elements such as charts, graphs, plots, and maps to present data in a comprehensible and meaningful manner. I hope you liked this article on Data Visualization using Python. Feel free to ask questions in the comments section below.

Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.