Data Visualization is the graphical representation of data to help individuals, including Data Scientists and decision-makers, comprehend, analyze, and draw insights from complex datasets. It involves using various visual elements such as charts, graphs, plots, and maps to present data in a comprehensible and meaningful manner. If you want to learn how to visualize data step by step, this article is for you. In this article, I’ll take you through a Data Visualization guide using Python.
Data Visualization Guide using Python
I will use the plotly library in Python throughout this guide. To cover most Data Visualization concepts, we need a dataset that contains numerical, categorical, and textual features.
I found an ideal dataset for this Data Visualization guide. You can download the dataset from here.
Now, let’s get started with this Data Visualization Guide by importing the necessary Python libraries and the dataset:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"

data = pd.read_csv("Instagram data.csv", encoding='latin-1')
print(data.head())
   Impressions  From Home  From Hashtags  From Explore  From Other  Saves  \
0         3920       2586           1028           619          56     98
1         5394       2727           1838          1174          78    194
2         4021       2085           1188             0         533     41
3         4528       2700            621           932          73    172
4         2518       1704            255           279          37     96

   Comments  Shares  Likes  Profile Visits  Follows  \
0         9       5    162              35        2
1         7      14    224              48       10
2        11       1    131              62       12
3        10       7    213              23        8
4         5       4    123               8        0

                                             Caption  \
0  Here are some of the most important data visua...
1  Here are some of the best data science project...
2  Learn how to train a machine learning model an...
3  HereÂ’s how you can write a Python program to d...
4  Plotting annotations while visualizing your da...

                                            Hashtags
0  #finance #money #business #investing #investme...
1  #healthcare #health #covid #data #datascience ...
2  #data #datascience #dataanalysis #dataanalytic...
3  #python #pythonprogramming #pythonprojects #py...
4  #datavisualization #datascience #data #dataana...
Before creating visualizations, explore the dataset to understand its structure and characteristics:
# Display basic statistics
print(data.describe())

        Impressions     From Home  From Hashtags  From Explore   From Other  \
count    119.000000    119.000000     119.000000    119.000000   119.000000
mean    5703.991597   2475.789916    1887.512605   1078.100840   171.092437
std     4843.780105   1489.386348    1884.361443   2613.026132   289.431031
min     1941.000000   1133.000000     116.000000      0.000000     9.000000
25%     3467.000000   1945.000000     726.000000    157.500000    38.000000
50%     4289.000000   2207.000000    1278.000000    326.000000    74.000000
75%     6138.000000   2602.500000    2363.500000    689.500000   196.000000
max    36919.000000  13473.000000   11817.000000  17414.000000  2547.000000

             Saves    Comments      Shares       Likes  Profile Visits  \
count   119.000000  119.000000  119.000000  119.000000      119.000000
mean    153.310924    6.663866    9.361345  173.781513       50.621849
std     156.317731    3.544576   10.089205   82.378947       87.088402
min      22.000000    0.000000    0.000000   72.000000        4.000000
25%      65.000000    4.000000    3.000000  121.500000       15.000000
50%     109.000000    6.000000    6.000000  151.000000       23.000000
75%     169.000000    8.000000   13.500000  204.000000       42.000000
max    1095.000000   19.000000   75.000000  549.000000      611.000000

          Follows
count  119.000000
mean    20.756303
std     40.921580
min      0.000000
25%      4.000000
50%      8.000000
75%     18.000000
max    260.000000
# Check for missing values
print(data.isnull().sum())

Impressions       0
From Home         0
From Hashtags     0
From Explore      0
From Other        0
Saves             0
Comments          0
Shares            0
Likes             0
Profile Visits    0
Follows           0
Caption           0
Hashtags          0
dtype: int64
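Since the dataset mixes numeric and text columns, it can help to separate them before choosing a chart type. Here is a minimal sketch of that step. It uses a small made-up DataFrame rather than the real file, and `split_columns` is a hypothetical helper name I introduced for illustration:

```python
import pandas as pd

# Toy stand-in for the real dataset; the values are made up for illustration.
sample = pd.DataFrame({
    "Impressions": [3920, 5394, 4021],
    "Likes": [162, 224, 131],
    "Caption": ["post a", "post b", "post c"],
    "Hashtags": ["#data #python", "#python", "#data"],
})

def split_columns(df):
    """Return (numeric column names, non-numeric column names)."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    other = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, other

numeric_cols, text_cols = split_columns(sample)
print("Numeric:", numeric_cols)
print("Text:", text_cols)
```

Numeric columns are natural candidates for bar charts, histograms, and correlation heatmaps, while text columns such as Hashtags usually need to be split and counted first.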
Creating Data Visualizations
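Now, let’s start creating data visualizations step by step. I’ll begin with a bar chart of the number of Impressions over time. The dataset has no date column, so the x-axis is simply the row index, which the chart labels as days on the assumption that the posts are listed in chronological order: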
fig = px.bar(data, y='Impressions', title='Impressions Over Time')
fig.update_xaxes(title_text='Days')
fig.update_yaxes(title_text='Impressions')
fig.show()

Now, let’s generate a scatter plot to explore the relationship between Likes and Comments:
fig = px.scatter(data, x='Likes', y='Comments', title='Likes vs. Comments')
fig.update_xaxes(title_text='Likes')
fig.update_yaxes(title_text='Comments')
fig.show()
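A scatter plot shows the relationship visually; a correlation coefficient can put a number on it. The sketch below uses made-up values (not the real dataset) and relies on pandas' `Series.corr`, which computes the Pearson coefficient by default:

```python
import pandas as pd

# Made-up values for illustration only.
sample = pd.DataFrame({
    "Likes": [100, 150, 200, 250, 300],
    "Comments": [4, 7, 7, 11, 12],
})

# Series.corr defaults to the Pearson correlation coefficient,
# which ranges from -1 (perfect negative) to 1 (perfect positive).
r = sample["Likes"].corr(sample["Comments"])
print(f"Pearson r = {r:.2f}")
```

On the real data, you would replace `sample` with `data`. A value close to 0 would suggest that the scatter plot has no clear linear pattern.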

Now, let’s create a line chart to visualize the change in Profile Visits over time:
fig = px.line(data, y='Profile Visits', title='Profile Visits Over Time')
fig.update_xaxes(title_text='Days')
fig.update_yaxes(title_text='Profile Visits')
fig.show()

Now, let’s create a histogram to visualize the distribution of Likes:
fig = px.histogram(data, x='Likes', title='Histogram of Likes')
fig.update_xaxes(title_text='Likes')
fig.update_yaxes(title_text='Frequency')
fig.show()

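Now, let’s generate a heatmap to visualize the correlation between numerical variables. The Caption and Hashtags columns contain text, so we select only the numeric columns before computing the correlations (recent versions of pandas raise an error if text columns are passed to corr()):
# Keep only numeric columns before computing correlations
correlation_matrix = data.select_dtypes(include='number').corr()

fig = px.imshow(correlation_matrix, color_continuous_scale='Viridis', title='Correlation Heatmap')
fig.update_xaxes(title_text='Features')
fig.update_yaxes(title_text='Features')
fig.show()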

Now, let’s generate a pie chart to visualize reach from different sources:
reach_sources = ['From Home', 'From Hashtags', 'From Explore', 'From Other']
# Total reach contributed by each source across all posts
reach_counts = [data[source].sum() for source in reach_sources]
colors = ['#FFB6C1', '#87CEFA', '#90EE90', '#FFDAB9']

# names and values are passed as lists, so no data_frame argument is needed
fig = px.pie(names=reach_sources, values=reach_counts,
             title='Reach from Different Sources',
             color_discrete_sequence=colors)
fig.show()
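A pie chart shows each source as a share of the total. The same shares can be computed directly with pandas, which is handy when you want the exact percentages. This sketch uses the first five rows printed earlier as sample data, so the percentages will differ from those of the full dataset:

```python
import pandas as pd

reach_sources = ["From Home", "From Hashtags", "From Explore", "From Other"]

# First five rows of the reach columns, used here as sample data.
sample = pd.DataFrame({
    "From Home": [2586, 2727, 2085, 2700, 1704],
    "From Hashtags": [1028, 1838, 1188, 621, 255],
    "From Explore": [619, 1174, 0, 932, 279],
    "From Other": [56, 78, 533, 73, 37],
})

# Sum each source over all posts, then convert to a percentage of the total.
totals = sample[reach_sources].sum()
share = (totals / totals.sum() * 100).round(1)
print(share)
```

Replacing `sample` with `data` gives the percentages behind the pie chart for the full dataset.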

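Now, let’s visualize the relationship between profile visits and follows using a trend line. The trendline='ols' option fits an ordinary least squares regression line and requires the statsmodels package to be installed:
fig = px.scatter(data, x='Profile Visits', y='Follows', trendline='ols',
                 title='Profile Visits vs. Follows')
fig.show()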
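To see what the trend line represents, you can fit a straight line yourself. Plotly computes its OLS trend line with statsmodels; the sketch below swaps in NumPy's `polyfit` instead, which for a straight-line fit minimises the same squared error and so should give the same slope and intercept. It uses the first five rows printed earlier as sample data:

```python
import numpy as np

# Profile Visits and Follows from the first five rows, used as sample data.
visits = np.array([35, 48, 62, 23, 8], dtype=float)
follows = np.array([2, 10, 12, 8, 0], dtype=float)

# Degree-1 least-squares fit: follows is approximated by slope * visits + intercept.
slope, intercept = np.polyfit(visits, follows, deg=1)

# Residuals are the vertical distances between the points and the fitted line.
residuals = follows - (slope * visits + intercept)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The slope estimates how many additional follows come with each additional profile visit, under the assumption that the relationship is roughly linear.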

Now, let’s create a combined visualization (known as subplots):
from plotly.subplots import make_subplots

# Create subplots
fig = make_subplots(rows=2, cols=2,
                    subplot_titles=('Impressions Over Time', 'Likes vs. Comments',
                                    'Profile Visits Over Time'))

# Add bar chart
fig.add_trace(px.bar(data, y='Impressions').data[0], row=1, col=1)

# Add scatter plot
fig.add_trace(px.scatter(data, x='Likes', y='Comments').data[0], row=1, col=2)

# Add line chart
fig.add_trace(px.line(data, y='Profile Visits').data[0], row=2, col=1)

# Update axis labels and titles
fig.update_xaxes(title_text='Days', row=1, col=1)
fig.update_xaxes(title_text='Likes', row=1, col=2)
fig.update_xaxes(title_text='Days', row=2, col=1)
fig.update_yaxes(title_text='Impressions', row=1, col=1)
fig.update_yaxes(title_text='Comments', row=1, col=2)
fig.update_yaxes(title_text='Profile Visits', row=2, col=1)

fig.update_layout(title='Combined Visualizations')
fig.show()

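Now, let’s analyze textual data. Each value in the Hashtags column is a single string of space-separated hashtags, so I’ll split every string into individual tags, count how often each tag appears, and plot the counts as a bar chart: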
# Create a list to store all hashtags
all_hashtags = []

# Iterate through each row in the 'Hashtags' column
for row in data['Hashtags']:
    hashtags = str(row).split()
    hashtags = [tag.strip() for tag in hashtags]
    all_hashtags.extend(hashtags)

# Create a pandas DataFrame to store the hashtag distribution
hashtag_distribution = pd.Series(all_hashtags).value_counts().reset_index()
hashtag_distribution.columns = ['Hashtag', 'Count']

fig = px.bar(hashtag_distribution, x='Hashtag', y='Count',
             title='Distribution of Hashtags')
fig.show()
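Plotting every unique hashtag can make the x-axis crowded, so you may prefer to keep only the most frequent ones. Here is a minimal sketch of the same counting step written with pandas string methods, limited to the top N tags. The hashtag strings are made up for illustration, and N=2 is an arbitrary choice:

```python
import pandas as pd

# Made-up hashtag strings for illustration only.
sample = pd.DataFrame({
    "Hashtags": [
        "#data #datascience #python",
        "#python #pythonprogramming",
        "#data #python #dataanalysis",
    ]
})

top_hashtags = (
    sample["Hashtags"]
    .str.split()       # one list of tags per post
    .explode()         # one row per individual tag
    .value_counts()    # count occurrences, most frequent first
    .head(2)           # keep only the top N tags (N=2 here)
)
print(top_hashtags)
```

The resulting Series can be turned into a DataFrame with `reset_index()` and passed to `px.bar` in the same way as the full distribution.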

So, this is how you can create data visualizations. Remember that you can use other tools (such as Tableau or Power BI) or other libraries (such as Matplotlib or Seaborn instead of Plotly) to create visualizations. The overall process stays the same: load the data, explore it, and then choose the chart that fits each question.
Summary
Data Visualization is the graphical representation of data to help individuals, including Data Scientists and decision-makers, comprehend, analyze, and draw insights from complex datasets. In this guide, we used plotly to create a bar chart, a scatter plot, a line chart, a histogram, a correlation heatmap, a pie chart, a scatter plot with a trend line, subplots, and a hashtag distribution chart. I hope you liked this article on Data Visualization using Python. Feel free to ask valuable questions in the comments section below.