Instagram Reach Forecasting using Python

Not everyone is active on social media all the time. Some people limit their use of social media during festive seasons, while others avoid it during their examinations. So, as content creators, we need to decide when to publish our most valuable content and when to hold back. That is where Instagram Reach Forecasting can help content creators and everyone who uses Instagram professionally. In this article, I will take you through the task of Instagram Reach Forecasting using Python.

Instagram Reach Forecasting

Instagram reach forecasting is the process of predicting the number of people that an Instagram post, story, or other piece of content will reach, based on historical data and various other factors.

For content creators and anyone using Instagram professionally, predicting the reach can be valuable for planning and optimizing their social media strategy. By understanding how their content is performing, creators can make informed decisions about when to publish, what types of content to create, and how to engage their audience. It can lead to increased engagement, better performance metrics, and ultimately, greater success on the platform.

For the task of Instagram Reach Forecasting, we need to have data about Instagram reach for a particular time period. I found an ideal dataset for this task that you can download here.

In the section below, I will take you through the task of Instagram Reach Forecasting using Python.

Instagram Reach Forecasting using Python

Let’s start this task by importing the necessary Python libraries and the dataset:

import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"

data = pd.read_csv("Instagram-Reach.csv", encoding = 'latin-1')
print(data.head())
                  Date  Instagram reach
0  2022-04-01T00:00:00             7620
1  2022-04-02T00:00:00            12859
2  2022-04-03T00:00:00            16008
3  2022-04-04T00:00:00            24349
4  2022-04-05T00:00:00            20532

I’ll convert the Date column into datetime datatype to move forward:

data['Date'] = pd.to_datetime(data['Date'])
print(data.head())
        Date  Instagram reach
0 2022-04-01             7620
1 2022-04-02            12859
2 2022-04-03            16008
3 2022-04-04            24349
4 2022-04-05            20532
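
Time series models assume evenly spaced observations, so it can help to confirm that the dates are consecutive before moving on. Here is a minimal sketch of such a check. It uses a small made-up sample with the same column names as the dataset, and the helper name missing_days is hypothetical:

```python
import pandas as pd

def missing_days(dates: pd.Series) -> pd.DatetimeIndex:
    """Return the calendar days between the first and last date that are absent."""
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    return full_range.difference(pd.DatetimeIndex(dates))

# Made-up sample (not the real dataset): 2022-04-03 is deliberately left out.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2022-04-01", "2022-04-02", "2022-04-04"]),
    "Instagram reach": [7620, 12859, 24349],
})

gaps = missing_days(sample["Date"])
print(gaps)
```

An empty result means every day between the first and last date is present.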

Analyzing Reach

Let’s analyze the trend of Instagram reach over time using a line chart:

fig = go.Figure()
fig.add_trace(go.Scatter(x=data['Date'], 
                         y=data['Instagram reach'], 
                         mode='lines', name='Instagram reach'))
fig.update_layout(title='Instagram Reach Trend', xaxis_title='Date', 
                  yaxis_title='Instagram Reach')
fig.show()

Figure: Instagram Reach Trend

Now let’s analyze Instagram reach for each day using a bar chart:

fig = go.Figure()
fig.add_trace(go.Bar(x=data['Date'], 
                     y=data['Instagram reach'], 
                     name='Instagram reach'))
fig.update_layout(title='Instagram Reach by Day', 
                  xaxis_title='Date', 
                  yaxis_title='Instagram Reach')
fig.show()

Figure: Instagram Reach by Day

Now let’s analyze the distribution of Instagram reach using a box plot:

fig = go.Figure()
fig.add_trace(go.Box(y=data['Instagram reach'], 
                     name='Instagram reach'))
fig.update_layout(title='Instagram Reach Box Plot', 
                  yaxis_title='Instagram Reach')
fig.show()

Figure: Instagram Reach Box Plot
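
A box plot is built from the quartiles of the data, and points beyond 1.5 times the interquartile range are usually drawn as outliers. The sketch below computes those numbers directly with pandas on a made-up series. The helper name box_plot_summary is hypothetical, and Plotly may use a slightly different interpolation rule for quartiles, so its values can differ a little:

```python
import pandas as pd

def box_plot_summary(values: pd.Series) -> dict:
    """Quartiles, 1.5 * IQR fences, and outlier count for a numeric series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": values.median(),
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "n_outliers": len(outliers),
    }

# Made-up values (not the real dataset): one value is far above the rest.
sample = pd.Series([10, 12, 14, 16, 18, 20, 100])
summary = box_plot_summary(sample)
print(summary)
```

On the real data, the same function could be called as box_plot_summary(data['Instagram reach']).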

Now let’s create a day column and analyze reach based on the days of the week. To create a day column, we can use the dt.day_name() method to extract the day of the week from the Date column:

data['Day'] = data['Date'].dt.day_name()
print(data.head())
        Date  Instagram reach       Day
0 2022-04-01             7620    Friday
1 2022-04-02            12859  Saturday
2 2022-04-03            16008    Sunday
3 2022-04-04            24349    Monday
4 2022-04-05            20532   Tuesday

Now let’s analyze the reach based on the days of the week. For this, we can group the DataFrame by the Day column and calculate the mean, median, and standard deviation of the Instagram reach column for each day:

# Mean, median, and standard deviation of reach for each day of the week
day_stats = data.groupby('Day')['Instagram reach'].agg(['mean', 'median', 'std']).reset_index()
print(day_stats)
         Day          mean   median           std
0     Friday  46666.849057  35574.0  29856.943036
1     Monday  52621.692308  46853.0  32296.071347
2   Saturday  47374.750000  40012.0  27667.043634
3     Sunday  53114.173077  47797.0  30906.162384
4   Thursday  48570.923077  39150.0  28623.220625
5    Tuesday  54030.557692  48786.0  32503.726482
6  Wednesday  51017.269231  42320.5  29047.869685
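
Notice that groupby sorts the days alphabetically, so Friday comes first. If you prefer the rows in calendar order, one option is to turn the Day column into an ordered categorical. This is a minimal sketch on made-up values, and the WEEKDAYS list and ordered_stats name are my own:

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up sample (not the real dataset): two weeks starting on Friday, 2022-04-01.
sample = pd.DataFrame({
    "Date": pd.date_range("2022-04-01", periods=14, freq="D"),
    "Instagram reach": range(100, 1500, 100),
})

# An ordered categorical makes groupby sort by weekday instead of by spelling.
sample["Day"] = pd.Categorical(sample["Date"].dt.day_name(),
                               categories=WEEKDAYS, ordered=True)

ordered_stats = (
    sample.groupby("Day", observed=False)["Instagram reach"]
    .agg(["mean", "median", "std"])
    .reset_index()
)
print(ordered_stats)
```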

Now, let’s create a bar chart to visualize the reach for each day of the week:

fig = go.Figure()
fig.add_trace(go.Bar(x=day_stats['Day'], 
                     y=day_stats['mean'], 
                     name='Mean'))
fig.add_trace(go.Bar(x=day_stats['Day'], 
                     y=day_stats['median'], 
                     name='Median'))
fig.add_trace(go.Bar(x=day_stats['Day'], 
                     y=day_stats['std'], 
                     name='Standard Deviation'))
fig.update_layout(title='Instagram Reach by Day of the Week', 
                  xaxis_title='Day', 
                  yaxis_title='Instagram Reach')
fig.show()

Figure: Instagram Reach by Day of the Week

Instagram Reach Forecasting using Time Series Forecasting

To forecast reach, we can use Time Series Forecasting. Let’s see how to use Time Series Forecasting to forecast the reach of my Instagram account step-by-step.

Let’s look at the trend and seasonal patterns of Instagram reach:

from plotly.tools import mpl_to_plotly
from statsmodels.tsa.seasonal import seasonal_decompose

data = data[["Date", "Instagram reach"]]

# Multiplicative decomposition into trend, seasonal, and residual components
result = seasonal_decompose(data['Instagram reach'], 
                            model='multiplicative', 
                            period=100)

# result.plot() creates and returns its own Matplotlib figure
fig = result.plot()

# Convert the Matplotlib figure to a Plotly figure for display
fig = mpl_to_plotly(fig)
fig.show()

Figure: Seasonality for Time Series
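
To get a feel for what a multiplicative decomposition does, here is a rough sketch of the classical approach written with pandas only: a centered moving average for the trend, averaged ratios for the seasonal indices, and whatever is left as the residual. The function name, the made-up weekly pattern, and the period of 7 are my assumptions for illustration. This is not the statsmodels implementation, and it only handles odd periods:

```python
import numpy as np
import pandas as pd

def classical_multiplicative(series: pd.Series, period: int):
    """Split a series into trend * seasonal * residual (odd periods only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # Trend: centered moving average over one full period.
    trend = series.rolling(window=period, center=True).mean()
    # Ratio of the series to its trend, averaged by position in the cycle.
    detrended = series / trend
    position = np.arange(len(series)) % period
    seasonal_index = detrended.groupby(position).mean()
    # Normalize so the seasonal indices average to 1.
    seasonal_index = seasonal_index / seasonal_index.mean()
    seasonal = pd.Series(seasonal_index.to_numpy()[position], index=series.index)
    resid = series / (trend * seasonal)
    return trend, seasonal, resid

# Made-up data (not the real dataset): a rising level times a weekly pattern.
pattern = np.array([0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 1.0])
t = np.arange(56)
series = pd.Series((1000 + 5 * t) * np.tile(pattern, 8))

trend, seasonal, resid = classical_multiplicative(series, period=7)
print(seasonal.iloc[:7].round(3).tolist())
```

The recovered seasonal indices should be close to the pattern used to build the series, and the residual should stay near 1 wherever the trend is defined.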

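The reach is affected by seasonality, so we can use the SARIMA model to forecast the reach of the Instagram account. We need to find the p, d, and q values for the model. To find the value of p, we can use the autocorrelation plot, and to find the value of q, we can use a partial autocorrelation plot. Keep in mind that the usual Box-Jenkins convention reads these the other way around: the partial autocorrelation plot suggests the autoregressive order p, and the autocorrelation plot suggests the moving average order q. The value of d will be 1. You can learn more about finding these values here.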
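
A value of d = 1 means the model works on the series after differencing it once, that is, on the day-to-day changes instead of the raw values. The sketch below shows the effect on a made-up series with a straight-line trend: the level of the raw series drifts upward, while the differenced series stays around a constant value:

```python
import numpy as np
import pandas as pd

# Made-up series (not the real dataset): linear trend plus a repeating wiggle.
t = np.arange(60)
wiggle = np.tile([5.0, -5.0, 0.0], 20)
series = pd.Series(1000 + 20 * t + wiggle)

# d = 1 corresponds to differencing once: y[t] - y[t-1].
differenced = series.diff().dropna()

print("raw, first half mean:        ", series.iloc[:30].mean())
print("raw, second half mean:       ", series.iloc[30:].mean())
print("differenced, first half mean:", differenced.iloc[:30].mean())
print("differenced, second half mean:", differenced.iloc[30:].mean())
```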

Now here’s how to visualize an autocorrelation plot to find the value of p:

pd.plotting.autocorrelation_plot(data["Instagram reach"])

Figure: Autocorrelation for Time Series

p = 8
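
If you want numbers to go with the plot, pandas can report the autocorrelation at individual lags. This is a small sketch on a made-up series with a weekly rhythm. Note that Series.autocorr computes a Pearson correlation on the overlapping part of the series, so its values can differ slightly from the curve drawn by autocorrelation_plot:

```python
import numpy as np
import pandas as pd

# Made-up series (not the real dataset): a 7-day cycle plus a little noise.
rng = np.random.default_rng(0)
days = np.arange(140)
series = pd.Series(100 + 10 * np.sin(2 * np.pi * days / 7)
                   + rng.normal(0, 1, size=days.size))

# Correlation between the series and a copy of itself shifted by `lag` days.
for lag in (1, 3, 7, 14):
    print(f"lag {lag:>2}: {series.autocorr(lag=lag):+.3f}")
```

Lags that line up with the cycle length give values close to 1, while lags that land half a cycle away give negative values.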

And now here’s how to visualize a partial autocorrelation plot to find the value of q:

from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(data["Instagram reach"], lags = 100)

Figure: Partial Autocorrelation for Time Series

q = 2

Now here’s how to train a model using SARIMA:

p, d, q = 8, 1, 2

import statsmodels.api as sm

# The same (p, d, q) values are reused for the seasonal part, with a seasonal period of 12
model = sm.tsa.statespace.SARIMAX(data['Instagram reach'],
                                  order=(p, d, q),
                                  seasonal_order=(p, d, q, 12))
model = model.fit()
print(model.summary())
                                     SARIMAX Results                                      
==========================================================================================
Dep. Variable:                    Instagram reach   No. Observations:                  365
Model:             SARIMAX(8, 1, 2)x(8, 1, 2, 12)   Log Likelihood               -3938.515
Date:                            Mon, 24 Apr 2023   AIC                           7919.031
Time:                                    03:57:47   BIC                           8000.167
Sample:                                         0   HQIC                          7951.319
                                            - 365                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.1913      6.555      0.029      0.977     -12.657      13.040
ar.L2          0.4707      6.092      0.077      0.938     -11.469      12.411
ar.L3         -0.1190      1.403     -0.085      0.932      -2.868       2.630
ar.L4          0.0424      0.259      0.164      0.870      -0.465       0.550
ar.L5         -0.0213      0.189     -0.113      0.910      -0.393       0.350
ar.L6          0.0317      0.271      0.117      0.907      -0.499       0.562
ar.L7          0.0084      0.424      0.020      0.984      -0.823       0.840
ar.L8         -0.0139      0.242     -0.057      0.954      -0.488       0.460
ma.L1         -0.2250      6.551     -0.034      0.973     -13.066      12.616
ma.L2         -0.7081      6.290     -0.113      0.910     -13.037      11.621
ar.S.L12      -1.0857      1.529     -0.710      0.478      -4.082       1.911
ar.S.L24      -1.7461      2.231     -0.783      0.434      -6.118       2.626
ar.S.L36      -1.4312      1.916     -0.747      0.455      -5.186       2.323
ar.S.L48      -1.0845      1.562     -0.694      0.488      -4.147       1.978
ar.S.L60      -0.7839      1.114     -0.704      0.481      -2.967       1.399
ar.S.L72      -0.4491      0.789     -0.569      0.569      -1.995       1.097
ar.S.L84      -0.2227      0.504     -0.442      0.659      -1.211       0.765
ar.S.L96      -0.0539      0.246     -0.219      0.827      -0.536       0.428
ma.S.L12       0.2244      1.530      0.147      0.883      -2.774       3.223
ma.S.L24       0.8247      1.275      0.647      0.518      -1.674       3.324
sigma2      4.863e+08   1.39e-07    3.5e+15      0.000    4.86e+08    4.86e+08
===================================================================================
Ljung-Box (L1) (Q):                   0.01   Jarque-Bera (JB):               214.00
Prob(Q):                              0.93   Prob(JB):                         0.00
Heteroskedasticity (H):               0.71   Skew:                             0.29
Prob(H) (two-sided):                  0.07   Kurtosis:                         6.78
===================================================================================

Now let’s make predictions using the model and have a look at the forecasted reach:

predictions = model.predict(len(data), len(data)+100)

trace_train = go.Scatter(x=data.index, 
                         y=data["Instagram reach"], 
                         mode="lines", 
                         name="Training Data")
trace_pred = go.Scatter(x=predictions.index, 
                        y=predictions, 
                        mode="lines", 
                        name="Predictions")

# The x-axis shows the integer row index (day number), not calendar dates
layout = go.Layout(title="Instagram Reach Time Series and Predictions", 
                   xaxis_title="Day Index", 
                   yaxis_title="Instagram Reach")

fig = go.Figure(data=[trace_train, trace_pred], layout=layout)
fig.show()

Figure: Instagram Reach Forecasting
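
Because the model was trained on a series with an integer index, the predictions are indexed by row number too. If you would like calendar dates on the x-axis, one option is to build the dates separately. This is a minimal sketch, assuming 365 daily observations starting on 2022-04-01 as in the data shown earlier. The helper name forecast_dates is hypothetical:

```python
import pandas as pd

def forecast_dates(last_observed: pd.Timestamp, steps: int) -> pd.DatetimeIndex:
    """Daily dates for `steps` forecasts that follow the last observed date."""
    return pd.date_range(last_observed + pd.Timedelta(days=1),
                         periods=steps, freq="D")

# Assumption: 365 daily observations starting on 2022-04-01.
observed = pd.date_range("2022-04-01", periods=365, freq="D")

# predict(len(data), len(data)+100) includes both endpoints, so it returns 101 values.
future = forecast_dates(observed[-1], steps=101)
print(future[0], future[-1])
```

These dates could then be passed as x=future in the predictions trace, with x=data['Date'] in the training trace.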

So this is how we can forecast the reach of an Instagram account using Time Series Forecasting.

Summary

Instagram reach prediction is the process of predicting the number of people that an Instagram post, story, or other piece of content will reach, based on historical data and various other factors. I hope you liked this article on Instagram Reach Forecasting using Python. Feel free to ask questions in the comments section below.

Aman Kharwal