Business Forecasting using Python

Business Forecasting is one of the applications of Time Series Forecasting. In Business Forecasting, we aim to forecast future sales, expenditure, or revenue by using the historical Time Series data generated by the business. So, if you want to learn how to perform business forecasting, this article is for you. In this article, I will take you through the task of Business Forecasting using Python.

Why Does a Business Need Business Forecasting?

Every business looks for strategies to improve its profits. Data science professionals play a major role here by turning the data a company already generates into forecasts. Historical records of sales, revenue, and expenditure are a natural starting point for anticipating how target customers will behave. By predicting future business trends, a business can make better decisions to improve its future performance.

I hope you now understand why businesses today use forecasting techniques. Forecasting sales, revenue, or expenditure are typical use cases of business forecasting. In the section below, I will take you through a business forecasting task where we aim to predict the quarterly revenue of Adidas. The data I am using for this task was collected manually from Adidas's quarterly sales reports. You can download the dataset from here.

Business Forecasting using Python

Let’s get started with the task of business forecasting by importing the necessary Python libraries and the dataset:

import warnings

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_pacf

plt.style.use('fivethirtyeight')
warnings.filterwarnings('ignore')  # silence statsmodels convergence warnings

data = pd.read_csv("adidas quarterly sales.csv")
print(data)
   Time Period  Revenue
0       2000Q1     1517
1       2000Q2     1248
2       2000Q3     1677
3       2000Q4     1393
4       2001Q1     1558
..         ...      ...
83      2020Q4     5142
84      2021Q1     5268
85      2021Q2     5077
86      2021Q3     5752
87      2021Q4     5137

[88 rows x 2 columns]
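The Time Period values are plain strings such as "2000Q1". If you want pandas to treat them as actual quarters, a minimal sketch (an optional step that is not part of the workflow in this article) is to parse them with pd.PeriodIndex. The five rows below are copied from the printout above, so the snippet runs without the CSV file.

```python
import pandas as pd

# First five rows of the Adidas dataset, copied from the printout above
data = pd.DataFrame({
    "Time Period": ["2000Q1", "2000Q2", "2000Q3", "2000Q4", "2001Q1"],
    "Revenue": [1517, 1248, 1677, 1393, 1558],
})

# Parse labels such as "2000Q1" into quarterly periods
quarters = pd.PeriodIndex(data["Time Period"], freq="Q")

# Revenue (millions of euros) indexed by quarter instead of by row number
revenue = pd.Series(data["Revenue"].to_numpy(), index=quarters, name="Revenue")

print(revenue)
print(revenue.index[0].year, revenue.index[0].quarter)
```

With a quarterly index, plots and slices such as revenue["2000"] work on calendar time rather than row positions.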

The dataset contains two columns: Time Period and Revenue. The Time Period column holds the quarter (from 2000Q1 to 2021Q4), and the Revenue column holds Adidas's sales revenue for that quarter in millions of euros. Let's have a look at the quarterly sales revenue of Adidas:

import plotly.express as px
figure = px.line(data, x="Time Period", 
                 y="Revenue", 
                 title='Quarterly Sales Revenue of Adidas in Millions')
figure.show()
Business Forecasting: Quarterly Sales Revenue of Adidas in Millions

The revenue data of Adidas looks seasonal: it rises and falls in a recurring pattern from one quarter to the next. Below is how we can check the seasonality of a time series:

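# Multiplicative model: Revenue = trend * seasonal * residual.
# The data is quarterly, so one seasonal cycle (a year) is 4 observations.
# Newer statsmodels versions use `period` in place of the old `freq` argument.
result = seasonal_decompose(data["Revenue"], 
                            model='multiplicative', period=4)
fig = result.plot()
fig.set_size_inches(15, 10)
plt.show()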
Seasonality of a Time Series Dataset
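To see what the multiplicative model does, here is a simplified sketch of classical multiplicative decomposition written from scratch; it is not statsmodels' implementation, and multiplicative_seasonal_factors is a hypothetical helper. It assumes a period of 4 (one year of quarters): the trend is a centred moving average, and each seasonal factor is the average ratio of the series to that trend at the same quarter. The input is a synthetic series with made-up factors 1.10, 0.90, 1.05 and 0.95, so you can check that they are recovered.

```python
import numpy as np
import pandas as pd


def multiplicative_seasonal_factors(y: pd.Series, period: int) -> np.ndarray:
    """Seasonal factor for each position in the cycle (classical method)."""
    values = y.to_numpy(dtype=float)
    if period % 2 == 0:
        # Even period: centred 2 x period moving average
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2

    # Trend estimate; undefined (NaN) for the first and last `half` points
    trend = np.full(len(values), np.nan)
    for t in range(half, len(values) - half):
        trend[t] = np.dot(weights, values[t - half:t + half + 1])

    # Ratio of the series to its trend, averaged per position in the cycle
    detrended = values / trend
    averages = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    return averages / averages.mean()  # normalise so the factors average to 1


# Synthetic quarterly series (assumption): linear trend times known factors
t = np.arange(40)
true_factors = np.array([1.10, 0.90, 1.05, 0.95])
series = pd.Series((100 + 2 * t) * true_factors[t % 4])

factors = multiplicative_seasonal_factors(series, period=4)
print(np.round(factors, 3))
```

On real data the residual is not zero, so the recovered factors are estimates rather than exact values.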

I will use the Seasonal ARIMA (SARIMA) model to forecast the quarterly sales revenue of Adidas. Before using the SARIMA model, we need to choose the p, d, and q values: the autoregressive (AR) order, the number of differences, and the moving-average (MA) order. You can learn how to find p, d, and q values from here.

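As the revenue is not stationary (it trends upward over the years), we difference it once, so the value of d is 1. To choose p and q, we can look at the autocorrelation (ACF) and partial autocorrelation (PACF) plots. As a rule of thumb, the lag at which the PACF cuts off suggests the AR order p, and the lag at which the ACF cuts off suggests the MA order q:

pd.plotting.autocorrelation_plot(data["Revenue"])
plot_pacf(data["Revenue"], lags = 20)

Based on these plots, the values used in this article are p = 5 and q = 2.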

Now here’s how to train a SARIMA model to predict the quarterly revenue of Adidas:

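# Orders chosen above: p = 5, d = 1, q = 2
p, d, q = 5, 1, 2

# The seasonal part reuses (p, d, q) with a seasonal period of 12
model = sm.tsa.statespace.SARIMAX(data['Revenue'],
                                  order=(p, d, q),
                                  seasonal_order=(p, d, q, 12))
model = model.fit()
print(model.summary())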
                                     SARIMAX Results                                      
==========================================================================================
Dep. Variable:                            Revenue   No. Observations:                   88
Model:             SARIMAX(5, 1, 2)x(5, 1, 2, 12)   Log Likelihood                -548.520
Date:                            Mon, 05 Sep 2022   AIC                           1127.041
Time:                                    07:45:33   BIC                           1161.803
Sample:                                         0   HQIC                          1140.921
                                             - 88                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -1.5796      0.391     -4.044      0.000      -2.345      -0.814
ar.L2         -1.4321      0.587     -2.438      0.015      -2.583      -0.281
ar.L3         -0.8305      0.626     -1.328      0.184      -2.057       0.396
ar.L4         -0.5179      0.821     -0.630      0.528      -2.128       1.092
ar.L5         -0.2655      0.491     -0.541      0.589      -1.228       0.697
ma.L1          1.5056      0.518      2.906      0.004       0.490       2.521
ma.L2          0.9697      0.623      1.557      0.120      -0.251       2.190
ar.S.L12      -1.1270    362.141     -0.003      0.998    -710.910     708.656
ar.S.L24      -1.3418    312.728     -0.004      0.997    -614.277     611.594
ar.S.L36      -0.7832    174.955     -0.004      0.996    -343.688     342.122
ar.S.L48      -0.1847     50.633     -0.004      0.997     -99.423      99.054
ar.S.L60      -0.0098      8.921     -0.001      0.999     -17.496      17.476
ma.S.L12       0.3046    362.082      0.001      0.999    -709.363     709.972
ma.S.L24       0.8602    221.641      0.004      0.997    -433.548     435.269
sigma2      1.909e+05   4.01e+05      0.476      0.634   -5.96e+05    9.78e+05
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):               427.98
Prob(Q):                              0.96   Prob(JB):                         0.00
Heteroskedasticity (H):               7.35   Skew:                            -2.04
Prob(H) (two-sided):                  0.00   Kurtosis:                        13.97
===================================================================================

Now let’s forecast the quarterly revenue of Adidas for the next eight quarters:

# Predict positions len(data) to len(data) + 7: the eight quarters after 2021Q4
predictions = model.predict(len(data), len(data)+7)
print(predictions)
88    6078.793918
89    5186.311373
90    6293.196600
91    5751.905629
92    5911.946881
93    5499.784229
94    6389.627988
95    5728.806969
Name: predicted_mean, dtype: float64

Here’s how we can plot the predictions:

data["Revenue"].plot(legend=True, 
                     label="Training Data", 
                     figsize=(15, 10))
predictions.plot(legend=True, label="Predictions")
Business Forecasting: Final Predictions

Summary

So this is how you can perform business forecasting using the Python programming language. In Business Forecasting, we aim to forecast future sales, expenditure, or revenue by using the historical Time Series data generated by the business. I hope you liked this article on Business Forecasting using Python. Feel free to ask questions in the comments section below.

Aman Kharwal

Data Strategist at Statso. My aim is to decode data science for the real world in the most simple words.


Discover more from thecleverprogrammer
