Time Series Analysis and Forecasting with Python

Time series analysis

Time series analysis comprises methods for analyzing time-series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting trains a machine learning model to predict future values from historical values.

Time series analysis is broadly used in training machine learning models for economics, weather forecasting, stock price prediction, and sales forecasting.

Time series analysis is widely applied to data with non-stationary characteristics, that is, data whose statistical properties change over time.
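As a rough illustration of non-stationarity (the synthetic data and the half-split comparison are assumptions made for this sketch, not a formal statistical test), compare the mean of the first and second halves of a trending series with those of pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 200
noise = rng.normal(0.0, 1.0, n)          # stationary: constant mean and variance
trending = 0.05 * np.arange(n) + noise   # non-stationary: the mean drifts upward

def half_mean_gap(series):
    """Absolute difference between the means of the two halves."""
    half = len(series) // 2
    return abs(series[half:].mean() - series[:half].mean())

noise_gap = half_mean_gap(noise)
trend_gap = half_mean_gap(trending)
print(f"noise gap: {noise_gap:.2f}, trending gap: {trend_gap:.2f}")
```

The gap for the trending series should be much larger, because its mean depends on where in time you look.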

Time Series Analysis and Forecasting with Python

In this article, I will use time series analysis with Python for sales forecasting. The dataset I have used is an Excel file of Superstore sales data (Superstore.xls).

Let’s start with this tutorial on Time Series Forecasting using Python by importing the libraries.

import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

The dataset contains several product categories; let’s start with time series analysis and sales forecasting for furniture.

df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
furniture['Order Date'].min(), furniture['Order Date'].max()

Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00')

Data Preprocessing

Data Preprocessing includes removing columns that we don’t need, looking for missing values, etc.

cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 
        'Customer ID', 'Customer Name', 
        'Segment', 'Country', 'City', 'State', 
        'Postal Code', 'Region', 'Product ID', 
        'Category', 'Sub-Category', 'Product Name', 
        'Quantity', 'Discount', 'Profit']
# Drop the unused columns (reassigning avoids modifying a slice of df in place)
furniture = furniture.drop(cols, axis=1)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()

# Total sales per order date
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()
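To see what the final aggregation step does, here is a minimal sketch on a tiny made-up table (the rows are hypothetical, not taken from the Superstore file). Several orders on the same date collapse into one row holding their total sales:

```python
import pandas as pd

# Hypothetical orders: two on the same date, one on another date
orders = pd.DataFrame({
    'Order Date': pd.to_datetime(['2014-01-06', '2014-01-06', '2014-01-07']),
    'Sales': [100.0, 50.0, 25.0],
})

# One row per date, with that day's sales summed
daily = orders.groupby('Order Date')['Sales'].sum().reset_index()
print(daily)
```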

Indexing Time Series Data

furniture = furniture.set_index('Order Date')

The daily datetime data is a little tricky to work with, so to keep things simple I will use the average daily sales value for each month instead. I will use the start of each month as the timestamp.

y = furniture['Sales'].resample('MS').mean()
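'MS' is the pandas offset alias for month start, so each monthly average is labeled with the first day of its month. A small sketch on made-up daily values (hypothetical numbers, chosen only for illustration) shows the effect:

```python
import pandas as pd

# Hypothetical daily sales on four dates spanning two months
dates = pd.to_datetime(['2014-01-06', '2014-01-20', '2014-02-03', '2014-02-17'])
daily = pd.Series([100.0, 300.0, 50.0, 150.0], index=dates)

# Average the daily values within each month, stamped at the month start
monthly = daily.resample('MS').mean()
print(monthly)
```

Note that the mean is taken over the days that actually appear in the data, so days without any orders do not pull the average down.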

Visualizing The Furniture Sales Data

y.plot(figsize=(15, 6))
plt.show()

Some patterns can be seen in the figure above: the time series is seasonal, with sales low at the beginning of every year and rising toward the end of the year.

Now let’s visualize this data using time series decomposition, which splits our time series into three components:

  1. Trend
  2. Seasonality
  3. Noise

matplotlib.rcParams['figure.figsize'] = 18, 8
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
fig = decomposition.plot()
plt.show()

The figure above shows that furniture sales are not stable, and that seasonality is a clear cause.
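To make the three components more concrete, here is a simplified sketch of a classical additive decomposition done by hand. It is not the exact statsmodels implementation, and the noise-free synthetic series (`y_demo`) is an assumption chosen so that the result is easy to check:

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (an assumption for illustration): linear trend + yearly cycle
idx = pd.date_range('2014-01-01', periods=48, freq='MS')
t = np.arange(48)
true_seasonal = 100 * np.sin(2 * np.pi * t / 12)
y_demo = pd.Series(500 + 2 * t + true_seasonal, index=idx)

# Trend: centered 2x12 moving average (undefined for the first and last 6 months)
trend = y_demo.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: average detrended value per calendar month, centered on zero
detrended = y_demo - trend
month_means = detrended.groupby(detrended.index.month).mean()
month_means = month_means - month_means.mean()
seasonal = pd.Series(month_means.reindex(idx.month).to_numpy(), index=idx)

# Noise: whatever trend and seasonality leave unexplained
resid = y_demo - trend - seasonal
print(seasonal.head(12).round(2))
```

Because the synthetic series contains no noise, the residual is essentially zero wherever the trend is defined. On real sales data the residual is where the irregular variation ends up.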

Time Series Forecasting with ARIMA

ARIMA, which stands for Autoregressive Integrated Moving Average, is one of the most widely used methods in time series forecasting. Because the sales data is seasonal, I will use its seasonal variant (SARIMAX in statsmodels) for the rest of the forecasting process.
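For reference, a seasonal ARIMA model of order (p, d, q) x (P, D, Q, s) is usually written in backshift-operator notation as follows. This is the standard textbook form, added here as background and not taken from the original analysis:

```latex
\Phi_P(B^s)\,\phi_p(B)\,(1 - B)^d\,(1 - B^s)^D\, y_t
  = \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t
```

Here B is the backshift operator (B y_t = y_{t-1}), \phi_p and \theta_q are the non-seasonal autoregressive and moving-average polynomials of degree p and q, \Phi_P and \Theta_Q are their seasonal counterparts, d and D are the orders of ordinary and seasonal differencing, and s is the season length (12 for monthly data).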

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
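The grid defined above is small enough to inspect directly. As a quick sketch using the same ranges, each of p, d and q takes the values 0 and 1, which gives 8 non-seasonal orders, 8 seasonal orders, and 64 candidate models in total:

```python
import itertools

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(P, D, Q, 12) for P, D, Q in itertools.product(p, d, q)]

# Every non-seasonal order is paired with every seasonal order
n_models = len(pdq) * len(seasonal_pdq)
print(len(pdq), len(seasonal_pdq), n_models)  # 8 8 64
print(pdq[1], seasonal_pdq[1])                # (0, 0, 1) (0, 0, 1, 12)
```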

This step selects the parameters for our furniture sales forecasting model by fitting every combination and printing its AIC; a lower AIC generally indicates a better model.

for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                            order=param,
                                            seasonal_order=param_seasonal)
            results = mod.fit()
            print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            # Skip parameter combinations that fail to fit
            continue
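The grid search reports the Akaike information criterion (AIC) of each fitted model, which statsmodels exposes as `results.aic`. For reference, its standard definition is:

```latex
\mathrm{AIC} = 2k - 2\ln(\hat{L})
```

where k is the number of estimated parameters and \hat{L} is the maximized value of the likelihood. The 2k term penalizes complexity, so the combination with the lowest AIC is usually preferred as the best trade-off between fit and model size.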

Fitting ARIMA Model

mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 0, 12))
results = mod.fit()
print(results.summary().tables[1])

Next, I will run model diagnostics, which is essential in time series forecasting for spotting any unusual behavior in the model.

results.plot_diagnostics(figsize=(16, 8))
plt.show()
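Diagnostic plots of this kind are mainly used to check that the residuals behave like white noise: centered on zero, with no remaining autocorrelation. As a hedged illustration of that idea, the sketch below computes sample autocorrelations by hand on simulated white noise. The data and the `autocorr` helper are assumptions for this example and are not part of statsmodels:

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(42)
residuals = rng.normal(0.0, 1.0, 1000)  # stand-in for well-behaved model residuals

# For white noise, autocorrelations should mostly fall within +/- 2/sqrt(n)
bound = 2 / np.sqrt(len(residuals))
acf = [autocorr(residuals, k) for k in range(1, 6)]
print([round(a, 3) for a in acf], "bound:", round(bound, 3))
```

Residual autocorrelations well outside that band would suggest that the model has left some structure in the data unexplained.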

Validating Time Series Forecasts

To understand the accuracy of our time series forecasting model, I will compare predicted sales with actual sales, with the forecasts starting at 2017-01-01 and running to the end of the dataset.

pred = results.get_prediction(start=pd.to_datetime('2017-01-01'), dynamic=False)
pred_ci = pred.conf_int()
ax = y['2014':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 7))
# Shade the confidence interval around the predictions
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()

The figure above shows the observed values alongside the forecast predictions. The forecasts align well with the actual sales, showing an upward shift at the beginning of the year and capturing the seasonality at the end of the year.

y_forecasted = pred.predicted_mean
y_truth = y['2017-01-01':]
mse = ((y_forecasted - y_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))

The Mean Squared Error of our forecasts is 22993.58

print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))

The Root Mean Squared Error of our forecasts is 151.64

In statistics, the Mean Squared Error (MSE) of an estimator measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. The MSE is a measure of the quality of an estimator; it is always non-negative, and the smaller the MSE, the closer we are to finding the line of best fit.

The Root Mean Squared Error (RMSE) tells us that our model was able to forecast the average daily furniture sales in the test set within 151.64 of the actual sales. Our daily furniture sales range from around 400 to over 1200. In my opinion, this is a pretty good model so far.
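As a minimal worked example of the two metrics, the sketch below computes them on three made-up values (hypothetical numbers, not from the sales data) and then confirms that the RMSE reported above is the square root of the reported MSE:

```python
import numpy as np

# Hypothetical actual and forecast values, for illustration only
y_true_demo = np.array([3.0, 5.0, 7.0])
y_pred_demo = np.array([2.0, 5.0, 9.0])

errors = y_pred_demo - y_true_demo   # [-1, 0, 2]
mse_demo = np.mean(errors ** 2)      # (1 + 0 + 4) / 3
rmse_demo = np.sqrt(mse_demo)        # back in the units of the data
print(round(mse_demo, 4), round(rmse_demo, 4))

# The reported RMSE is the square root of the reported MSE
print(round(np.sqrt(22993.58), 2))   # 151.64
```

Because the RMSE is expressed in the same units as the sales figures, it is easier to compare against the range of the data than the MSE.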

Producing and visualizing forecasts

pred_uc = results.get_forecast(steps=100)
pred_ci = pred_uc.conf_int()
ax = y.plot(label='observed', figsize=(14, 7))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
# Shade the confidence interval around the forecast
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()

Our time series forecasting model clearly captured the seasonality of furniture sales. As we forecast further out into the future, it’s natural for us to become less confident in our values. This is reflected in the confidence intervals generated by our model, which grow wider as we move further out into the future.
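Why the intervals widen is easiest to see in a simpler setting than the seasonal model used here. The sketch below assumes a plain random walk with a made-up error standard deviation (`sigma`); for that model the forecast variance grows in proportion to the horizon, so the interval half-width grows with its square root:

```python
import numpy as np

sigma = 1.0                   # assumed one-step forecast error standard deviation
horizons = np.arange(1, 13)   # forecasting 1 to 12 steps ahead

# For a random walk, the h-step forecast variance is h * sigma**2,
# so the 95% interval half-width grows like sqrt(h)
half_width = 1.96 * sigma * np.sqrt(horizons)
for h in (1, 4, 9):
    print(h, round(half_width[h - 1], 2))
```

The exact growth rate for a fitted SARIMAX model depends on its estimated parameters, but the qualitative behavior, with uncertainty accumulating over the horizon, is the same.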

Aman Kharwal