Time Series Analysis and Forecasting with Python


Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting trains a machine learning model to predict future values based on historical values.

Time series analysis is broadly used to train machine learning models for economic forecasting, weather forecasting, stock price prediction, and sales forecasting.

Time series analysis is widely applied to data with non-stationary features, that is, data whose statistical properties such as the mean or variance change over time.

Time Series Analysis and Forecasting with Python

In this article, I will use different methods for sales forecasting using time series analysis with Python. You can download the dataset that I have used in this article below.

Let’s start with this tutorial on Time Series Forecasting using Python by importing the libraries.

import warnings
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib
import matplotlib.pyplot as plt
# Silence library warnings and set the plot style used throughout the article
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

The dataset contains several categories. Let's start with time series analysis and sales forecasting for furniture.

df = pd.read_excel("Superstore.xls")
# Take a copy so that later changes do not act on a view of df
furniture = df.loc[df['Category'] == 'Furniture'].copy()
furniture['Order Date'].min(), furniture['Order Date'].max()

Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00')

Data Preprocessing

Data Preprocessing includes removing columns that we don’t need, looking for missing values, etc.

# Columns that are not needed for forecasting sales
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name',
        'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID',
        'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
furniture = furniture.drop(columns=cols)
furniture = furniture.sort_values('Order Date')
# Count missing values per remaining column
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

Indexing Time Series Data

furniture = furniture.set_index('Order Date')
furniture.index

The current daily datetime data can be a little tricky to work with, so to keep things simple I will use the average daily sales value for each month instead. I will use the start of each month as the timestamp.

y = furniture['Sales'].resample('MS').mean()
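
To see what `resample('MS').mean()` does, here is a minimal sketch on a small made-up series. The dates and values are illustrative assumptions and are not taken from the Superstore data.

```python
import pandas as pd

# Hypothetical daily sales indexed by order date (illustrative values only).
daily = pd.Series(
    [100.0, 200.0, 300.0, 50.0, 150.0],
    index=pd.to_datetime(
        ["2014-01-06", "2014-01-20", "2014-01-27", "2014-02-03", "2014-02-17"]
    ),
)

# 'MS' groups the rows by calendar month and labels each group with the
# first day of that month; mean() then averages the values in each group.
monthly = daily.resample("MS").mean()
print(monthly)
```

Each month collapses to a single row stamped with the first day of the month, which is the same shape that `y` has in this article.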

Visualizing The Furniture Sales Data

y.plot(figsize=(15, 6))
plt.show()

Some patterns emerge from the figure above: the time series is seasonal, with sales low at the beginning of every year and rising toward the end of the year.

Now let's visualize this data using time series decomposition, which splits our time series into three components:

  1. Trend
  2. Seasonality
  3. Noise
matplotlib.rcParams['figure.figsize'] = 18, 8
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
fig = decomposition.plot()
plt.show()

The figure above shows that furniture sales are not stable and follow a clear seasonal pattern.
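
To make the idea of an additive decomposition more concrete, here is a rough sketch of the classical procedure on a synthetic monthly series. The series, the variable names, and the simplified steps are all assumptions for illustration. This is not the actual implementation inside `seasonal_decompose`, although it follows the same general idea: estimate the trend with a centered moving average, average the detrended values by month to get the seasonal component, and treat what is left as noise.

```python
import numpy as np

period = 12
n = 48  # four years of hypothetical monthly data

# Synthetic series (illustrative): linear trend plus a zero-sum yearly pattern.
trend_true = np.linspace(400.0, 600.0, n)
pattern = np.array([-100, -80, -40, 0, 10, 20, 0, -10, 60, 20, 50, 70], dtype=float)
season_true = np.tile(pattern, n // period)
series = trend_true + season_true

# Trend: centered 2x12 moving average (half weights on the two end points).
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
half = period // 2
trend = np.full(n, np.nan)
trend[half:n - half] = np.convolve(series, weights, mode="valid")

# Seasonality: average the detrended values for each month, then center them.
detrended = series - trend
seasonal_means = np.array([np.nanmean(detrended[m::period]) for m in range(period)])
seasonal_means -= seasonal_means.mean()
seasonal = np.tile(seasonal_means, n // period)

# Noise: whatever the trend and seasonal components do not explain.
residual = series - trend - seasonal
print(np.round(seasonal_means, 2))
```

Because the synthetic series contains no noise, the recovered seasonal pattern matches the one used to build it and the residual is essentially zero wherever the trend is defined. The first and last six months have no trend estimate because the moving average window does not fit there.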

Time Series Forecasting with ARIMA

ARIMA is one of the most widely used methods in time series forecasting. ARIMA stands for Autoregressive Integrated Moving Average. I will use a seasonal ARIMA model for the rest of this forecasting process.
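
The "Integrated" part of ARIMA refers to differencing the series before modeling it. As a small sketch of that idea only (the series below is made up, and this is not how the model is fitted later), seasonal differencing with a period of 12 followed by regular differencing flattens a series that has both a trend and a yearly pattern:

```python
import numpy as np

t = np.arange(36)
# Illustrative series: linear trend plus a repeating 12-step pattern.
y_demo = 10.0 + 2.0 * t + np.tile(np.arange(12, dtype=float), 3)

# Seasonal differencing (period 12): y[t] - y[t-12] cancels the repeating
# pattern and turns the linear trend into a constant level.
seasonal_diff = y_demo[12:] - y_demo[:-12]

# Regular differencing: successive differences remove that constant level.
both_diff = np.diff(seasonal_diff)

print(seasonal_diff[:3], both_diff[:3])
```

In the seasonal orders used below, the differencing terms `d` and `D` control how many times each kind of differencing is applied.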

# Each of p, d and q can be 0 or 1
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
# Seasonal orders use the same grid with a period of 12 months
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))

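This step selects the parameters for our time series forecasting model for furniture sales. We fit a model for every combination of parameters and print its AIC (Akaike Information Criterion); a lower AIC indicates a better trade-off between fit and model complexity.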

for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            # Skip parameter combinations that fail to fit
            continue
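
Rather than reading the printed AIC values by eye, the results of the search can be collected and the smallest AIC picked programmatically. The sketch below assumes each fit has been stored as an `(order, seasonal_order, aic)` tuple; the helper name `pick_lowest_aic` is hypothetical and not part of statsmodels.

```python
import itertools
import math

# Same grid as in the article: each of p, d, q is 0 or 1.
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(P, D, Q, 12) for P, D, Q in itertools.product(p, d, q)]

# Number of models that the nested loop tries to fit.
n_models = len(pdq) * len(seasonal_pdq)
print(n_models)  # 64


def pick_lowest_aic(fits):
    """Return the (order, seasonal_order, aic) tuple with the smallest AIC.

    Entries whose AIC is NaN are ignored.
    """
    valid = [fit for fit in fits if not math.isnan(fit[2])]
    return min(valid, key=lambda fit: fit[2])
```

Inside the loop, one would append `(param, param_seasonal, results.aic)` to a list after each successful fit and pass that list to `pick_lowest_aic` once the search has finished.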

Fitting ARIMA Model

mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 0, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit()
# Print the table of estimated coefficients
print(results.summary().tables[1])

Next, I will run model diagnostics. This step is essential in time series forecasting to check for any unusual behavior in the model.

results.plot_diagnostics(figsize=(16, 8))
plt.show()

Validating Time Series Forecasts

To understand the accuracy of our time series forecasting model, I will compare predicted sales with actual sales, with the forecasts starting at 2017-01-01 and running to the end of the dataset.

# One-step-ahead predictions from January 2017 onward, with confidence intervals
pred = results.get_prediction(start=pd.to_datetime('2017-01-01'), dynamic=False)
pred_ci = pred.conf_int()
ax = y['2014':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 7))
# Shade the area between the lower and upper confidence bounds
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()

The figure above shows the observed values alongside the forecasts. Overall, the forecasts align well with the actual sales, showing an upward trend from the beginning of the year and capturing the seasonality toward the end of the year.

y_forecasted = pred.predicted_mean
y_truth = y['2017-01-01':]
# Average of the squared differences between forecast and actual values
mse = ((y_forecasted - y_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))

The Mean Squared Error of our forecasts is 22993.58

print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))

The Root Mean Squared Error of our forecasts is 151.64

In statistics, the Mean Squared Error (MSE) of an estimator measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. The MSE is a measure of the quality of an estimator. It is always non-negative, and the smaller the MSE, the closer we are to finding the line of best fit.

The Root Mean Squared Error (RMSE) tells us that our model was able to forecast the average daily furniture sales in the test set within 151.64 of the actual sales. Our daily furniture sales range from around 400 to over 1200. In my opinion, this is a pretty good model so far.
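
As a small worked example of the two metrics, here is a sketch in plain Python. The `actual` and `predicted` lists are made-up numbers chosen to keep the arithmetic easy, and they are unrelated to the furniture data.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


def rmse(actual, predicted):
    """Square root of the MSE, expressed in the same units as the data."""
    return math.sqrt(mse(actual, predicted))


actual = [3.0, 5.0, 7.0]      # illustrative values
predicted = [2.0, 5.0, 9.0]   # illustrative values

# Squared errors are 1, 0 and 4, so the MSE is (1 + 0 + 4) / 3.
print(mse(actual, predicted))
print(rmse(actual, predicted))
```

Because the RMSE is in the same units as the series, it is the easier of the two to compare against the typical range of sales values.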

Producing and visualizing forecasts

# Forecast 100 steps (months) past the end of the data
pred_uc = results.get_forecast(steps=100)
pred_ci = pred_uc.conf_int()
ax = y.plot(label='observed', figsize=(14, 7))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()


Our time series forecasting model clearly captured the seasonality of furniture sales. As we forecast further out into the future, it is natural for us to become less confident in our values. This is reflected in the confidence intervals generated by our model, which grow wider as we move further out into the future.
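
To give some intuition for why the intervals widen, here is a sketch for a much simpler model than the one fitted above: a random walk, where each value equals the previous value plus independent noise with standard deviation `sigma`. For that model, the forecast error variance after `h` steps is `h * sigma**2`, so the interval half-width grows with the square root of the horizon. The function name and the value of `sigma` are assumptions for illustration, and the seasonal ARIMA intervals in the figure are computed differently.

```python
import math


def random_walk_interval_halfwidth(sigma, horizon, z=1.96):
    """Half-width of an approximate 95% forecast interval for a random walk.

    The h-step forecast error is the sum of h independent shocks,
    so its standard deviation is sigma * sqrt(h).
    """
    return z * sigma * math.sqrt(horizon)


sigma = 1.0  # hypothetical noise standard deviation
for horizon in (1, 4, 9):
    print(horizon, random_walk_interval_halfwidth(sigma, horizon))
```

The half-width doubles from horizon 1 to horizon 4 and triples by horizon 9, which mirrors the widening band seen in the forecast plot.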

Aman Kharwal

I am a programmer from India, and I am here to guide you with Data Science, Machine Learning, Python, and C++ for free. I hope you will learn a lot in your journey towards Coding, Machine Learning and Artificial Intelligence with me.
