# Time Series Analysis and Forecasting with Python

Time series analysis comprises methods for studying time-series data in order to extract meaningful statistical features from it. Time series forecasting means training a machine learning model to predict future values from historical data.

Time series analysis is widely used to train machine learning models for economics, weather forecasting, stock price prediction, and sales forecasting.

It is most often applied to non-stationary data, that is, data whose statistical properties change over time.

### Time Series Analysis and Forecasting with Python

Let’s start with this tutorial on Time Series Forecasting using Python by importing the libraries.

```python
import warnings
import itertools
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Silence library warnings and set a consistent plot style
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'
```

The dataset contains several product categories; let's start with time series analysis and sales forecasting for furniture.

```python
df = pd.read_excel("Superstore.xls")
# Take a copy so later changes do not act on a slice of df
furniture = df.loc[df['Category'] == 'Furniture'].copy()
# Date range covered by the furniture orders
furniture['Order Date'].min(), furniture['Order Date'].max()
```

`(Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00'))`

### Data Preprocessing

Data preprocessing here means removing the columns we don't need, checking for missing values, and aggregating sales by date.

```python
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode',
        'Customer ID', 'Customer Name',
        'Segment', 'Country', 'City', 'State',
        'Postal Code', 'Region', 'Product ID',
        'Category', 'Sub-Category', 'Product Name',
        'Quantity', 'Discount', 'Profit']
# Keep only 'Order Date' and 'Sales'
furniture = furniture.drop(columns=cols)
furniture = furniture.sort_values('Order Date')
# Count missing values per column
furniture.isnull().sum()
```

```python
# Sum all sales that share the same order date
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()
```
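
To see what this aggregation does, here is a minimal sketch on a made-up table (the dates and amounts are illustrative, not taken from the Superstore data). Orders placed on the same day collapse into a single row holding that day's total sales.

```python
import pandas as pd

# Toy order lines (illustrative values only): two orders share a date.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(["2014-01-06", "2014-01-06", "2014-01-07"]),
    "Sales": [100.0, 50.0, 30.0],
})

# One row per date, with that day's sales summed.
daily = orders.groupby("Order Date")["Sales"].sum().reset_index()
print(daily)
```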

### Indexing Time Series Data

```python
furniture = furniture.set_index('Order Date')
furniture.index
```

Daily timestamps are awkward to work with in this dataset, so to keep things simple I will use the average daily sales for each month instead, with the start of each month as the timestamp.

`y = furniture['Sales'].resample('MS').mean()`
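
As a small sketch of what `resample('MS').mean()` does, the toy series below (made-up values, not the furniture data) is grouped by calendar month and each group is labeled with the first day of that month.

```python
import pandas as pd

# Toy daily totals (illustrative values only), indexed by date.
daily = pd.Series(
    [100.0, 300.0, 50.0],
    index=pd.to_datetime(["2014-01-06", "2014-01-20", "2014-02-03"]),
)

# 'MS' buckets by calendar month and labels each bucket with the month start.
monthly = daily.resample("MS").mean()
print(monthly)
```

Note that the mean is taken over the days that actually appear in the data, not over every calendar day of the month.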

### Visualizing The Furniture Sales Data

```python
y.plot(figsize=(15, 6))
plt.show()
```

Some patterns stand out in the figure above: the series is seasonal, with sales low at the beginning of every year and rising toward the end of the year.

Now let's visualize the data using time series decomposition, which splits the series into three components:

1. Trend
2. Seasonality
3. Noise

```python
matplotlib.rcParams['figure.figsize'] = 18, 8
# Split the series into trend, seasonal and residual parts (additive model)
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
fig = decomposition.plot()
plt.show()
```

The figure above shows that furniture sales are not stable over time and follow a clear seasonal pattern.
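
To make the three components concrete, here is a rough hand-rolled additive decomposition on a synthetic monthly series. It is a sketch of the classical moving-average idea under simplifying assumptions, not necessarily the exact computation a library performs, and the series, variable names, and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (not the furniture data): a linear trend plus a
# repeating 12-month pattern that sums to zero, with no noise added.
index = pd.date_range("2014-01-01", periods=48, freq="MS")
true_seasonal = np.tile(np.arange(12) - 5.5, 4)
series = pd.Series(100 + 2.0 * np.arange(48) + true_seasonal, index=index)

# Trend: centered 2x12 moving average (undefined for the first and last 6 months).
trend = series.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonality: average detrended value for each calendar month, centered on zero.
detrended = series - trend
monthly_mean = detrended.groupby(detrended.index.month).mean()
seasonal = pd.Series(monthly_mean.loc[series.index.month].to_numpy(), index=index)
seasonal = seasonal - seasonal.mean()

# Noise: whatever the trend and seasonality do not explain.
resid = series - trend - seasonal
print(resid.dropna().abs().max())
```

Because the synthetic series contains no noise, the residual is essentially zero wherever the trend is defined; on real data it would hold the irregular part of the series.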

### Time Series Forecasting with ARIMA

ARIMA, which stands for Autoregressive Integrated Moving Average, is one of the most widely used methods in time series forecasting. I will use it for the rest of this tutorial.

```python
# Each of p, d and q can be 0 or 1
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
# Seasonal (P, D, Q, s) combinations with a 12-month period
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in itertools.product(p, d, q)]
print('Examples of parameter combinations for Seasonal ARIMA...')
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))
```
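
The size of this search space grows quickly with the range of each parameter. The sketch below counts the candidate models for two ranges; `sarima_grid` is a hypothetical helper written for this illustration, not part of any library.

```python
import itertools

def sarima_grid(max_order, period=12):
    """Return all (order, seasonal_order) pairs with p, d, q in range(max_order)."""
    triples = list(itertools.product(range(max_order), repeat=3))
    seasonal = [(P, D, Q, period) for P, D, Q in triples]
    return list(itertools.product(triples, seasonal))

small = sarima_grid(2)  # p, d, q each in {0, 1}
large = sarima_grid(3)  # p, d, q each in {0, 1, 2}
print(len(small), len(large))
```

With values of 0 and 1 there are 8 non-seasonal and 8 seasonal combinations, so 64 models to fit; allowing 0, 1, and 2 raises that to 729.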

The next step is parameter selection for our furniture sales forecasting model: we fit a model for every parameter combination and print its AIC so the candidates can be compared.

```python
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            # Lower AIC indicates a better trade-off between fit and complexity
            print('ARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            # Skip combinations that fail to fit
            continue
```
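
The AIC (Akaike information criterion) rewards a high likelihood and penalizes extra parameters, and the candidate with the lowest value is preferred. As a minimal sketch of that trade-off, the snippet below computes AIC = 2k - 2 ln(L) for three hypothetical candidates; the parameter counts and log-likelihoods are made-up numbers, not results from the furniture models.

```python
def aic(n_params, log_likelihood):
    """Akaike information criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates (made-up numbers for illustration only):
# label -> (number of estimated parameters, maximized log-likelihood)
candidates = {
    "simple": (2, -150.0),
    "medium": (4, -146.0),
    "complex": (8, -145.5),
}

scores = {name: aic(k, ll) for name, (k, ll) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Here the most complex candidate fits slightly better than the medium one, but not by enough to justify its four extra parameters, so the medium candidate has the lowest AIC.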

### Fitting ARIMA Model

```python
mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 0, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit()
# tables[1] is the coefficient table of the summary
print(results.summary().tables[1])
```

Next I will run model diagnostics, which are essential in time series forecasting for spotting any unusual behavior in the model.

```python
results.plot_diagnostics(figsize=(16, 8))
plt.show()
```
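
The diagnostic plots are mainly a visual check that the residuals look like uncorrelated noise centered on zero. Two of those checks can also be sketched numerically. The residuals below are simulated white noise standing in for model output, so this is only an illustration of the idea under that assumption.

```python
import numpy as np
import pandas as pd

# Stand-in residuals: simulated white noise, not output from the fitted model.
rng = np.random.default_rng(0)
residuals = pd.Series(rng.normal(loc=0.0, scale=1.0, size=2000))

# Well-behaved residuals should have a mean near zero ...
mean_resid = residuals.mean()

# ... and little correlation with their own previous value.
lag1_autocorr = residuals.autocorr(lag=1)

print(round(mean_resid, 3), round(lag1_autocorr, 3))
```

With the fitted model, the same checks could be applied to `results.resid` in place of the simulated series.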

### Validating Time Series Forecasts

To gauge the accuracy of our time series forecasting model, I will compare predicted sales with actual sales, with the forecasts running from 2017-01-01 to the end of the dataset.

```python
# One-step-ahead predictions from January 2017 onward
pred = results.get_prediction(start=pd.to_datetime('2017-01-01'), dynamic=False)
pred_ci = pred.conf_int()
ax = y['2014':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 7))
# Shade the confidence interval around the predictions
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()
```

The figure above shows the observed values alongside the forecasts. The forecasts align well with the actual sales, showing an upward shift at the beginning of the year and capturing the seasonality toward the end of the year.

```python
y_forecasted = pred.predicted_mean
y_truth = y['2017-01-01':]
# Average squared difference between forecast and actual values
mse = ((y_forecasted - y_truth) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
```

The Mean Squared Error of our forecasts is 22993.58

```python
print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))
```

The Root Mean Squared Error of our forecasts is 151.64

In statistics, the Mean Squared Error (MSE) of an estimator measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. The MSE is a measure of the quality of an estimator. It is always non-negative, and the smaller the MSE, the closer we are to finding the line of best fit.

The Root Mean Squared Error (RMSE) tells us that our model was able to forecast the average daily furniture sales in the test set to within 151.64 of the actual sales. Our daily furniture sales range from around 400 to over 1200. In my opinion, this is a pretty good model so far.
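
As a small worked example of the two metrics, the sketch below computes them by hand with NumPy. The values are made up for illustration and are not the furniture results.

```python
import numpy as np

# Made-up values for illustration (not the furniture results).
actual = np.array([400.0, 600.0, 800.0, 1000.0])
predicted = np.array([450.0, 550.0, 900.0, 1000.0])

errors = predicted - actual   # 50, -50, 100, 0
mse = np.mean(errors ** 2)    # (2500 + 2500 + 10000 + 0) / 4
rmse = np.sqrt(mse)           # back in the same units as the data

print(mse, round(rmse, 2))
```

Squaring makes every error positive and weights large misses more heavily. Taking the square root returns the result to the units of the original series, which is why the RMSE is easier to compare against the sales range.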

### Producing and visualizing forecasts

```python
# Forecast 100 steps (months) beyond the end of the data
pred_uc = results.get_forecast(steps=100)
pred_ci = pred_uc.conf_int()
ax = y.plot(label='observed', figsize=(14, 7))
pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
# Shade the confidence interval around the forecast
ax.fill_between(pred_ci.index,
                pred_ci.iloc[:, 0],
                pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()
```


Our time series forecasting model clearly captured the seasonality of furniture sales. As we forecast further into the future, it is natural for us to become less confident in our values. This is reflected in the confidence intervals generated by our model, which grow wider as we move further into the future.
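
The widening of the intervals can be illustrated with a much simpler model than the one fitted here. For a plain random walk, the variance of an h-step-ahead forecast is h times the one-step variance, so the interval width grows with the square root of the horizon. The sketch below assumes a one-step error of 100 purely for illustration; it does not reproduce the SARIMA intervals.

```python
import numpy as np

sigma = 100.0                    # assumed one-step forecast error (std. dev.)
horizons = np.array([1, 4, 16])  # steps ahead

# For a random walk, the h-step forecast variance is h * sigma**2, so the
# 95% interval half-width grows with the square root of the horizon.
half_width = 1.96 * sigma * np.sqrt(horizons)
print(half_width)
```

Under this assumption, a forecast 16 steps ahead is four times as uncertain as a forecast one step ahead.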