In machine learning, the seasonal autoregressive integrated moving average (SARIMA) model extends the ARIMA model with terms that capture seasonal patterns. In this article, I will introduce you to the SARIMA model and show how to fit one in Python.
What is the SARIMA Model?
Accounting for seasonal variation in a time series lets a model capture periodic patterns and produce more accurate predictions. Seasonal ARIMA (SARIMA) combines a non-seasonal ARIMA component with a seasonal one, so that the periodic characteristics of the series can be captured.
Before choosing a forecasting model, always visualize your data to identify trends, seasonality and cycles. If seasonality is a strong feature of the series, consider a model that handles it explicitly, such as SARIMA.
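For reference, a SARIMA model is conventionally written as SARIMA(p, d, q)(P, D, Q)_s. In the standard backshift notation, whose symbols are introduced here and are not used elsewhere in this article, the model can be stated as:

```latex
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\, y_t
  = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
```

Here B is the backshift operator (B y_t = y_{t-1}), s is the seasonal period (12 for monthly data with a yearly cycle), \phi_p and \theta_q are the non-seasonal AR and MA polynomials of orders p and q, \Phi_P and \Theta_Q are their seasonal counterparts of orders P and Q, d and D are the numbers of regular and seasonal differences, and \varepsilon_t is white noise.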
SARIMA Model in Action
In this article, I will use the number of tourist arrivals in Italy. The data is taken from European statistics: annual data on tourism industries. You can download this dataset from here. First, we import the data set for foreign tourist arrivals in Italy from 2012 to October 2019 and then convert it to a time series.
Let’s get started with the task by importing and reading the data:
import pandas as pd

df = pd.read_csv('IT_tourists_arrivals.csv')
df['date'] = pd.to_datetime(df['date'])
# keep only the observations after 1 January 2012
df = df[df['date'] > '2012-01-01']
df.set_index('date', inplace=True)
The preliminary analysis is a visual analysis of the time series, to understand its trend and behaviour. First, we create the time series and store it in the variable ts.
import os
import matplotlib.pyplot as plt

ts = df['value']

# savefig fails if the output folder does not exist
os.makedirs('plots', exist_ok=True)

plt.plot(ts)
plt.ylabel('Total Number of Tourists Arrivals')
plt.grid()
plt.tight_layout()
plt.savefig('plots/IT_tourists_arrivals.png')
plt.show()
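Next, we need to check whether the series is stationary, because this determines the differencing order d of the model. The function below runs the augmented Dickey-Fuller (ADF) test and returns True when the series can be considered stationary: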
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # adfuller returns a tuple:
    # (test statistic, p-value, lags used, observations used, critical values, best AIC)
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    critical_value = dftest[4]['5%']
    test_statistic = dftest[0]
    alpha = 1e-3
    pvalue = dftest[1]
    # null hypothesis: x is non stationary
    if pvalue < alpha and test_statistic < critical_value:
        print("X is stationary")
        return True
    else:
        print("X is not stationary")
        return False
We now difference the time series with the diff() function, repeating the step until the series becomes stationary. The number of differencing steps gives the order d:
ts_diff = pd.Series(ts)
d = 0
# difference once more each time the ADF test still reports non-stationarity
while test_stationarity(ts_diff) is False:
    ts_diff = ts_diff.diff().dropna()
    d = d + 1
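To make the effect of differencing concrete, here is a minimal sketch in plain Python on made-up numbers. The `difference` helper and the toy series are illustrative assumptions and are not part of the tourism data. A first difference removes a linear trend, while a difference taken at the seasonal lag removes a repeating pattern:

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Toy series (assumption): a linear trend of 2 per step plus a pattern
# that repeats every 4 steps.
pattern = [0, 5, 10, 5]
toy = [2 * t + pattern[t % 4] for t in range(12)]

# Lag-1 difference: the trend becomes a constant offset,
# but the repeating pattern is still visible.
first_diff = difference(toy)

# Lag-4 difference: the repeating pattern cancels out,
# leaving only the trend increment over 4 steps.
seasonal_diff = difference(toy, lag=4)

print(first_diff)
print(seasonal_diff)
```

With pandas, the seasonal version of this step for monthly data would be `ts.diff(12)`, which is the operation behind the seasonal differencing order D of a SARIMA model.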
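Now, let’s plot the autocorrelation function (ACF) of the differenced series, which shows how strongly the series correlates with its own lagged values and helps in choosing the model orders:
from statsmodels.graphics.tsaplots import plot_acf

# ts_trend is not defined earlier in this article, so the stationary
# series ts_diff is assumed here instead
plot_acf(ts_diff, lags=12)
plt.savefig('plots/acf.png')
plt.show()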
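As a rough illustration of what an ACF plot displays, the sketch below computes the sample autocorrelation at a given lag in plain Python. The `autocorr` helper and the synthetic monthly series are assumptions made for this example only. A strong positive value at lag 12 is the typical signature of yearly seasonality in monthly data:

```python
import math

def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # sum of squared deviations (the lag-0 autocovariance times n)
    var = sum((x - mean) ** 2 for x in series)
    # sum of products of deviations that are `lag` steps apart
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

# Synthetic monthly series (assumption): a pure yearly cycle over 8 years,
# with no noise and no trend.
monthly = [math.sin(2 * math.pi * t / 12) for t in range(96)]

for lag in (1, 6, 12):
    print(lag, round(autocorr(monthly, lag), 3))
```

On this series the autocorrelation is negative at lag 6, where the cycle is in opposite phase, and strongly positive again at lag 12, where the cycle repeats.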
Building the SARIMA Model
Now, let’s build our model by using the SARIMAX class provided by the statsmodels library:
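from statsmodels.tsa.statespace.sarimax import SARIMAX

p = 9  # autoregressive order
q = 1  # moving average order
# d is the differencing order found by the stationarity loop above
model = SARIMAX(ts, order=(p, d, q))
# the optimizer is selected with the method argument of fit()
model_fit = model.fit(disp=1, method='powell')
fcast = model_fit.get_prediction(start=1, end=len(ts))
ts_p = fcast.predicted_mean  # predicted values
ts_ci = fcast.conf_int()     # confidence intervals of the predictions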
Now, we need to plot the results of our model:
plt.plot(ts_p, label='prediction')
plt.plot(ts, color='red', label='actual')
# shade the confidence interval, skipping the first prediction
plt.fill_between(ts_ci.index[1:], ts_ci.iloc[1:, 0], ts_ci.iloc[1:, 1], color='k', alpha=.2)
plt.ylabel('Total Number of Tourists Arrivals')
plt.legend()
plt.tight_layout()
plt.grid()
plt.savefig('plots/IT_trend_prediction.png')
plt.show()
I hope you liked this article on how to build a SARIMA model for seasonal effects. Feel free to ask your questions in the comments section below. You can also follow me on Medium to learn more about machine learning.