# SARIMA in Machine Learning

In Machine Learning, a seasonal autoregressive integrated moving average (SARIMA) model extends the ARIMA model with additional terms that capture seasonal patterns. In this article, I will introduce you to the SARIMA model in machine learning.

## What is the SARIMA Model?

Taking the seasonal variations of a time series into account lets a model capture periodic patterns, which can lead to more accurate predictions. Seasonal ARIMA (SARIMA) defines both a seasonal and a non-seasonal component of the ARIMA model, allowing periodic characteristics to be captured.
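
For reference, a SARIMA model is commonly written as SARIMA(p, d, q)(P, D, Q)_s. The notation below is the usual textbook form and is not taken from this article, so treat the symbols as assumptions: $B$ is the backshift operator ($B y_t = y_{t-1}$), $s$ is the seasonal period (for example 12 for monthly data), and $\varepsilon_t$ is white noise.

$$
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\,y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
$$

Here $\phi_p$ and $\theta_q$ are the non-seasonal autoregressive and moving-average polynomials of orders $p$ and $q$, $\Phi_P$ and $\Theta_Q$ are their seasonal counterparts of orders $P$ and $Q$, and $d$ and $D$ count the non-seasonal and seasonal differences.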

When choosing an appropriate forecasting model, always visualize your data to identify trends, seasonality and cycles. If seasonality is a strong feature of the series, consider models with seasonal adjustments such as the SARIMA model.
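
One rough way to check for a trend and a seasonal pattern before plotting is sketched below. It is only an illustration under stated assumptions: the series is synthetic (a linear trend plus a bump every August), and the 12-month window assumes monthly data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (hypothetical data): linear trend plus an August peak
index = pd.date_range("2012-01-01", periods=96, freq="MS")
t = np.arange(96)
seasonal_bump = np.where(index.month == 8, 50.0, 0.0)
series = pd.Series(100.0 + 0.5 * t + seasonal_bump, index=index)

# A centered 12-month rolling mean smooths out the yearly pattern and leaves the trend
trend = series.rolling(window=12, center=True).mean()

# Averaging the detrended values by calendar month gives a seasonal profile
detrended = series - trend
monthly_profile = detrended.groupby(detrended.index.month).mean()
peak_month = int(monthly_profile.idxmax())

print(monthly_profile.round(2))
print("Month with the highest seasonal effect:", peak_month)
```

Calling `series.plot()` and `trend.plot()` on the same axes gives the visual version of this check.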

## SARIMA Model in Action

In this article, I will use the number of tourist arrivals in Italy. The data is taken from European statistics: annual data on tourism industries. You can download this dataset from here. First, we import the dataset of foreign tourist arrivals in Italy from 2012 to October 2019 and then convert it to a time series.

Let’s get started with the task by importing and reading the data:

```python
import pandas as pd

# Assumption: the CSV file name and location are not given in the article
df = pd.read_csv('IT_tourists_arrivals.csv')
df['date'] = pd.to_datetime(df['date'])
df = df[df['date'] > '2012-01-01']
df.set_index('date', inplace=True)
```
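
The same loading steps can be tried on a tiny in-memory CSV before working with the real file. This is only a sketch: the values are made up, the column names `date` and `value` follow the code in this article, and `asfreq('MS')` (month-start frequency) is an assumption about how the monthly data is indexed.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file
csv_text = """date,value
2011-12-01,900
2012-01-01,1000
2012-02-01,1100
2012-03-01,1500
"""

df = pd.read_csv(io.StringIO(csv_text))
df['date'] = pd.to_datetime(df['date'])
df = df[df['date'] > '2012-01-01']   # strict comparison: 2012-01-01 itself is dropped
df = df.set_index('date')

# Declaring the frequency makes the index an explicit monthly time series
ts = df['value'].asfreq('MS')
print(ts)
```

Note that the strict comparison `> '2012-01-01'` drops an observation dated exactly 2012-01-01; use `>=` if January 2012 should be kept.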

## Preliminary analysis

The preliminary analysis is a visual analysis of the time series, to understand its trend and behaviour. First, we create the time series, store it in the variable ts and plot it.

```python
import os
import matplotlib.pyplot as plt

ts = df['value']
os.makedirs('plots', exist_ok=True)  # savefig fails if the folder is missing
plt.plot(ts)
plt.ylabel('Total Number of Tourist Arrivals')
plt.grid()
plt.tight_layout()
plt.savefig('plots/IT_tourists_arrivals.png')
plt.show()
```

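Now, let’s tune the parameters. We start with the differencing order d, which requires a test for stationarity. The function below runs the Augmented Dickey-Fuller test: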

```python
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # adfuller returns (test statistic, p-value, lags used,
    # number of observations, critical values, best information criterion)
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

    critical_value = dftest[4]['5%']
    test_statistic = dftest[0]
    alpha = 1e-3
    pvalue = dftest[1]
    if pvalue < alpha and test_statistic < critical_value:  # null hypothesis: x is non stationary
        print("X is stationary")
        return True
    else:
        print("X is not stationary")
        return False
```

We now need to difference the time series with the diff() function as many times as it takes for the series to become stationary. The number of differencing steps is the order d of the model:

```python
ts_diff = pd.Series(ts)
d = 0  # number of differencing steps, used later as the order d
while not test_stationarity(ts_diff):
    ts_diff = ts_diff.diff().dropna()
    d = d + 1
```
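
The effect of `diff()` is easy to see on a small made-up series. The sketch below also shows seasonal differencing with `diff(12)`, which is not used in this article but corresponds to the seasonal differencing order of a SARIMA model; the data and the 12-month period are assumptions.

```python
import pandas as pd

# Hypothetical monthly series: linear trend plus a bump every December
index = pd.date_range("2012-01-01", periods=36, freq="MS")
values = [10 * i + (100 if stamp.month == 12 else 0) for i, stamp in enumerate(index)]
series = pd.Series(values, index=index, dtype=float)

# First difference: removes the linear trend, but the December bump remains
first_diff = series.diff().dropna()

# Seasonal difference: each month minus the same month one year earlier
seasonal_diff = series.diff(12).dropna()

print(first_diff.unique())
print(seasonal_diff.unique())
```

In this example the first difference still jumps around every December and January, while the seasonal difference is constant, because both the trend and the yearly bump cancel out.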

Now, let’s plot the autocorrelation function (ACF) of the series over the first 12 lags:

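```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Assumption: ts_trend is never defined in this article,
# so the differenced series is used in its place here
ts_trend = ts_diff
plot_acf(ts_trend, lags=12)
plt.savefig('plots/acf.png')  # the plots/ folder must already exist
plt.show()
```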
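
The same information can be read numerically. As a minimal sketch on synthetic data (a noisy sine wave with a period of 12, which is an assumption), `Series.autocorr` from pandas returns the correlation between the series and a lagged copy of itself. Its values can differ slightly from those drawn by `plot_acf`, which normalizes the estimate differently.

```python
import numpy as np
import pandas as pd

# Synthetic series with a 12-step seasonal cycle plus a little noise
t = np.arange(120)
rng = np.random.default_rng(0)
series = pd.Series(np.sin(2 * np.pi * t / 12) + 0.1 * rng.normal(size=120))

# Autocorrelation for lags 1 to 12
autocorr_by_lag = {lag: series.autocorr(lag=lag) for lag in range(1, 13)}
best_lag = max(autocorr_by_lag, key=autocorr_by_lag.get)

for lag, value in autocorr_by_lag.items():
    print(f"lag {lag:2d}: {value:+.3f}")
print("Lag with the highest autocorrelation:", best_lag)
```

A strong peak at lag 12 and a strongly negative value at lag 6 are the signature of a yearly cycle in monthly data.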

## Building the SARIMA Model

Now, let’s build our model by using the SARIMAX class provided by the statsmodels library:

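```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

p = 9
q = 1
# Only the non-seasonal order is set here, so no seasonal terms are estimated
model = SARIMAX(ts, order=(p, d, q))
# The optimizer is selected through `method`; fit() has no `solver` argument
model_fit = model.fit(disp=1, method='powell')

# In-sample predictions plus one step beyond the last observation
fcast = model_fit.get_prediction(start=1, end=len(ts))
ts_p = fcast.predicted_mean
ts_ci = fcast.conf_int()
```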

Now, we need to plot the results of our model:

```python
# Predicted values, actual values and the confidence interval
plt.plot(ts_p, label='prediction')
plt.plot(ts, color='red', label='actual')
plt.fill_between(ts_ci.index[1:],
                 ts_ci.iloc[1:, 0],
                 ts_ci.iloc[1:, 1], color='k', alpha=.2)
plt.ylabel('Total Number of Tourist Arrivals')
plt.legend()
plt.tight_layout()
plt.grid()
plt.savefig('plots/IT_trend_prediction.png')  # the plots/ folder must already exist
plt.show()
```
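
Alongside the plot, the gap between predictions and actual values can be summarized with simple error metrics. This is a minimal sketch with made-up numbers; with the model above, the two series would be the predicted and the actual values restricted to the dates they share.

```python
import numpy as np
import pandas as pd

# Hypothetical actual and predicted values on the same monthly index
index = pd.date_range("2019-01-01", periods=4, freq="MS")
actual = pd.Series([100.0, 110.0, 120.0, 130.0], index=index)
predicted = pd.Series([98.0, 112.0, 119.0, 135.0], index=index)

# Subtraction aligns on the index; dropna() discards dates missing in either series
errors = (predicted - actual).dropna()

mae = errors.abs().mean()              # mean absolute error
rmse = np.sqrt((errors ** 2).mean())   # root mean squared error

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
```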


I hope you liked this article on how we can build the SARIMA model for seasonality effects. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.

##### Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.


### One comment

1. #### Vicky

Hi,

Could you please provide the IT_tourists_arrivals.csv file? I am unable to retrieve it from Eurostat.