ARIMA Model in Machine Learning

ARIMA model means Autoregressive Integrated Moving Average. This model provides a family of functions which are a very powerful and flexible to perform any task related to Time Series Forecasting.

In Machine Learning ARIMA model is generally a class of statistical models that give outputs which are linearly dependent on their previous values in the combination of stochastic factors.

While choosing an appropriate time series forecasting model, we need to visualize the data to analyse the trends, seasonalities, and cycles. When seasonality is a very strong feature of the time series we need to consider a model such as seasonal ARIMA (SARIMA).

The ARIMA model works by using a distributed lag model in which algorithms are used to predict the future based on the lagged values. In this article, I will show you how to use an ARIMA model by using a very practical example in Machine Learning which is Anomaly Detection.

Anomaly Detection with ARIMA Model

Anomaly Detection means to identify unexpected events in a process. It means to detect threats to our systems that may cause harm in terms of security and leakage of important information.

The importance of Anomaly Detection is not limited to security, but it is used for detection of any event that does not conform to our expectations. Here I will explain to you how we can use ARIMA model for Anomaly Detection.

I will use the data which is based on per-minute metrics of the host’s CPU utilization. Now let’s get started with this task by importing the necessary libraries:

import pandas as pd
!pip install pyflux
import pyflux as pf
from datetime import datetimeCode language: Python (python)

Now let’s import the data and have a quick look at the data and some of its insights. You can download the data, I am using in this task from here.

from google.colab import files
uploaded = files.upload()
data_train_a = pd.read_csv('cpu-train-a.csv', parse_dates=[0], infer_datetime_format=True)
data_test_a = pd.read_csv('cpu-test-a.csv', parse_dates=[0], infer_datetime_format=True)
data_train_a.head()Code language: Python (python)

Now, let’s visualize this data to have a quick look at what we are working with:

import matplotlib.pyplot as plt 
plt.figure(figsize=(20,8))
plt.plot(data_train_a['datetime'], data_train_a['cpu'], color='black')
plt.ylabel('CPU %')
plt.title('CPU Utilization')Code language: Python (python)

Using ARIMA Model

Now, let’s see how we can use the ARIMA model for prediction on the data:

model_a = pf.ARIMA(data=data_train_a, ar=11, ma=11, integ=0, target='cpu')
x = model_a.fit("M-H")Code language: Python (python)

Acceptance rate of Metropolis-Hastings is 0.0
Acceptance rate of Metropolis-Hastings is 0.026
Acceptance rate of Metropolis-Hastings is 0.2346
Tuning complete! Now sampling.
Acceptance rate of Metropolis-Hastings is 0.244425

Now, let’s visualize our Model:

model_a.plot_fit(figsize=(20,8))Code language: Python (python)

The output above shows CPU utilization over time fitted with the ARIMA model prediction. Now let’s perform a sample test to evaluate the performance of our model:

model_a.plot_predict_is(h=60, figsize=(20,8))Code language: Python (python)

The output above shows the In-sample (training set) of our ARIMA prediction model. Now, I will run the actual prediction, by using the most recent 100 observed data points being followed by the 60 predicted points:

model_a.plot_predict(h=60,past_values=100,figsize=(20,8))Code language: Python (python)

Let’s perform the same anomaly detection on another segment of the CPU utilization dataset captured at a different time:

data_train_b = pd.read_csv('cpu-train-b.csv', parse_dates=[0], infer_datetime_format=True)
data_test_b = pd.read_csv('cpu-test-b.csv', parse_dates=[0], infer_datetime_format=True)
plt.figure(figsize=(20,8))
plt.plot(data_train_b['datetime'], data_train_b['cpu'], color='black')
plt.ylabel('CPU %')
plt.title('CPU Utilization')Code language: Python (python)

Now, let’s fit this data on the model:

model_b = pf.ARIMA(data=data_train_b, ar=11, ma=11, integ=0, target='cpu')
x = model_b.fit("M-H")Code language: Python (python)

Acceptance rate of Metropolis-Hastings is 0.0
Acceptance rate of Metropolis-Hastings is 0.016
Acceptance rate of Metropolis-Hastings is 0.1344
Acceptance rate of Metropolis-Hastings is 0.21025
Acceptance rate of Metropolis-Hastings is 0.23585
Tuning complete! Now sampling.
Acceptance rate of Metropolis-Hastings is 0.34395

model_b.plot_predict(h=60,past_values=100,figsize=(20,8))Code language: Python (python)

We can visualize the anomaly that occurs a short time after the training period, as the observed values fall within the low-confidence bands, so it will raise an anomaly alert.

I hope you liked this article on Anomaly Detection using the ARIMA Model. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of machine learning.

Also Read: Artificial Intelligence Projects to Boost your Portfolio.