ARIMA Model in Machine Learning

ARIMA stands for Autoregressive Integrated Moving Average. It is a powerful and flexible family of models for time series forecasting. In machine learning, ARIMA is generally treated as a class of statistical models whose outputs depend linearly on their own previous values, combined with stochastic (random noise) terms.
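To make the three parts of the name concrete, here is a minimal sketch in plain Python. It is not how PyFlux implements ARIMA; the function names `simulate_arma` and `difference` and the coefficient values are assumptions chosen for illustration. The first function builds a series in which each value is a linear combination of previous values (the AR part) and previous noise terms (the MA part), and the second applies the differencing that "Integrated" refers to.

```python
import random


def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """Simulate y_t = sum_i phi[i]*y_{t-1-i} + e_t + sum_j theta[j]*e_{t-1-j}."""
    rng = random.Random(seed)
    y, e = [], []
    for t in range(n):
        noise = rng.gauss(0.0, sigma)  # the stochastic term e_t
        # AR part: weighted sum of the previous observed values.
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        # MA part: weighted sum of the previous noise terms.
        ma_part = sum(theta[j] * e[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        y.append(ar_part + ma_part + noise)
        e.append(noise)
    return y


def difference(series, d=1):
    """Apply first-order differencing d times (the "I" in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


series = simulate_arma(phi=[0.6], theta=[0.3], n=200)  # hypothetical ARMA(1, 1)
trend = [0.05 * t + v for t, v in enumerate(series)]   # add a linear trend
detrended = difference(trend, d=1)                     # the trend becomes a constant offset
```

Differencing is what lets an ARIMA model handle a series whose level drifts over time: the ARMA part is then fitted to the differenced values rather than to the raw ones.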

While choosing an appropriate time series forecasting model, we need to visualize the data to analyse the trends, seasonalities, and cycles. When seasonality is a very strong feature of the time series we need to consider a model such as seasonal ARIMA (SARIMA).
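Besides plotting, one rough way to check for seasonality is to look at the autocorrelation of the series at different lags. The sketch below is an illustration in plain Python; the helper name `autocorrelation` and the toy series with a period of 4 are assumptions, not part of the dataset used later.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    numer = sum(centered[t] * centered[t - lag] for t in range(lag, n))
    return numer / denom


# Toy series that repeats every 4 steps (a hypothetical seasonal period).
toy = [0.0, 1.0, 0.0, -1.0] * 10

# The lag with the highest autocorrelation hints at the seasonal period.
seasonal_lag = max(range(1, 9), key=lambda k: autocorrelation(toy, k))
```

A pronounced peak at one lag (and its multiples) suggests a seasonal pattern of that length, which is the situation where SARIMA is worth considering.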

ARIMA works as a distributed lag model: it predicts future values of a series from its lagged (past) values. In this article, I will show you how to use an ARIMA model through a very practical machine learning example: anomaly detection.
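The idea of predicting from lagged values can be sketched with the simplest possible case, an AR(1) model with a single lag. This is only an illustration under stated assumptions: the series is taken to have zero mean, the coefficient is estimated by ordinary least squares, and the names `fit_ar1` and `forecast_ar1` are made up for this example.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in y_t = phi * y_{t-1} + e_t (zero-mean series assumed)."""
    numer = sum(curr * prev for prev, curr in zip(series, series[1:]))
    denom = sum(prev * prev for prev in series[:-1])
    return numer / denom


def forecast_ar1(last_value, phi, h):
    """Forecast h steps ahead, feeding each prediction back in as the next lag."""
    forecasts = []
    value = last_value
    for _ in range(h):
        value = phi * value
        forecasts.append(value)
    return forecasts


history = [1.0, 0.5, 0.25, 0.125]     # toy data that halves at every step
phi = fit_ar1(history)                # 0.5 for this toy series
future = forecast_ar1(history[-1], phi, h=2)
```

A full ARIMA model follows the same pattern with more lags, lagged noise terms, and optional differencing.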

Anomaly Detection with ARIMA Model

Anomaly detection means identifying unexpected events in a process. In security, it means detecting threats to our systems that may cause harm, such as breaches and leakage of important information. Anomaly detection is not limited to security, though: it can be used to detect any event that does not conform to our expectations. Here I will explain how we can use the ARIMA model for anomaly detection.
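The notion of "not conforming to our expectations" can be made concrete as a threshold on the gap between what a model expected and what was observed. The sketch below is one simple possibility, not the method used by PyFlux; the helper names, the factor `k=3.0`, and the toy numbers are all assumptions for illustration.

```python
import statistics


def residual_threshold(train_observed, train_expected, k=3.0):
    """Tolerance derived from how far training observations strayed from expectations."""
    residuals = [o - e for o, e in zip(train_observed, train_expected)]
    return k * statistics.pstdev(residuals)


def flag_anomalies(observed, expected, threshold):
    """Indices where the observation deviates from the expectation by more than threshold."""
    return [i for i, (o, e) in enumerate(zip(observed, expected)) if abs(o - e) > threshold]


# Hypothetical CPU percentages and model expectations.
threshold = residual_threshold([10, 12, 11, 9, 10, 11], [10.5] * 6)
anomalies = flag_anomalies([10, 11, 25, 10], [10.5] * 4, threshold)
```

Here the tolerance is learned from a period assumed to be normal, and only the third new observation falls outside it.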

I will use data consisting of per-minute metrics of a host’s CPU utilization. Now let’s get started with this task by importing the necessary libraries:

# Notebook shell command (Colab/Jupyter) to install PyFlux
!pip install pyflux

import pandas as pd
import pyflux as pf
from datetime import datetime

Now let’s import the data and have a quick look at it. You can download the data I am using in this task from here.

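# Upload the CSV files when running in Google Colab
from google.colab import files
uploaded = files.upload()

# Parse the first column as datetimes
# (infer_datetime_format is deprecated as of pandas 2.0 and can be omitted there)
data_train_a = pd.read_csv('cpu-train-a.csv', parse_dates=[0], infer_datetime_format=True)
data_test_a = pd.read_csv('cpu-test-a.csv', parse_dates=[0], infer_datetime_format=True)
data_train_a.head()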

Now, let’s visualize this data to have a quick look at what we are working with:

import matplotlib.pyplot as plt

plt.figure(figsize=(20,8))
plt.plot(data_train_a['datetime'], data_train_a['cpu'], color='black')
plt.ylabel('CPU %')
plt.title('CPU Utilization')

Using ARIMA Model

Now, let’s see how we can use the ARIMA model for prediction on the data:

model_a = pf.ARIMA(data=data_train_a, ar=11, ma=11, integ=0, target='cpu')
x = model_a.fit("M-H")

Acceptance rate of Metropolis-Hastings is 0.0
Acceptance rate of Metropolis-Hastings is 0.026
Acceptance rate of Metropolis-Hastings is 0.2346

Tuning complete! Now sampling.
Acceptance rate of Metropolis-Hastings is 0.244425
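The acceptance rates in this log come from the Metropolis-Hastings sampler used to fit the model. As a rough illustration of what that number measures, here is a toy random-walk sampler in plain Python. It is a sketch under stated assumptions, not PyFlux's implementation: the target is a standard normal distribution, and the function names and step size are made up for the example.

```python
import math
import random


def metropolis_hastings(log_density, start, step, n_samples, seed=0):
    """Random-walk Metropolis-Hastings; returns the samples and the acceptance rate."""
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric random-walk proposal
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
            accepted += 1
        samples.append(current)  # a rejected proposal repeats the current value
    return samples, accepted / n_samples


def standard_normal_logpdf(x):
    """Log-density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


samples, rate = metropolis_hastings(standard_normal_logpdf, start=0.0, step=1.0, n_samples=5000)
```

The acceptance rate is the fraction of proposed moves that were kept. Small steps are accepted almost always but explore slowly, while large steps are mostly rejected, which is why a sampler typically tunes its step size before the final sampling run.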

Now, let’s visualize our model:

model_a.plot_fit(figsize=(20,8))

The output above shows CPU utilization over time together with the fit of the ARIMA model. Now let’s perform an in-sample test to evaluate the performance of our model:

model_a.plot_predict_is(h=60, figsize=(20,8))

The output above shows the in-sample (training set) predictions of our ARIMA model. Now I will run the actual forecast, plotting the most recent 100 observed data points followed by the 60 predicted points:

model_a.plot_predict(h=60, past_values=100, figsize=(20,8))

Let’s perform the same anomaly detection on another segment of the CPU utilization dataset captured at a different time:

data_train_b = pd.read_csv('cpu-train-b.csv', parse_dates=[0], infer_datetime_format=True)
data_test_b = pd.read_csv('cpu-test-b.csv', parse_dates=[0], infer_datetime_format=True)

plt.figure(figsize=(20,8))
plt.plot(data_train_b['datetime'], data_train_b['cpu'], color='black')
plt.ylabel('CPU %')
plt.title('CPU Utilization')

Now, let’s fit the model to this data:

model_b = pf.ARIMA(data=data_train_b, ar=11, ma=11, integ=0, target='cpu')
x = model_b.fit("M-H")

Acceptance rate of Metropolis-Hastings is 0.0
Acceptance rate of Metropolis-Hastings is 0.016
Acceptance rate of Metropolis-Hastings is 0.1344
Acceptance rate of Metropolis-Hastings is 0.21025
Acceptance rate of Metropolis-Hastings is 0.23585
Tuning complete! Now sampling.
Acceptance rate of Metropolis-Hastings is 0.34395


We can see the anomaly that occurs a short time after the training period: the observed values fall within the low-confidence bands of the forecast, which would raise an anomaly alert.
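Turning this visual check into an automatic alert could look like the sketch below. It assumes the bounds of a chosen prediction interval have already been extracted into plain lists; the function names, the toy numbers, and the rule of requiring several consecutive out-of-band points are illustrative assumptions rather than anything PyFlux provides.

```python
def points_outside_band(observed, lower, upper):
    """Indices of observations that fall below the lower or above the upper bound."""
    return [
        i for i, (o, lo, hi) in enumerate(zip(observed, lower, upper))
        if o < lo or o > hi
    ]


def should_alert(outside_indices, min_consecutive=3):
    """Alert only when at least min_consecutive successive points leave the band."""
    run = 0
    previous = None
    for i in outside_indices:
        # Extend the current run if this index directly follows the previous one.
        run = run + 1 if previous is not None and i == previous + 1 else 1
        if run >= min_consecutive:
            return True
        previous = i
    return False


# Hypothetical CPU percentages checked against a flat 10-40 % band.
observed = [20, 22, 55, 60, 58, 21]
outside = points_outside_band(observed, [10] * 6, [40] * 6)
alert = should_alert(outside)
```

Requiring a run of consecutive points is one way to avoid alerting on a single noisy minute; the right run length depends on how quickly an alert is needed.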

I hope you liked this article on Anomaly Detection using the ARIMA Model. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of machine learning.
