ARIMA stands for Autoregressive Integrated Moving Average. It is a powerful and flexible family of models for time series forecasting. ARIMA models are a class of statistical models whose outputs depend linearly on their own previous values and on stochastic (random error) terms.
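The "Integrated" part of the name refers to differencing: the series is differenced before the autoregressive and moving-average terms are fitted, which removes trends. As a rough illustration only, first-order differencing and its inverse can be sketched in plain Python. The helper names `difference` and `undifference` and the numbers are made up for this example and are not part of any library.

```python
def difference(series):
    """Return first-order differences: y[t] - y[t-1]."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    series = [first_value]
    for d in diffs:
        series.append(series[-1] + d)
    return series


# A series with a steady upward trend (made-up numbers for illustration).
trend = [10, 12, 14, 16, 18]
diffs = difference(trend)                 # the linear trend becomes a constant
restored = undifference(trend[0], diffs)  # differencing loses no information
print(diffs)
print(restored)
```

The models fitted later in this article use `integ=0`, which means no differencing is applied.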

When choosing an appropriate time series forecasting model, we need to visualize the data to analyse its trends, seasonalities, and cycles. When seasonality is a very strong feature of the time series, we need to consider a model such as seasonal ARIMA (SARIMA).
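Besides looking at a plot, one simple numeric check for seasonality is the sample autocorrelation at a candidate seasonal lag. The sketch below is only an illustration on synthetic data: the `autocorrelation` helper and the period of 12 are assumptions for this example, not something taken from the CPU dataset.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


# Synthetic series that repeats every 12 steps (illustrative data only).
period = 12
seasonal = [math.sin(2 * math.pi * t / period) for t in range(240)]

acf_full = autocorrelation(seasonal, period)       # one full cycle apart
acf_half = autocorrelation(seasonal, period // 2)  # half a cycle apart
print(round(acf_full, 2), round(acf_half, 2))
```

A value close to 1 at the seasonal lag, together with a strongly negative value at half that lag, suggests a repeating pattern that a seasonal model may capture better.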

An ARIMA model works as a distributed lag model: it predicts future values of the series from its lagged (past) values. In this article, I will show you how to use an ARIMA model with a very practical example in Machine Learning: Anomaly Detection.
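To make the idea of predicting from lagged values concrete, here is a minimal sketch of the simplest case, an AR(1) model, fitted by ordinary least squares on synthetic data. It is not how PyFlux estimates the larger model used later in this article; the `fit_ar1` helper, the coefficients, and the data are all assumptions made for illustration.

```python
import random


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in y[t] = c + phi * y[t-1] + noise."""
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # current values y[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


# Simulate an AR(1) series with known coefficients (synthetic data).
random.seed(0)
true_c, true_phi = 5.0, 0.8
series = [25.0]
for _ in range(2000):
    series.append(true_c + true_phi * series[-1] + random.gauss(0, 1))

c_hat, phi_hat = fit_ar1(series)
next_value = c_hat + phi_hat * series[-1]  # one-step-ahead forecast
print(round(c_hat, 2), round(phi_hat, 2), round(next_value, 2))
```

The estimated `phi_hat` should land near the true value of 0.8, and the forecast is simply the fitted linear function of the most recent observation.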

**Anomaly Detection with ARIMA Model**

Anomaly Detection means identifying unexpected events in a process. In security, it means detecting threats to our systems that may cause harm, such as the leakage of important information. Its importance is not limited to security, though: it is used to detect any event that does not conform to our expectations. Here I will explain how we can use an ARIMA model for Anomaly Detection.

I will use data based on per-minute metrics of a host’s CPU utilization. Now let’s get started with this task by importing the necessary libraries:

```
# Notebook shell command: install PyFlux before importing it
!pip install pyflux
import pandas as pd
import pyflux as pf
from datetime import datetime
```

Now let’s import the data and take a quick look at it. You can download the data I am using in this task from **here**.

```
# Upload the CSV files into the Colab session
from google.colab import files
uploaded = files.upload()
# Parse the first column of each file as datetimes
data_train_a = pd.read_csv('cpu-train-a.csv', parse_dates=[0], infer_datetime_format=True)
data_test_a = pd.read_csv('cpu-test-a.csv', parse_dates=[0], infer_datetime_format=True)
data_train_a.head()
```

Now, let’s visualize this data to have a quick look at what we are working with:

```
import matplotlib.pyplot as plt
plt.figure(figsize=(20,8))
plt.plot(data_train_a['datetime'], data_train_a['cpu'], color='black')
plt.ylabel('CPU %')
plt.title('CPU Utilization')
```

**Using the ARIMA Model**

Now, let’s see how we can use the ARIMA model for prediction on the data:

```
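# 11 autoregressive and 11 moving-average lags on 'cpu'; integ=0 means no differencing
model_a = pf.ARIMA(data=data_train_a, ar=11, ma=11, integ=0, target='cpu')
# Estimate the parameters with Metropolis-Hastings sampling
x = model_a.fit("M-H")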
```

Acceptance rate of Metropolis-Hastings is 0.0

Acceptance rate of Metropolis-Hastings is 0.026

Acceptance rate of Metropolis-Hastings is 0.2346

Tuning complete! Now sampling.

Acceptance rate of Metropolis-Hastings is 0.244425

Now, let’s visualize our Model:

`model_a.plot_fit(figsize=(20,8))`

The output above shows CPU utilization over time together with the fitted ARIMA model. Now let’s perform an in-sample test to evaluate the performance of our model:

`model_a.plot_predict_is(h=60, figsize=(20,8))`

The output above shows the in-sample (training set) predictions of our ARIMA model. Now I will run the actual prediction, plotting the 100 most recent observed data points followed by the 60 predicted points:

`model_a.plot_predict(h=60,past_values=100,figsize=(20,8))`

Let’s perform the same anomaly detection on another segment of the CPU utilization dataset captured at a different time:

```
data_train_b = pd.read_csv('cpu-train-b.csv', parse_dates=[0], infer_datetime_format=True)
data_test_b = pd.read_csv('cpu-test-b.csv', parse_dates=[0], infer_datetime_format=True)
plt.figure(figsize=(20,8))
plt.plot(data_train_b['datetime'], data_train_b['cpu'], color='black')
plt.ylabel('CPU %')
plt.title('CPU Utilization')
```

Now, let’s fit the model on this data:

```
model_b = pf.ARIMA(data=data_train_b, ar=11, ma=11, integ=0, target='cpu')
x = model_b.fit("M-H")
```

Acceptance rate of Metropolis-Hastings is 0.0

Acceptance rate of Metropolis-Hastings is 0.016

Acceptance rate of Metropolis-Hastings is 0.1344

Acceptance rate of Metropolis-Hastings is 0.21025

Acceptance rate of Metropolis-Hastings is 0.23585

Tuning complete! Now sampling.

Acceptance rate of Metropolis-Hastings is 0.34395

`model_b.plot_predict(h=60,past_values=100,figsize=(20,8))`

We can see the anomaly that occurs a short time after the training period: the observed values fall within the low-confidence bands of the prediction, so they would raise an anomaly alert.
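This visual check can be turned into a simple rule: flag any observation that falls outside a chosen prediction band. The sketch below shows the idea on made-up numbers. The `flag_anomalies` helper, the forecast values, and the band half-width are all hypothetical; they are not produced by PyFlux or taken from the CPU dataset.

```python
def flag_anomalies(observed, lower, upper):
    """Return the indices where an observation lies outside [lower, upper]."""
    return [
        i
        for i, (value, lo, hi) in enumerate(zip(observed, lower, upper))
        if value < lo or value > hi
    ]


# Made-up numbers for illustration; not taken from the CPU dataset.
forecast = [30.0, 31.0, 32.0, 31.5, 30.5]
half_width = 5.0  # assumed half-width of the prediction band
lower = [f - half_width for f in forecast]
upper = [f + half_width for f in forecast]
observed = [29.0, 33.5, 58.0, 61.0, 31.0]

anomalies = flag_anomalies(observed, lower, upper)
print(anomalies)
```

In practice the band would come from the model's own prediction intervals, and its width controls the trade-off between missed anomalies and false alerts.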

I hope you liked this article on Anomaly Detection using the ARIMA Model. Feel free to ask your questions in the comments section below. You can also follow me on **Medium** to learn more about machine learning.
