ARIMA (autoregressive integrated moving average) is a widely used statistical model for time series analysis and forecasting. It combines three components: autoregression (AR), integration (I), which is applied through differencing, and moving average (MA). The p, d, and q values are the orders of these three components, and they determine the behaviour and characteristics of the model. If you want to learn how to calculate p, d, and q values, this article is for you. In this article, I’ll take you through the task of calculating p, d, and q values using Python.
How to Calculate P, D, and Q Values for ARIMA?
Let’s understand the process of calculating the values of p (autoregressive order), d (differencing order) and q (moving average order) step by step.
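The autoregressive component captures the relationship between an observation and its previous values. To choose the value of p, we calculate the partial autocorrelation function (PACF), which measures the correlation between an observation and a lagged version of itself after controlling for the intermediate lags. For an AR(p) process, the PACF drops to near zero after lag p.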
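As a rough illustration of how the partial autocorrelation relates to the autoregressive order, the sketch below simulates an AR(2) series and estimates its PACF by regression. The helper `sample_pacf`, the coefficients 0.6 and 0.3, and the random seed are all assumptions made for this example and are not part of the function used later in this article.

```python
import numpy as np

def sample_pacf(x, nlags):
    """Estimate the PACF at lags 0..nlags by ordinary least squares.

    The PACF at lag k is the last coefficient when x_t is regressed
    on x_{t-1}, ..., x_{t-k}.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    values = [1.0]
    for k in range(1, nlags + 1):
        y = x[k:]
        # Column j holds the series lagged by j steps, aligned with y
        X = np.column_stack([x[k - j : len(x) - j] for j in range(1, k + 1)])
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        values.append(coefs[-1])
    return np.array(values)

# Simulate an AR(2) process: x_t = 0.6 * x_{t-1} + 0.3 * x_{t-2} + e_t
rng = np.random.default_rng(0)
n, burn_in = 2000, 200
noise = rng.normal(size=n + burn_in)
x = np.zeros(n + burn_in)
for t in range(2, n + burn_in):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2] + noise[t]
ar2 = x[burn_in:]  # drop the warm-up period

pacf_values = sample_pacf(ar2, nlags=5)
bound = 1.96 / np.sqrt(len(ar2))  # approximate 95% significance bound
for lag in range(1, 6):
    flag = "significant" if abs(pacf_values[lag]) > bound else "not significant"
    print(f"lag {lag}: PACF = {pacf_values[lag]:+.3f} ({flag})")
```

For a series like this one, the partial autocorrelations at lags 1 and 2 should stand out clearly, while higher lags should mostly stay inside the bound, although an occasional lag can cross it by chance.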
Moving on to differencing, this is a technique used to make a time series stationary by removing trends. To determine the value of d, we calculate the difference between consecutive observations in the time series and check whether the result is stationary. If the differenced series is still not stationary, we repeat the differencing process until we achieve stationarity. The value of d is the number of times we had to difference the series.
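To see what differencing does to a trend, here is a minimal sketch on two noise-free toy series. The series and their coefficients are made up for illustration: one difference flattens a linear trend, while a quadratic trend needs two.

```python
import numpy as np

t = np.arange(10, dtype=float)
linear = 3.0 * t + 5.0        # linear trend
quadratic = 0.5 * t**2 + 2.0  # quadratic trend

# One round of differencing turns the linear trend into a constant
first_diff_linear = np.diff(linear)

# The quadratic trend still has a slope after one difference ...
first_diff_quadratic = np.diff(quadratic)
# ... and becomes constant only after a second difference
second_diff_quadratic = np.diff(quadratic, n=2)

print(first_diff_linear)
print(first_diff_quadratic)
print(second_diff_quadratic)
```

Each round of differencing shortens the series by one observation, which is worth keeping in mind for short series.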
The moving average component considers the dependence between an observation and its past error terms. To choose the value of q, we calculate the autocorrelation function (ACF), which measures the correlation between an observation and its lagged versions. For an MA(q) process, the ACF drops to near zero after lag q.
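To make the autocorrelation side concrete, the following sketch computes the sample ACF of a simulated MA(1) series with plain NumPy and compares it with the usual approximate 95% bound of 1.96/√n. The helper `sample_acf`, the coefficient 0.8, and the seed are assumptions chosen for the example. For a pure MA(q) process the ACF is expected to be close to zero beyond lag q.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

# Simulate an MA(1) process: x_t = e_t + 0.8 * e_{t-1}
rng = np.random.default_rng(42)
noise = rng.normal(size=2001)
ma1 = noise[1:] + 0.8 * noise[:-1]

acf_values = sample_acf(ma1, nlags=5)
bound = 1.96 / np.sqrt(len(ma1))  # approximate 95% significance bound
significant = np.abs(acf_values[1:]) > bound

for lag, (value, is_sig) in enumerate(zip(acf_values[1:], significant), start=1):
    print(f"lag {lag}: ACF = {value:+.3f}, significant = {is_sig}")
```

The theoretical lag-1 autocorrelation of this process is 0.8 / (1 + 0.8²) ≈ 0.49, so the estimate at lag 1 should land near that value and well outside the bound.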
How to Calculate P, D, and Q Values using Python
Below is how to write a Python function to calculate the P, D, and Q values:
from statsmodels.tsa.stattools import acf, pacf, adfuller
import numpy as np

def determine_p_d_q_values(timeseries, lags):
    # Sample ACF and PACF of the series as passed in (lags 0..lags)
    acf_values = acf(timeseries, nlags=lags)
    pacf_values = pacf(timeseries, nlags=lags)

    # Approximate 95% significance bound for a correlation under white noise
    threshold = 1.96 / len(timeseries) ** 0.5

    # p (AR order): first lag whose partial autocorrelation is significant
    p_order = 0
    for i in range(1, len(pacf_values)):
        if abs(pacf_values[i]) > threshold:
            p_order = i
            break

    # d (differencing order): difference until the ADF test rejects a unit root
    d_order = 0
    adf_p_value = adfuller(timeseries)[1]
    while adf_p_value > 0.05:
        timeseries = np.diff(timeseries)
        d_order += 1
        adf_p_value = adfuller(timeseries)[1]

    # q (MA order): first lag whose autocorrelation is significant
    q_order = 0
    for i in range(1, len(acf_values)):
        if abs(acf_values[i]) > threshold:
            q_order = i
            break

    return p_order, d_order, q_order
In the code above, we started by calculating the autocorrelation and partial autocorrelation values of the series. Next, we determined the value of p by identifying the first lag where the partial autocorrelation exceeds the approximate 95% significance threshold of 1.96/√n, where n is the number of observations.
Next, we determined the value of d by repeatedly differencing the time series until it became stationary, as assessed by the Augmented Dickey-Fuller (ADF) test at the 5% level.
Finally, we determined the value of q by identifying the first lag where the autocorrelation exceeds the same threshold. Note that the ACF and PACF here are calculated on the series as passed in, before any differencing, so treat the resulting p and q as rough starting points rather than final orders.
Now let’s import a dataset and calculate P, D, and Q values for the time series:
import pandas as pd

data = pd.read_csv("INR-USD.csv")
print(data)
            Date       Open       High        Low      Close  Adj Close  \
0     2003-12-01  45.709000  45.728001  45.449001  45.480000  45.480000
1     2003-12-08  45.474998  45.507999  45.352001  45.451000  45.451000
2     2003-12-15  45.450001  45.500000  45.332001  45.455002  45.455002
3     2003-12-22  45.417000  45.549000  45.296001  45.507999  45.507999
4     2003-12-29  45.439999  45.645000  45.421001  45.560001  45.560001
...          ...        ...        ...        ...        ...        ...
1011  2023-04-17  81.844803  82.375504  81.844803  82.140900  82.140900
1012  2023-04-24  82.054802  82.154900  81.603996  81.745399  81.745399
1013  2023-05-01  81.744797  81.950996  81.616997  81.716103  81.716103
1014  2023-05-08  81.729797  82.148499  81.673401  81.787102  81.787102
1015  2023-05-10  82.037003  82.087502  81.884003  81.930000  81.930000

      Volume
0        0.0
1        0.0
2        0.0
3        0.0
4        0.0
...      ...
1011     0.0
1012     0.0
1013     0.0
1014     0.0
1015     0.0

[1016 rows x 7 columns]
You can download the dataset we are using for this task here. Now we can calculate the p, d, and q values for the closing prices using the function we defined above:
lags = 20
timeseries = data['Close']
p, d, q = determine_p_d_q_values(timeseries, lags)
print(f"p-value: {p}, d-value: {d}, q-value: {q}")
p-value: 1, d-value: 1, q-value: 1
So this was one way of calculating p, d, and q values for time series forecasting. Another way is to read the values off ACF and PACF plots by looking for the lag after which the correlations drop inside the significance band. You can learn more about it here.
Summary
So I hope you have understood how to find p, d, and q values for ARIMA models. These values are the orders of the autoregressive, differencing, and moving average components, and they determine the behaviour and characteristics of the model. I hope you liked this article on calculating p, d, and q values using Python. Feel free to ask valuable questions in the comments section below.