How to Calculate P, D, and Q Values using Python

ARIMA (autoregressive integrated moving average) is a widely used statistical model for time series analysis and forecasting. It combines three components: autoregression (AR), integration (I), which is implemented by differencing the series, and moving average (MA). The p, d, and q values are the orders of these three components, and they determine the behaviour and characteristics of the model. So, if you want to learn how to calculate p, d, and q values, this article is for you. In this article, I’ll take you through the task of calculating p, d, and q values using Python.

How to Calculate P, D, and Q Values for ARIMA?

Let’s understand the process of calculating the values of P (autoregressive order), D (differencing order) and Q (moving average order) step by step.

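The autoregressive component captures the relationship between an observation and its previous values. To choose the value of p, we calculate the partial autocorrelation function (PACF), which measures the correlation between an observation and a lagged version of itself while controlling for the intermediate lags. For an AR(p) process, the PACF drops to near zero after lag p.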
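To make the idea of "controlling for intermediate lags" concrete, here is a minimal sketch that simulates an AR(1) series and estimates its partial autocorrelations by regression. The helper name `pacf_by_regression`, the coefficient 0.7, and the simulated data are assumptions made for illustration; in practice the `pacf` function from statsmodels does this job.

```python
import numpy as np

def pacf_by_regression(x, nlags):
    """Partial autocorrelation at lag k, taken as the last coefficient of an
    OLS fit of x_t on x_{t-1}, ..., x_{t-k} (illustrative helper, not a library API)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    values = [1.0]  # lag 0 is 1 by convention
    for k in range(1, nlags + 1):
        # Column j holds the series lagged by j steps, aligned with x[k:]
        lagged = np.column_stack([x[k - j:len(x) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(lagged, x[k:], rcond=None)
        values.append(coef[-1])
    return np.array(values)

# Simulate a hypothetical AR(1) process: x_t = 0.7 * x_{t-1} + noise
rng = np.random.default_rng(42)
noise = rng.normal(size=5000)
ar1 = np.zeros(5000)
for t in range(1, 5000):
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]

pacf_values = pacf_by_regression(ar1, nlags=5)
threshold = 1.96 / np.sqrt(len(ar1))
print(np.round(pacf_values, 3))
print("lags above threshold:",
      [k for k in range(1, 6) if abs(pacf_values[k]) > threshold])
```

For an AR(1) series like this one, the value at lag 1 should sit near 0.7 and the later lags should be close to zero, which points to p = 1.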

Moving on to differencing, this is a technique used to make a time series stationary by removing trends. To determine the value of d, we calculate the difference between consecutive observations in the time series and test the result for stationarity. If the differenced series is still not stationary, we repeat the differencing process until we achieve stationarity. The number of times we had to difference the series is the value of d.
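As a small sketch of what differencing does, the example below builds a random walk (a classic non-stationary series) and differences it with NumPy. The simulated series and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical series: a random walk is the running sum of random steps,
# so its level wanders and it is not stationary.
steps = rng.normal(size=500)
walk = np.cumsum(steps)

first_diff = np.diff(walk)        # one round of differencing (d = 1)
second_diff = np.diff(walk, n=2)  # two rounds of differencing (d = 2)

# Each round of differencing shortens the series by one observation
print(len(walk), len(first_diff), len(second_diff))  # 500 499 498

# Differencing the walk once recovers the original random steps
print(np.allclose(first_diff, steps[1:]))  # True
```

Here a single round of differencing turns the wandering series back into plain noise, so d = 1 would be enough for this series.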

The moving average component considers the dependence between an observation and its past error terms. To choose the value of q, we calculate the autocorrelation function (ACF), which measures the correlation between an observation and its lagged versions. For an MA(q) process, the ACF drops to near zero after lag q.
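The sketch below illustrates this on a simulated MA(1) series, using a hand-written sample autocorrelation so the calculation is visible. The helper name `sample_acf`, the coefficient 0.8, and the simulated data are assumptions for illustration; the `acf` function from statsmodels is the usual tool.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(nlags + 1)])

# Simulate a hypothetical MA(1) process: x_t = e_t + 0.8 * e_{t-1}
rng = np.random.default_rng(7)
e = rng.normal(size=5001)
ma1 = e[1:] + 0.8 * e[:-1]

acf_values = sample_acf(ma1, nlags=5)
threshold = 1.96 / np.sqrt(len(ma1))
print(np.round(acf_values, 3))
print("lags above threshold:",
      [k for k in range(1, 6) if abs(acf_values[k]) > threshold])
```

In theory the lag-1 autocorrelation of this process is 0.8 / (1 + 0.8²) ≈ 0.49 and every later lag is zero, so the sample values should show one clear spike at lag 1 followed by values near zero, which suggests q = 1.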

Now Here’s How to Calculate P, D, and Q Values using Python

Below is how to write a Python function to calculate the P, D, and Q values:

from statsmodels.tsa.stattools import acf, pacf, adfuller
import numpy as np

def determine_p_d_q_values(timeseries, lags):
    # Calculate autocorrelation and partial autocorrelation values
    autocorr_values = acf(timeseries, nlags=lags)
    pacf_values = pacf(timeseries, nlags=lags)

    # Approximate 95% significance bound for a sample (partial) autocorrelation
    threshold = 1.96 / len(timeseries) ** 0.5

    # Determine the value of P: first lag with a significant partial autocorrelation
    p_value = 0
    for i in range(1, len(pacf_values)):
        if abs(pacf_values[i]) > threshold:
            p_value = i
            break

    # Determine the value of D: difference until the ADF test rejects a unit root
    d_value = 0
    p_value_adf = adfuller(timeseries)[1]
    while p_value_adf > 0.05:
        timeseries = np.diff(timeseries)
        d_value += 1
        p_value_adf = adfuller(timeseries)[1]

    # Determine the value of Q: first lag with a significant autocorrelation
    q_value = 0
    for i in range(1, len(autocorr_values)):
        if abs(autocorr_values[i]) > threshold:
            q_value = i
            break

    return p_value, d_value, q_value

In the code above, we started by calculating the autocorrelation and partial autocorrelation values, along with an approximate 95% significance threshold of 1.96/√n, where n is the number of observations. Next, we determined the value of P by identifying the first lag where the partial autocorrelation exceeds the threshold.

Next, we determined the value of D by repeatedly differencing the time series until it became stationary, assessed using the Augmented Dickey-Fuller (ADF) test at the 5% significance level.

Finally, we determined the value of Q by identifying the first lag where the autocorrelation exceeds the threshold.
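To get a feel for the size of the threshold used in the function, here is a short worked calculation. The helper name `significance_bound` is hypothetical, and n = 1016 is chosen to match the row count of the dataset printed later in this article.

```python
import math

def significance_bound(n_obs, z=1.96):
    """Approximate 95% bound for a sample autocorrelation of white noise."""
    return z / math.sqrt(n_obs)

n_obs = 1016  # number of rows in the dataset used in this article
bound = significance_bound(n_obs)
print(round(bound, 4))  # 0.0615
```

So with about a thousand observations, any autocorrelation or partial autocorrelation larger than roughly 0.06 in absolute value counts as significant under this rule of thumb.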

Now let’s import a dataset and calculate P, D, and Q values for the time series:

import pandas as pd
data = pd.read_csv("INR-USD.csv")

print(data)
            Date       Open       High        Low      Close  Adj Close  \
0     2003-12-01  45.709000  45.728001  45.449001  45.480000  45.480000   
1     2003-12-08  45.474998  45.507999  45.352001  45.451000  45.451000   
2     2003-12-15  45.450001  45.500000  45.332001  45.455002  45.455002   
3     2003-12-22  45.417000  45.549000  45.296001  45.507999  45.507999   
4     2003-12-29  45.439999  45.645000  45.421001  45.560001  45.560001   
...          ...        ...        ...        ...        ...        ...   
1011  2023-04-17  81.844803  82.375504  81.844803  82.140900  82.140900   
1012  2023-04-24  82.054802  82.154900  81.603996  81.745399  81.745399   
1013  2023-05-01  81.744797  81.950996  81.616997  81.716103  81.716103   
1014  2023-05-08  81.729797  82.148499  81.673401  81.787102  81.787102   
1015  2023-05-10  82.037003  82.087502  81.884003  81.930000  81.930000   

      Volume  
0        0.0  
1        0.0  
2        0.0  
3        0.0  
4        0.0  
...      ...  
1011     0.0  
1012     0.0  
1013     0.0  
1014     0.0  
1015     0.0  

[1016 rows x 7 columns]

You can download the dataset we are using for this task here. Below is how we can calculate the P, D, and Q values using the function we defined above:

lags = 20
timeseries = data['Close']
p, d, q = determine_p_d_q_values(timeseries, lags)
print(f"p: {p}, d: {d}, q: {q}")
p: 1, d: 1, q: 1

So this was one way of calculating P, D, and Q values for time series forecasting. Another way is to read the values from ACF and PACF plots. You can learn more about it here.

Summary

So I hope you have understood how to find P, D, and Q values for ARIMA models. In ARIMA models, the p, d, and q values are the autoregressive, differencing, and moving average orders that determine the behaviour and characteristics of the model. Feel free to ask valuable questions in the comments section below.

Aman Kharwal