Contents

Chapter 3: Vector Autoregressive Methods

Prepared by: Maria Eloisa Ventura

Previously, we introduced the classical approaches for forecasting single (univariate) time series, such as the ARIMA (autoregressive integrated moving-average) model and the simple linear regression model. We learned that stationarity is a necessary condition when using ARIMA, while this need not be imposed when using the linear regression model. In this notebook, we extend the forecasting problem to a more general framework where we deal with multivariate time series: time series with more than one time-dependent variable. More specifically, we introduce vector autoregressive (VAR) models and show how they can be used in forecasting multivariate time series.

Multivariate Time Series Model

As shown in the previous chapters, one of the main advantages of simple univariate methods (e.g., ARIMA) is the ability to forecast future values of a variable using only its own past values. However, most, if not all, of the variables that we observe actually depend on other variables. Most of the time, the information that we gather is limited by our capacity to measure the variables of interest. For example, if we want to study the weather in a particular city, we could measure temperature, humidity, and precipitation over time. But if we only have a thermometer, then we’ll only be able to collect data for temperature, effectively reducing our dataset to a univariate time series.

Now, in cases where we have multiple time series (longitudinal measurements of more than one variable), we can actually use multivariate time series models to understand the relationships of the different variables over time. By utilizing the additional information available from related series, these models can often provide better forecasts than the univariate time series models.

In this section, we cover the foundational information needed to understand Vector Autoregressive Models, a class of multivariate time series models, using a framework similar to the one laid out for univariate time series in the previous chapters and extending it to the multivariate case.

Definition: Univariate vs Multivariate Time Series

Time series can either be univariate or multivariate. A univariate time series consists of single observations recorded sequentially over equal time increments. When we speak of a univariate time series model (e.g., ARIMA), we usually refer to a model that uses lagged values of the series itself as the independent variables.

On the other hand, a multivariate time series has more than one time-dependent variable. For a multivariate process, several related time series are observed simultaneously over time. As an extension of the univariate case, the multivariate time series model involves two or more input variables, and leverages the interrelationship among the different time series variables.

import numpy as np
import pandas as pd
import statsmodels.tsa as tsa
from statsmodels.tsa.vector_ar.var_model import VAR, FEVD
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.stattools import acf, adfuller, ccf, ccovf, kpss
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt
import mvts_utils as utils
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

%load_ext autoreload
%autoreload 2

Example 1: Multivariate Time Series

A. Air Quality Data from UCI

The dataset contains hourly averaged measurements obtained from an Air Quality Chemical Multisensor Device located in a significantly polluted area, at road level, within an Italian city. The dataset can be downloaded here.

aq_df = pd.read_excel("../data/AirQualityUCI/AirQualityUCI.xlsx", parse_dates=[['Date', 'Time']])\
            .set_index('Date_Time').replace(-200, np.nan).interpolate()
aq_df.head(2)
CO(GT) PT08.S1(CO) NMHC(GT) C6H6(GT) PT08.S2(NMHC) NOx(GT) PT08.S3(NOx) NO2(GT) PT08.S4(NO2) PT08.S5(O3) T RH AH Unnamed: 15 Unnamed: 16
Date_Time
2004-03-10 00:00:00 18:00:00 2.6 1360.00 150.0 11.881723 1045.50 166.0 1056.25 113.0 1692.00 1267.50 13.6 48.875001 0.757754 NaN NaN
2004-03-10 00:00:00 19:00:00 2.0 1292.25 112.0 9.397165 954.75 103.0 1173.75 92.0 1558.75 972.25 13.3 47.700000 0.725487 NaN NaN
fig,ax = plt.subplots(5, figsize=(15,8), sharex=True)
plot_cols = ['CO(GT)', 'NO2(GT)', 'C6H6(GT)', 'T', 'RH']
aq_df[plot_cols].plot(subplots=True, legend=False, ax=ax)
for a in range(len(ax)): 
    ax[a].set_ylabel(plot_cols[a])
ax[-1].set_xlabel('')
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_10_0.png

B. Global Health from The World Bank

This dataset combines key health statistics from a variety of sources to provide a look at global health and population trends. It includes information on nutrition, reproductive health, education, immunization, and diseases from over 200 countries. The dataset can be downloaded here.

ind_df = pd.read_csv('../data/WorldBankHealth/WorldBankHealthPopulation_SeriesSummary.csv')\
            .loc[:,['series_code', 'indicator_name']].drop_duplicates().reindex()\
            .sort_values('indicator_name').set_index('series_code')
hn_df = pd.read_csv('../data/WorldBankHealth/WorldBankHealthPopulation_HealthNutritionPopulation.csv')\
            .pivot(index='year', columns='indicator_code', values='value')

cols = ['SH.XPD.KHEX.GD.ZS', 'SH.XPD.CHEX.GD.ZS', 'SH.XPD.GHED.GD.ZS']
health_expenditure_df = hn_df.loc[np.arange(2000, 2018), cols]\
    .rename(columns = dict(ind_df.loc[cols].indicator_name\
                           .apply(lambda x: '_'.join(x.split('(')[0].split(' ')[:-1]))))
health_expenditure_df.index = pd.date_range('2000-1-1', periods=len(health_expenditure_df), freq="A-DEC")
health_expenditure_df.head(3)
indicator_code Capital_health_expenditure Current_health_expenditure Domestic_general_government_health_expenditure
2000-12-31 0.013654 3.154818 1.400685
2001-12-31 0.012675 2.947059 1.196554
2002-12-31 0.018476 2.733301 1.012481
fig, ax = plt.subplots(3, 1, sharex=True, figsize=(15,7))
health_expenditure_df.plot(subplots=True, ax=ax, legend=False)
y_label = ['Capital health expenditure', 'Current health expenditure',
           'Domestic general government\n health expenditure']
for a in range(len(ax)): 
    ax[a].set_ylabel(f"{y_label[a]}\n (% of GDP)")
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_13_0.png

C. US Treasury Rates

This sample dataset contains weekly US Treasury rates from January 1982 to December 2016 (source: https://essentialoftimeseries.com/data/). The dataset can be downloaded here.

treas_df = pd.read_excel("../data/USTreasuryRates/us-treasury-rates-weekly.xlsx")
treas_df = treas_df.rename(columns={'Unnamed: 0': 'Date'}).set_index('Date')
treas_df.index = pd.to_datetime(treas_df.index)
treas_df.head(3)
1-month 3-month 6-month 1-year 2-year 3-year 5-year 7-year 10-year Excess CRSP Mkt Returns 10-year Treasury Returns Term spread Change in term spread 5-year Treasury Returns Unnamed: 15 Excess 10-year Treasury Returns Term Spread VXO Delta VXO
Date
1982-01-08 10.296 12.08 13.36 13.80 14.12 14.32 14.46 14.54 14.47 -1.632 NaN 2.39 NaN NaN NaN -0.286662 1.729559 20.461911 -0.003106
1982-01-15 10.296 12.72 13.89 14.39 14.67 14.73 14.79 14.84 14.76 -2.212 -2.9 2.04 -0.35 1.65 -2.556 -3.758000 4.464000 NaN NaN
1982-01-22 10.296 13.47 14.30 14.72 14.93 14.92 14.81 14.80 14.73 -0.202 0.3 1.26 -0.78 0.10 0.049 -0.558000 4.434000 NaN NaN
fig,ax = plt.subplots(1, figsize=(15, 3), sharex=True)
data_df = treas_df.iloc[:, 0:9]
data_df.plot(ax=ax)
plt.ylabel('US Treasury yield (%)')
plt.xlabel('')
plt.legend(ncol=2)
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_16_0.png

D. Jena Climate Data

Time series dataset recorded at the Weather Station at the Max Planck Institute for Biogeochemistry in Jena, Germany from 2009 to 2016. It contains 14 different quantities (e.g., air temperature, atmospheric pressure, humidity, wind direction, and so on) recorded every 10 minutes. You can download the data here.

jena_data = pd.read_csv('../data/jena_climate_2009_2016.csv')
jena_data['Date Time'] = pd.to_datetime(jena_data['Date Time'])
jena_data = jena_data.set_index('Date Time')
jena_data.head(3)
fig,ax = plt.subplots(jena_data.shape[-1], figsize=(15,16), sharex=True)
jena_data.iloc[-1000:].plot(subplots=True, ax=ax)
ax[-1].set_xlabel('')
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_19_0.png

Foundations

Before we discuss VARs, we outline some fundamental concepts below that we will need to understand the model.

Weak Stationarity of Multivariate Time Series

As in the univariate case, one of the requirements we need to satisfy before we can apply VAR models is stationarity, in particular weak stationarity. In both the univariate and multivariate cases, weak stationarity means that the first two moments of the time series are finite and time-invariant. More formally, we describe weak stationarity below.

Consider an \(N\)-dimensional time series, \(\mathbf{y}_t = \left[y_{1,t}, y_{2,t}, ..., y_{N,t}\right]^\prime\). This is said to be weakly stationary if the first two moments are finite and constant through time, that is,

  1. \(E\left[\mathbf{y}_t\right] = \boldsymbol{\mu}\)

  2. \(E\left[(\mathbf{y}_t-\boldsymbol{\mu})(\mathbf{y}_t-\boldsymbol{\mu})^\prime\right] \equiv \boldsymbol\Gamma_0 < \infty\) for all \(t\)

  3. \(E\left[(\mathbf{y}_t-\boldsymbol{\mu})(\mathbf{y}_{t-h}-\boldsymbol{\mu})^\prime\right] \equiv \boldsymbol\Gamma_h\) for all \(t\) and \(h\)

where the expectations are taken element-by-element over the joint distribution of \(\mathbf{y}_t\):

  • \(\boldsymbol{\mu}\) is the vector of means \(\boldsymbol\mu = \left[\mu_1, \mu_2, ..., \mu_N \right]\)

  • \(\boldsymbol\Gamma_0\) is the \(N\times N\) covariance matrix where the \(i\)th diagonal element is the variance of \(y_{i,t}\), and the \((i, j)\)th element is the covariance between \(y_{i,t}\) and \({y_{j,t}}\)

  • \(\boldsymbol\Gamma_h\) is the cross-covariance matrix at lag \(h\)

Obtaining Cross-Correlation Matrix from Cross-Covariance Matrix

When dealing with a multivariate time series, we can examine the predictability of one variable on another by looking at the relationship between them using the cross-covariance function (CCVF) and cross-correlation function (CCF). To do this, we begin by defining the cross-covariance between two variables, then we estimate the cross-correlation between one variable and another variable that is time-shifted. This informs us whether one time series may be related to the past lags of the other. In other words, CCF is used for identifying lags of one variable that might be useful as a predictor of the other variable.

At lag 0:

Let \(\mathbf\Gamma_0\) be the covariance matrix at lag 0, and \(\mathbf D\) be an \(N\times N\) diagonal matrix containing the standard deviations \(\sigma_i\) of \(y_{i,t}\) for \(i=1, ..., N\). The correlation matrix of \(\mathbf{y}_t\) is defined as

\[\boldsymbol\rho_0 = \mathbf D^{-1}\mathbf\Gamma_0\mathbf D^{-1}\]

where the \((i, j)\)th element of \(\boldsymbol\rho_0\) is the correlation coefficient between \(y_{i,t}\) and \(y_{j,t}\) at time \(t\):

\[\rho_{i,j}(0) = \dfrac{Cov\left[y_{i,t}, y_{j,t} \right]}{\sigma_{i} \sigma_{j}}\]

At lag h:

Let \(\boldsymbol\Gamma_h = E\left[(\mathbf{y}_t-\boldsymbol{\mu})(\mathbf{y}_{t-h}-\boldsymbol{\mu})^\prime\right]\) be the lag-\(h\) cross-covariance matrix of \(\mathbf y_{t}\). The lag-\(h\) cross-correlation matrix is defined as

\[\boldsymbol\rho_h = \mathbf D^{-1}\mathbf\Gamma_h\mathbf D^{-1}\]

The \((i,j)\)th element of \(\boldsymbol\rho_h\) is the correlation coefficient between \(y_{i,t}\) and \(y_{j,t-h}\):

\[\rho_{i,j}(h) = \dfrac{Cov\left[y_{i,t}, y_{j,t-h} \right]}{\sigma_{i} \sigma_{j}}\]
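To make these definitions concrete, here is a minimal sketch of how \(\boldsymbol\rho_h = \mathbf D^{-1}\boldsymbol\Gamma_h\mathbf D^{-1}\) can be estimated from a sample; the simulated series and the helper cross_corr_matrix are illustrative assumptions, not part of the chapter's datasets.

import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal((500, 2))     # T x N sample with N=2 variables
y[1:, 1] += 0.8 * y[:-1, 0]           # make y2 depend on lagged y1

def cross_corr_matrix(y, h):
    """Estimate rho_h = D^{-1} Gamma_h D^{-1} from a T x N array."""
    T, _ = y.shape
    yc = y - y.mean(axis=0)                 # mean-corrected series
    gamma_h = yc[h:].T @ yc[:T - h] / T     # lag-h cross-covariance Gamma_h
    d_inv = np.diag(1.0 / yc.std(axis=0))   # D^{-1}, inverse standard deviations
    return d_inv @ gamma_h @ d_inv

print(cross_corr_matrix(y, 0).round(2))  # contemporaneous correlations rho_0
print(cross_corr_matrix(y, 1).round(2))  # entry (2,1): corr(y_{2,t}, y_{1,t-1})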

What do we get from this?

  • \(\rho_{i,j}(0)\neq0\): \(y_{i,t}\) and \(y_{j,t}\) are contemporaneously linearly correlated

  • \(\rho_{i,j}(h)=\rho_{j,i}(h)=0\) for all \(h\geq0\): \(y_{i,t}\) and \(y_{j,t}\) share no linear relationship

  • \(\rho_{i,j}(h)=0\) and \(\rho_{j,i}(h)=0\) for all \(h>0\): \(y_{i,t}\) and \(y_{j,t}\) are said to be linearly uncoupled

  • \(\rho_{i,j}(h)=0\) for all \(h>0\), but \(\rho_{j,i}(q)\neq0\) for at least some \(q>0\): there is a unidirectional (linear) relationship between \(y_{i,t}\) and \(y_{j,t}\), where \(y_{i,t}\) does not depend on \(y_{j,t}\), but \(y_{j,t}\) depends on (some) lagged values of \(y_{i,t}\)

  • \(\rho_{i,j}(h)\neq0\) for at least some \(h>0\) and \(\rho_{j,i}(q)\neq0\) for at least some \(q>0\): there is a bi-directional (feedback) linear relationship between \(y_{i,t}\) and \(y_{j,t}\)

Vector Autoregressive (VAR) Models

The vector autoregressive (VAR) model is one of the most successful models for analysis of multivariate time series. It has been demonstrated to be successful in describing relationships and forecasting economic and financial time series, and providing more accurate forecasts than the univariate time series models and theory-based simultaneous equations models.

The structure of the VAR model is just the generalization of the univariate AR model. It is a system regression model that treats all the variables as endogenous, and allows each of the variables to depend on \(p\) lagged values of itself and of all the other variables in the system.

A VAR model of order \(p\) can be represented as

\[\mathbf y_t=\mathbf a_0 + \mathbf A_1 \mathbf y_{t-1} + \mathbf A_2 \mathbf y_{t-2} + ... + \mathbf A_p \mathbf y_{t-p} + \mathbf u_t = \mathbf a_0 + \sum_{j=1}^{p} \mathbf A_j \mathbf y_{t-j} + \mathbf u_t\]

where \(\mathbf y_t\) is an \(N\times 1\) vector containing the \(N\) endogenous variables, \(\mathbf a_0\) is an \(N\times 1\) vector of constants, \(\mathbf A_1, \mathbf A_2, ..., \mathbf A_p\) are the \(p\) \(N\times N\) matrices of autoregressive coefficients, and \(\mathbf u_t\) is an \(N\times 1\) vector of white noise disturbances.
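To make the recursion concrete, the following minimal sketch simulates a bivariate VAR(2) process directly from the equation above; the intercept, coefficient matrices, and noise scale are arbitrary values chosen for illustration.

import numpy as np

rng = np.random.default_rng(1)
a0 = np.array([0.5, 1.0])              # intercept vector a_0
A = [np.array([[0.5, 0.1],             # A_1
               [0.2, 0.3]]),
     np.array([[0.1, 0.0],             # A_2
               [0.0, 0.1]])]
p, N, T = len(A), 2, 300

y = np.zeros((T, N))
for t in range(p, T):
    u_t = 0.5 * rng.standard_normal(N)                          # disturbance u_t
    y[t] = a0 + sum(A[j] @ y[t - j - 1] for j in range(p)) + u_t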

VAR(1) Model

To illustrate, let’s consider the simplest VAR model where we have \(p=1\).

Structural and Reduced Forms of the VAR model

Consider the following bivariate system

\[ y_{1,t} = b_{1,0} - b_{1,2}y_{2,t} + \varphi_{1,1}y_{1,t-1} + \varphi_{1,2}y_{2,t-1} + \varepsilon_{1,t}\]
\[ y_{2,t} = b_{2,0} - b_{2,1}y_{1,t} + \varphi_{2,1}y_{1,t-1} + \varphi_{2,2}y_{2,t-1} + \varepsilon_{2,t}\]

where both \(y_{1,t}\) and \(y_{2,t}\) are assumed to be stationary, and \(\varepsilon_{1,t}\) and \(\varepsilon_{2,t}\) are uncorrelated error terms with standard deviations \(\sigma_{1}\) and \(\sigma_{2}\), respectively.

In matrix notation:

\[\begin{bmatrix} 1 & b_{1,2} \\ b_{2,1} & 1 \end{bmatrix} \begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} b_{1,0} \\ b_{2,0} \end{bmatrix} + \begin{bmatrix} \varphi_{1,1} & \varphi_{1,2} \\ \varphi_{2,1} & \varphi_{2,2} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}\]

Let

\[\mathbf B = \begin{bmatrix} 1 & b_{1,2} \\ b_{2,1} & 1 \end{bmatrix}, \quad \mathbf y_t = \begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix}, \quad \mathbf Q_0 = \begin{bmatrix} b_{1,0} \\ b_{2,0} \end{bmatrix}, \quad \mathbf Q_1 = \begin{bmatrix} \varphi_{1,1} & \varphi_{1,2} \\ \varphi_{2,1} & \varphi_{2,2} \end{bmatrix}, \quad \boldsymbol\varepsilon_t = \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}\]

then

\[\mathbf B \mathbf y_t = \mathbf Q_0 + \mathbf Q_1 \mathbf y_{t-1} + \boldsymbol\varepsilon_t\]
Structural VAR (VAR in primitive form)
  • Described by equation above

  • Captures contemporaneous feedback effects (\(b_{1,2}, b_{2,1}\))

  • Not very practical to use

  • Contemporaneous terms cannot be used in forecasting

  • Needs further manipulation to make it more useful (e.g. multiplying the matrix equation by \(\mathbf B^{-1}\))

Multiplying the matrix equation by \(\mathbf B^{-1}\), we get

\[\mathbf y_t = \mathbf a_0 + \mathbf A_1 \mathbf y_{t-1} + \mathbf u_t\]

or

\[\mathbf y_t = \mathbf a_0 + \mathbf A_1 L\mathbf y_t + \mathbf u_t\]

where \(\mathbf a_0 = \mathbf B^{-1}\mathbf Q_0\), \(\mathbf A_1 = \mathbf B^{-1}\mathbf Q_1\), \(L\) is the lag/backshift operator, and \(\mathbf u_t = \mathbf B^{-1}\boldsymbol\varepsilon_t\), equivalently,

\[ y_{1,t} = a_{1,0} + a_{1,1}y_{1,t-1} + a_{1,2}y_{2,t-1} + u_{1,t}\]
\[ y_{2,t} = a_{2,0} + a_{2,1}y_{1,t-1} + a_{2,2}y_{2,t-1} + u_{2,t}\]
Reduced-form VAR (VAR in standard form)
  • Described by equation above

  • Only dependent on lagged endogenous variables (no contemporaneous feedback)

  • Can be estimated equation by equation using ordinary least squares (OLS), as in the sketch below
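As a rough illustration of the last point, this minimal sketch simulates a reduced-form VAR(1) and recovers \(\mathbf a_0\) and \(\mathbf A_1\) by least squares; the true parameter values are arbitrary choices, and in practice a library such as statsmodels does this estimation for us.

import numpy as np

# Simulate a stationary bivariate VAR(1): y_t = a0 + A1 y_{t-1} + u_t
rng = np.random.default_rng(2)
a0_true = np.array([0.2, -0.1])
A1_true = np.array([[0.6, 0.2],
                    [0.1, 0.4]])
T, N = 2000, 2
y = np.zeros((T, N))
for t in range(1, T):
    y[t] = a0_true + A1_true @ y[t - 1] + 0.3 * rng.standard_normal(N)

# Regress y_t on [1, y_{t-1}']; each column of beta holds one equation's coefficients
X = np.column_stack([np.ones(T - 1), y[:-1]])
Y = y[1:]
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
a0_hat, A1_hat = beta[0], beta[1:].T     # recover a0 and A1
print(A1_hat.round(2))                   # close to A1_true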

VMA infinite representation and Stationarity

Consider the reduced form, standard VAR(1) model

\[\mathbf y_t = \mathbf a_0 + \mathbf A_1 \mathbf y_{t-1} + \mathbf u_t\]

Assuming that the process is weakly stationary and taking the expectation of \(\mathbf y_t\), we have

\[E\left[\mathbf y_t\right] = \mathbf a_0 + \mathbf A_1 E\left[\mathbf y_{t-1}\right]\]

since \(E\left[\mathbf u_t\right]=0\), which gives \(\boldsymbol\mu \equiv E\left[\mathbf y_t\right] = (\mathbf I_N - \mathbf A_1)^{-1}\mathbf a_0\). If we let \(\tilde{\mathbf y}_{t}\equiv \mathbf y_t - \boldsymbol \mu\) be the mean-corrected time series, we can write the model as

\[\tilde{\mathbf y}_{t} = \mathbf A_1 \tilde{\mathbf y}_{t-1} + \mathbf u_t\]

Substituting \(\tilde{\mathbf y}_{t-1} = \mathbf A_1 \tilde{\mathbf y}_{t-2} + \mathbf u_{t-1}\),

\[\tilde{\mathbf y}_{t} = \mathbf A_1 \left( \mathbf A_1 \tilde{\mathbf y}_{t-2} + \mathbf u_{t-1}\right)+ \mathbf u_t\]

If we keep iterating, we get

\[\mathbf y_{t} = \boldsymbol\mu + \sum_{i=1}^{\infty} \mathbf A_1^i \mathbf u_{t-i} + \mathbf u_t\]

Letting \(\boldsymbol\Theta_i\equiv A_1^i\), we get the VMA infinite representation

\[\mathbf y_{t} = \boldsymbol\mu + \sum_{i=1}^{\infty} \boldsymbol\Theta_i \mathbf u_{t-i} + \mathbf u_t\]

Stationarity of the VAR(1) model

  • All \(N\) eigenvalues of the matrix \(\mathbf A_1\) must be less than 1 in modulus; this prevents the coefficient matrices \(\mathbf A_1^j\) from either exploding or converging to a nonzero matrix as \(j\) goes to infinity.

  • Provided that the covariance matrix of \(\mathbf u_t\) exists, the requirement that all eigenvalues of \(\mathbf A_1\) are less than one in modulus is a necessary and sufficient condition for \(\mathbf y_t\) to be stable, and thus stationary.

  • Equivalently, all roots of \(\det\left(\mathbf I_N - \mathbf A_1 z\right)=0\) must lie outside the unit circle. (A numerical check is sketched after this list.)
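A minimal numerical check of this condition, using a hypothetical coefficient matrix:

import numpy as np

A1 = np.array([[0.6, 0.2],
               [0.1, 0.4]])            # hypothetical VAR(1) coefficient matrix
moduli = np.abs(np.linalg.eigvals(A1))
print(moduli)                          # moduli of the eigenvalues of A1
print((moduli < 1).all())              # True -> stable, hence stationary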

VAR(p) Model

Consider the VAR(p) model described by

\[\mathbf y_t=\mathbf a_0 + \mathbf A_1 \mathbf y_{t-1} + \mathbf A_2 \mathbf y_{t-2} + ... + \mathbf A_p \mathbf y_{t-p} + \mathbf u_t = \mathbf a_0 + \sum_{j=1}^{p} \mathbf A_j \mathbf y_{t-j} + \mathbf u_t\]

Using the lag operator \(L\), we get

\[\tilde{\mathbf A} (L) \mathbf y_t = \mathbf a_0 + \mathbf u_t\]

where \(\tilde{\mathbf A} (L) = (\mathbf I_N - \mathbf A_1 L - ... - \mathbf A_p L^p)\). Assuming that \(\mathbf y_t\) is weakly stationary, and noting that the lag operator leaves the constant mean unchanged (i.e., we evaluate \(\tilde{\mathbf A}(L)\) at \(L=1\)), we obtain

\[\boldsymbol\mu = E\left[\mathbf y_t\right] = (\mathbf I_N - \mathbf A_1 - ... - \mathbf A_p)^{-1} \mathbf a_0\]

Defining \(\tilde{\mathbf y}_t=\mathbf y_t -\boldsymbol\mu\), we have

\[\tilde{\mathbf y}_t = \mathbf A_1 \tilde{\mathbf y}_{t-1} + \mathbf A_2 \tilde{\mathbf y}_{t-2} + ... + \mathbf A_p \tilde{\mathbf y}_{t-p} + \mathbf u_t\]

Properties

  • \(Cov[\mathbf y_t, \mathbf u_t] = \Sigma_u\), the covariance matrix of \(\mathbf u_t\)

  • \(Cov[\mathbf y_{t-h}, \mathbf u_t] = \mathbf 0\) for any \(h>0\)

  • \(\boldsymbol\Gamma_h = \mathbf A_1 \boldsymbol\Gamma_{h-1} +...+\mathbf A_p \boldsymbol\Gamma_{h-p}\) for \(h>0\)

  • \(\boldsymbol\rho_h = \boldsymbol \Psi_1 \boldsymbol\rho_{h-1} +...+\boldsymbol \Psi_p \boldsymbol\rho_{h-p}\) for \(h>0\), where \(\boldsymbol \Psi_i = \mathbf D^{-1}\mathbf A_i \mathbf D\)

Stationarity of VAR(p) model

  • All roots of \(\det\left(\mathbf I_N - \mathbf A_1 z - ...- \mathbf A_p z^p\right)=0\) must lie outside the unit circle; equivalently, all eigenvalues of the companion matrix of the VAR(\(p\)) must be less than 1 in modulus, as checked in the sketch below.
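In practice, this is usually checked via the companion form: stack the \(\mathbf A_j\) into an \(Np\times Np\) companion matrix and verify that all its eigenvalues are less than 1 in modulus. A minimal sketch with hypothetical VAR(2) coefficients:

import numpy as np

# Hypothetical VAR(2) coefficient matrices
A1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
A2 = np.array([[0.1, 0.0],
               [0.0, 0.1]])
N, p = 2, 2

# Companion matrix [[A1, A2], [I, 0]] of the VAR(2)
companion = np.zeros((N * p, N * p))
companion[:N, :N] = A1
companion[:N, N:] = A2
companion[N:, :N * (p - 1)] = np.eye(N * (p - 1))

# Stationary iff every eigenvalue has modulus < 1
print(np.abs(np.linalg.eigvals(companion)).max() < 1)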

Specification of a VAR model: Choosing the order p

Fitting a VAR model involves the selection of a single parameter: the model order or lag length \(p\). The most common approach to selecting the best model is choosing the \(p\) that minimizes one or more information criteria evaluated over a range of model orders. Each criterion consists of two terms: the first characterizes the entropy rate or prediction error, and the second penalizes the number of freely estimated parameters in the model. Minimizing both terms allows us to identify a model that fits the data accurately while preventing overfitting (due to too many parameters).

Some of the commonly used information criteria include: Akaike Information Criterion (AIC), Schwarz’s Bayesian Information Criterion (BIC), Hannan-Quinn Criterion (HQ), and Akaike’s Final Prediction Error Criterion (FPE). The equation for each are shown below.

Akaike’s information criterion

\[(M)\operatorname{AIC} = \operatorname{ln}|\tilde{\boldsymbol\Sigma}_u | + 2\dfrac{k}{T}\]

Schwarz’s Bayesian information criterion

\[(M)\operatorname{BIC} = \operatorname{ln}|\tilde{\boldsymbol\Sigma}_u | + \dfrac{k}{T}\operatorname{ln}(T)\]

Hannan-Quinn’s information criterion

\[(M)\operatorname{HQIC} = \operatorname{ln}|\tilde{\boldsymbol\Sigma}_u | + \dfrac{k}{T}\operatorname{ln}(\operatorname{ln}(T))\]

Final prediction error

\[\operatorname{FPE}(p) = \left(\dfrac{T+Np+1}{T-Np-1} \right)^N |\tilde{\boldsymbol\Sigma}_u | \]
\[\operatorname{ln}(\operatorname{FPE}(p)) = \operatorname{ln}|\tilde{\boldsymbol\Sigma}_u | + N \operatorname{ln}\left(\dfrac{T+Np+1}{T-Np-1} \right) \]

where \(M\) stands for multivariate, \(\tilde{\boldsymbol\Sigma}_u\) is the estimated covariance matrix of the residuals, \(T\) is the number of observations in the sample, and \(k\) is the total number of freely estimated parameters in the VAR(\(p\)) (i.e., \(N^2p + N\), where \(N\) is the number of equations and \(p\) is the number of lags).

Among the metrics above, AIC and BIC are the most widely used criteria, but BIC penalizes higher orders more. For moderate and large sample sizes, AIC and FPE are essentially equivalent, but FPE may outperform AIC for small sample sizes. HQ also penalizes higher order models more heavily than AIC, but less than BIC. However, under small sample conditions, AIC/FPE may outperform BIC and/or HQ in selecting true model order.

There are cases when AIC and FPE show no clear minimum over the range of model orders considered. In such cases, there may be a clear “elbow” when we plot the values against model order, which may indicate a suitable model order.
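For reference, here is a minimal sketch of how these criteria can be computed from the residuals of a fitted VAR, following the formulas above; the residual array is simulated purely for illustration, and statsmodels reports these values directly.

import numpy as np

def var_information_criteria(resid, N, p):
    """AIC/BIC/HQIC/FPE from a T x N residual array, with k = N**2 * p + N."""
    T = resid.shape[0]
    sigma_u = resid.T @ resid / T            # estimated residual covariance
    ld = np.log(np.linalg.det(sigma_u))      # ln|Sigma_u|
    k = N**2 * p + N                         # freely estimated parameters
    aic = ld + 2 * k / T
    bic = ld + k * np.log(T) / T
    hqic = ld + k * np.log(np.log(T)) / T
    fpe = ((T + N * p + 1) / (T - N * p - 1))**N * np.linalg.det(sigma_u)
    return aic, bic, hqic, fpe

rng = np.random.default_rng(4)
print(var_information_criteria(rng.standard_normal((500, 3)), N=3, p=2))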

Building a VAR model

In this section, we show how we can use the VAR model to forecast the air quality data. We follow these steps:

  1. Check for stationarity.

  2. Split data into train and test sets.

  3. Select the VAR order p that minimizes the information criteria.

  4. Fit VAR model of order p on the train set.

  5. Generate forecast.

  6. Evaluate model performance.

Example 2: Forecasting Air Quality Data (CO, NO2 and RH)

For illustration, we consider the carbon monoxide (CO), nitrogen dioxide (NO2) and relative humidity (RH) time series from the Air Quality dataset, starting 1 October 2004.

cols = ['CO(GT)', 'NO2(GT)', 'RH']
data_df = aq_df.loc[aq_df.index>'2004-10-01',cols]
fig,ax = plt.subplots(3, figsize=(15,5), sharex=True)
data_df.plot(ax=ax, subplots=True)
plt.xlabel('')
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_47_0.png

Quick inspection before we proceed with modeling…

To find out whether the multivariate approach is better than treating the signals separately as univariate time series, we examine the relationship between the variables using the CCF. The plots below show the CCF for the last 100 data points of the air quality data for CO, NO2 and RH.

CO and NO2

sample_df = data_df.iloc[-100:]
ccf_y1_y2 = ccf(sample_df['CO(GT)'], sample_df['NO2(GT)'], unbiased=False)
ccf_y2_y1 = ccf(sample_df['NO2(GT)'], sample_df['CO(GT)'], unbiased=False)

fig, ax = plt.subplots(2, figsize=(15, 4), sharex=True, sharey=True)
d=1
ax[0].stem(np.arange(len(sample_df))[::d], ccf_y1_y2[::d], linefmt='C1-', markerfmt='C1o')
ax[1].stem(np.arange(len(sample_df))[::d], ccf_y2_y1[::d], linefmt='C1-', markerfmt='C1o')
ax[-1].set_ylim(-1, 1)
ax[0].set_xlim(0, 100)
ax[-1].set_xlabel('lag $h$', fontsize=14)
ax[0].set_ylabel(r'$\rho_{CO,NO2} (h)$', fontsize=14)
ax[1].set_ylabel(r'$\rho_{NO2,CO} (h)$', fontsize=14)
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_50_0.png

CO and RH

ccf_y1_y2 = ccf(sample_df['CO(GT)'], sample_df['RH'], unbiased=False)
ccf_y2_y1 = ccf(sample_df['RH'], sample_df['CO(GT)'], unbiased=False)

fig, ax = plt.subplots(2, figsize=(15, 4), sharex=True, sharey=True)
d=1
ax[0].stem(np.arange(len(sample_df))[::d], ccf_y1_y2[::d], linefmt='C1-', markerfmt='C1o')
ax[1].stem(np.arange(len(sample_df))[::d], ccf_y2_y1[::d], linefmt='C1-', markerfmt='C1o')
ax[-1].set_ylim(-1, 1)
ax[0].set_xlim(0, 100)
ax[-1].set_xlabel('lag $h$', fontsize=14)
ax[0].set_ylabel(r'$\rho_{CO,RH} (h)$', fontsize=14)
ax[1].set_ylabel(r'$\rho_{RH,CO} (h)$', fontsize=14)
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_52_0.png

RH and NO2

ccf_y1_y2 = ccf(sample_df['RH'], sample_df['NO2(GT)'], unbiased=False)
ccf_y2_y1 = ccf(sample_df['NO2(GT)'], sample_df['RH'], unbiased=False)

fig, ax = plt.subplots(2, figsize=(15, 4), sharex=True, sharey=True)
d=1
ax[0].stem(np.arange(len(sample_df))[::d], ccf_y1_y2[::d], linefmt='C1-', markerfmt='C1o')
ax[1].stem(np.arange(len(sample_df))[::d], ccf_y2_y1[::d], linefmt='C1-', markerfmt='C1o')
ax[-1].set_ylim(-1, 1)
ax[0].set_xlim(0, 100)
ax[-1].set_xlabel('lag $h$', fontsize=14)
ax[0].set_ylabel(r'$\rho_{RH,NO2} (h)$', fontsize=14)
ax[1].set_ylabel(r'$\rho_{NO2,RH} (h)$', fontsize=14)
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_54_0.png

Observation/s:

As shown in the plot above, we can see that there’s a relationship between:

  • CO and some lagged values of RH and NO2

  • NO2 and some lagged values of RH and CO

  • RH and some lagged values of CO and NO2

This shows that we can benefit from the multivariate approach, so we proceed with building the VAR model.

1. Check stationarity

To check for stationarity, we use the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test and the Augmented Dickey-Fuller (ADF) test. For the data to be suitable for VAR modelling, we need each of the variables in the multivariate time series to be stationary. In both tests, the test statistic must be less than the critical value for us to conclude that a variable is stationary (for ADF this rejects the unit-root null; for KPSS this fails to reject the stationarity null).

Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test

Recall: Null hypothesis is that an observable time series is stationary around a deterministic trend (i.e. trend-stationary) against the alternative of a unit root.

test_stat, p_val = [], []
cv_1pct, cv_2p5pct, cv_5pct, cv_10pct = [], [], [], []
for c in data_df.columns: 
    kpss_res = kpss(data_df[c].dropna(), regression='ct')
    test_stat.append(kpss_res[0])
    p_val.append(kpss_res[1])
    cv_1pct.append(kpss_res[3]['1%'])
    cv_2p5pct.append(kpss_res[3]['2.5%'])
    cv_5pct.append(kpss_res[3]['5%'])
    cv_10pct.append(kpss_res[3]['10%'])
kpss_res_df = pd.DataFrame({'Test statistic': test_stat, 
                           'p-value': p_val, 
                           'Critical value - 1%': cv_1pct,
                           'Critical value - 2.5%': cv_2p5pct,
                           'Critical value - 5%': cv_5pct,
                           'Critical value - 10%': cv_10pct}, 
                         index=data_df.columns).T
kpss_res_df.round(4)
CO(GT) NO2(GT) RH
Test statistic 0.0702 0.3239 0.1149
p-value 0.1000 0.0100 0.1000
Critical value - 1% 0.2160 0.2160 0.2160
Critical value - 2.5% 0.1760 0.1760 0.1760
Critical value - 5% 0.1460 0.1460 0.1460
Critical value - 10% 0.1190 0.1190 0.1190

Observation/s:

From the KPSS test, CO and RH are (trend-)stationary, while stationarity is rejected for NO2 (its test statistic exceeds the 1% critical value).

Augmented Dickey-Fuller (ADF) test

Recall: Null hypothesis is that a unit root is present in a time series sample against the alternative that the time series is stationary.

test_stat, p_val = [], []
cv_1pct, cv_5pct, cv_10pct = [], [], []
for c in data_df.columns: 
    adf_res = adfuller(data_df[c].dropna())
    test_stat.append(adf_res[0])
    p_val.append(adf_res[1])
    cv_1pct.append(adf_res[4]['1%'])
    cv_5pct.append(adf_res[4]['5%'])
    cv_10pct.append(adf_res[4]['10%'])
adf_res_df = pd.DataFrame({'Test statistic': test_stat, 
                           'p-value': p_val, 
                           'Critical value - 1%': cv_1pct,
                           'Critical value - 5%': cv_5pct,
                           'Critical value - 10%': cv_10pct}, 
                         index=data_df.columns).T
adf_res_df.round(4)
CO(GT) NO2(GT) RH
Test statistic -7.0195 -6.7695 -6.8484
p-value 0.0000 0.0000 0.0000
Critical value - 1% -3.4318 -3.4318 -3.4318
Critical value - 5% -2.8622 -2.8622 -2.8622
Critical value - 10% -2.5671 -2.5671 -2.5671

Observation/s:

From the ADF test, CO, NO2 and RH are stationary.

2. Split data into train and test sets

We use the dataset from 01 October 2004 to predict the last 24 points (24 hrs/1 day) in the dataset.

forecast_length = 24 
train_df, test_df = data_df.iloc[:-forecast_length], data_df.iloc[-forecast_length:]
test_df = test_df.filter(test_df.columns[~test_df.columns.str.contains('-d')])
train_df.reset_index().to_csv('../data/AirQualityUCI/train_data.csv')
test_df.reset_index().to_csv('../data/AirQualityUCI/test_data.csv')
fig,ax = plt.subplots(3, figsize=(15, 5), sharex=True)
train_df.plot(ax=ax, subplots=True)
plt.xlabel('')
plt.tight_layout()
plt.show()
../_images/03_VectorAutoregressiveMethods_68_0.png

3. Select order p

We compute the different multivariate information criteria (AIC, BIC, HQIC) and the FPE over a range of lag lengths, and pick the order \(p\) that corresponds to the lowest values.

aic, bic, fpe, hqic = [], [], [], []
model = VAR(train_df) 
p = np.arange(1,60)
for i in p:
    result = model.fit(i)
    aic.append(result.aic)
    bic.append(result.bic)
    fpe.append(result.fpe)
    hqic.append(result.hqic)
lags_metrics_df = pd.DataFrame({'AIC': aic, 
                                'BIC': bic, 
                                'HQIC': hqic,
                                'FPE': fpe}, 
                               index=p)    
fig, ax = plt.subplots(1, 4, figsize=(15, 3), sharex=True)
lags_metrics_df.plot(subplots=True, ax=ax, marker='o')
plt.tight_layout()
../_images/03_VectorAutoregressiveMethods_71_0.png

Observation/s:

We find BIC and HQIC to be lowest at \(p=26\), and we also observe an elbow in the plots for AIC and FPE, so we choose the number of lags to be 26.

4. Fit VAR model with chosen order

%%time
var_model = model.fit(26)
var_model.summary()
CPU times: user 401 ms, sys: 39.9 ms, total: 441 ms
Wall time: 202 ms
  Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Sun, 21, Feb, 2021
Time:                     00:47:42
--------------------------------------------------------------------
No. of Equations:         3.00000    BIC:                    7.42717
Nobs:                     4404.00    HQIC:                   7.20458
Log likelihood:          -34107.4    FPE:                    1191.88
AIC:                      7.08328    Det(Omega_mle):         1129.97
--------------------------------------------------------------------
Results for equation CO(GT)
==============================================================================
                 coefficient       std. error           t-stat            prob
------------------------------------------------------------------------------
const               0.241332         0.069966            3.449           0.001
L1.CO(GT)           0.961418         0.019229           49.999           0.000
L1.NO2(GT)          0.001992         0.000713            2.794           0.005
L1.RH               0.009321         0.002735            3.409           0.001
L2.CO(GT)          -0.279491         0.026156          -10.686           0.000
L2.NO2(GT)          0.000661         0.000970            0.682           0.495
L2.RH              -0.017300         0.004278           -4.044           0.000
L3.CO(GT)           0.098032         0.026251            3.734           0.000
L3.NO2(GT)         -0.000165         0.000970           -0.170           0.865
L3.RH               0.012764         0.004352            2.933           0.003
L4.CO(GT)          -0.001012         0.026285           -0.038           0.969
L4.NO2(GT)         -0.001148         0.000966           -1.189           0.234
L4.RH              -0.006423         0.004354           -1.475           0.140
L5.CO(GT)           0.054358         0.026291            2.068           0.039
L5.NO2(GT)         -0.001218         0.000966           -1.261           0.207
L5.RH               0.000664         0.004353            0.153           0.879
L6.CO(GT)          -0.017663         0.026308           -0.671           0.502
L6.NO2(GT)          0.000372         0.000966            0.385           0.700
L6.RH               0.000884         0.004351            0.203           0.839
L7.CO(GT)           0.004448         0.026302            0.169           0.866
L7.NO2(GT)          0.000921         0.000967            0.953           0.340
L7.RH              -0.000964         0.004353           -0.221           0.825
L8.CO(GT)          -0.001457         0.026288           -0.055           0.956
L8.NO2(GT)          0.000736         0.000968            0.760           0.447
L8.RH              -0.002357         0.004356           -0.541           0.588
L9.CO(GT)           0.018534         0.026296            0.705           0.481
L9.NO2(GT)         -0.001551         0.000969           -1.601           0.109
L9.RH               0.004720         0.004361            1.082           0.279
L10.CO(GT)         -0.003435         0.026289           -0.131           0.896
L10.NO2(GT)         0.001457         0.000969            1.503           0.133
L10.RH              0.003194         0.004366            0.732           0.464
L11.CO(GT)         -0.021526         0.026286           -0.819           0.413
L11.NO2(GT)        -0.001094         0.000969           -1.128           0.259
L11.RH             -0.001129         0.004367           -0.259           0.796
L12.CO(GT)          0.066587         0.026287            2.533           0.011
L12.NO2(GT)        -0.000285         0.000971           -0.293           0.769
L12.RH             -0.002315         0.004369           -0.530           0.596
L13.CO(GT)         -0.076568         0.026287           -2.913           0.004
L13.NO2(GT)         0.000551         0.000971            0.567           0.571
L13.RH              0.004622         0.004370            1.058           0.290
L14.CO(GT)          0.100874         0.026293            3.837           0.000
L14.NO2(GT)        -0.003074         0.000971           -3.165           0.002
L14.RH              0.001720         0.004371            0.394           0.694
L15.CO(GT)         -0.039783         0.026318           -1.512           0.131
L15.NO2(GT)         0.000926         0.000971            0.953           0.340
L15.RH             -0.002928         0.004370           -0.670           0.503
L16.CO(GT)          0.009289         0.026276            0.354           0.724
L16.NO2(GT)         0.000818         0.000969            0.845           0.398
L16.RH             -0.001466         0.004369           -0.336           0.737
L17.CO(GT)         -0.030972         0.026264           -1.179           0.238
L17.NO2(GT)         0.000273         0.000969            0.281           0.778
L17.RH              0.000574         0.004369            0.131           0.895
L18.CO(GT)          0.021964         0.026242            0.837           0.403
L18.NO2(GT)        -0.001183         0.000968           -1.222           0.222
L18.RH              0.003188         0.004366            0.730           0.465
L19.CO(GT)          0.048666         0.026200            1.858           0.063
L19.NO2(GT)        -0.000539         0.000966           -0.558           0.577
L19.RH             -0.010088         0.004362           -2.313           0.021
L20.CO(GT)         -0.065842         0.026213           -2.512           0.012
L20.NO2(GT)         0.000847         0.000965            0.878           0.380
L20.RH              0.008736         0.004363            2.002           0.045
L21.CO(GT)         -0.032047         0.026233           -1.222           0.222
L21.NO2(GT)         0.001082         0.000964            1.122           0.262
L21.RH             -0.004805         0.004360           -1.102           0.270
L22.CO(GT)          0.050969         0.026220            1.944           0.052
L22.NO2(GT)        -0.000471         0.000964           -0.489           0.625
L22.RH              0.000472         0.004360            0.108           0.914
L23.CO(GT)          0.071433         0.026226            2.724           0.006
L23.NO2(GT)         0.001170         0.000964            1.214           0.225
L23.RH             -0.003465         0.004361           -0.794           0.427
L24.CO(GT)          0.198774         0.026160            7.598           0.000
L24.NO2(GT)        -0.000280         0.000968           -0.289           0.772
L24.RH              0.001856         0.004361            0.425           0.670
L25.CO(GT)         -0.156857         0.026043           -6.023           0.000
L25.NO2(GT)         0.000666         0.000967            0.689           0.491
L25.RH              0.000979         0.004290            0.228           0.820
L26.CO(GT)         -0.047697         0.019186           -2.486           0.013
L26.NO2(GT)        -0.001958         0.000710           -2.757           0.006
L26.RH             -0.000747         0.002746           -0.272           0.786
==============================================================================

Results for equation NO2(GT)
==============================================================================
                 coefficient       std. error           t-stat            prob
------------------------------------------------------------------------------
const              12.417213         1.871552            6.635           0.000
L1.CO(GT)           1.600229         0.514359            3.111           0.002
L1.NO2(GT)          0.953169         0.019069           49.986           0.000
L1.RH               0.239537         0.073150            3.275           0.001
L2.CO(GT)          -2.404892         0.699653           -3.437           0.001
L2.NO2(GT)         -0.096250         0.025935           -3.711           0.000
L2.RH              -0.332820         0.114442           -2.908           0.004
L3.CO(GT)           0.154091         0.702192            0.219           0.826
L3.NO2(GT)          0.018241         0.025946            0.703           0.482
L3.RH               0.180558         0.116422            1.551           0.121
L4.CO(GT)           1.160555         0.703122            1.651           0.099
L4.NO2(GT)         -0.065148         0.025831           -2.522           0.012
L4.RH              -0.149770         0.116468           -1.286           0.198
L5.CO(GT)          -0.209355         0.703264           -0.298           0.766
L5.NO2(GT)          0.007011         0.025834            0.271           0.786
L5.RH               0.023622         0.116443            0.203           0.839
L6.CO(GT)           0.742858         0.703728            1.056           0.291
L6.NO2(GT)         -0.029688         0.025847           -1.149           0.251
L6.RH               0.014889         0.116384            0.128           0.898
L7.CO(GT)          -0.535398         0.703562           -0.761           0.447
L7.NO2(GT)          0.089765         0.025857            3.472           0.001
L7.RH              -0.106262         0.116432           -0.913           0.361
L8.CO(GT)          -1.025553         0.703193           -1.458           0.145
L8.NO2(GT)         -0.034196         0.025895           -1.321           0.187
L8.RH               0.127190         0.116517            1.092           0.275
L9.CO(GT)           0.313129         0.703395            0.445           0.656
L9.NO2(GT)          0.005236         0.025910            0.202           0.840
L9.RH              -0.136264         0.116667           -1.168           0.243
L10.CO(GT)         -0.507195         0.703213           -0.721           0.471
L10.NO2(GT)         0.000967         0.025922            0.037           0.970
L10.RH              0.209375         0.116786            1.793           0.073
L11.CO(GT)         -1.319501         0.703135           -1.877           0.061
L11.NO2(GT)         0.069145         0.025930            2.667           0.008
L11.RH             -0.053156         0.116828           -0.455           0.649
L12.CO(GT)          2.602906         0.703181            3.702           0.000
L12.NO2(GT)        -0.076889         0.025968           -2.961           0.003
L12.RH             -0.044469         0.116867           -0.381           0.704
L13.CO(GT)         -2.098667         0.703156           -2.985           0.003
L13.NO2(GT)         0.068250         0.025986            2.626           0.009
L13.RH              0.164246         0.116893            1.405           0.160
L14.CO(GT)          1.141015         0.703317            1.622           0.105
L14.NO2(GT)        -0.077485         0.025985           -2.982           0.003
L14.RH              0.121439         0.116918            1.039           0.299
L15.CO(GT)          0.157231         0.704010            0.223           0.823
L15.NO2(GT)         0.016822         0.025982            0.647           0.517
L15.RH             -0.173595         0.116888           -1.485           0.138
L16.CO(GT)          0.878201         0.702861            1.249           0.211
L16.NO2(GT)         0.003234         0.025923            0.125           0.901
L16.RH             -0.029109         0.116858           -0.249           0.803
L17.CO(GT)         -1.418159         0.702559           -2.019           0.044
L17.NO2(GT)         0.016251         0.025913            0.627           0.531
L17.RH             -0.015862         0.116863           -0.136           0.892
L18.CO(GT)          0.655452         0.701962            0.934           0.350
L18.NO2(GT)        -0.021180         0.025901           -0.818           0.414
L18.RH              0.128568         0.116801            1.101           0.271
L19.CO(GT)         -0.024839         0.700835           -0.035           0.972
L19.NO2(GT)         0.001731         0.025844            0.067           0.947
L19.RH             -0.247871         0.116687           -2.124           0.034
L20.CO(GT)         -0.810637         0.701198           -1.156           0.248
L20.NO2(GT)         0.008661         0.025806            0.336           0.737
L20.RH              0.082541         0.116699            0.707           0.479
L21.CO(GT)         -0.268031         0.701722           -0.382           0.702
L21.NO2(GT)         0.006716         0.025794            0.260           0.795
L21.RH             -0.148403         0.116634           -1.272           0.203
L22.CO(GT)          0.551561         0.701384            0.786           0.432
L22.NO2(GT)         0.029776         0.025786            1.155           0.248
L22.RH              0.137999         0.116639            1.183           0.237
L23.CO(GT)          0.396023         0.701541            0.565           0.572
L23.NO2(GT)         0.141030         0.025786            5.469           0.000
L23.RH             -0.095500         0.116666           -0.819           0.413
L24.CO(GT)          0.359015         0.699771            0.513           0.608
L24.NO2(GT)         0.081228         0.025886            3.138           0.002
L24.RH              0.027786         0.116658            0.238           0.812
L25.CO(GT)         -1.491610         0.696641           -2.141           0.032
L25.NO2(GT)        -0.067370         0.025857           -2.605           0.009
L25.RH             -0.015628         0.114756           -0.136           0.892
L26.CO(GT)          0.583311         0.513221            1.137           0.256
L26.NO2(GT)        -0.100912         0.018999           -5.311           0.000
L26.RH              0.021928         0.073465            0.298           0.765
==============================================================================

Results for equation RH
==============================================================================
                 coefficient       std. error           t-stat            prob
------------------------------------------------------------------------------
const               1.708989         0.393111            4.347           0.000
L1.CO(GT)          -0.043025         0.108039           -0.398           0.690
L1.NO2(GT)         -0.010353         0.004005           -2.585           0.010
L1.RH               1.209022         0.015365           78.688           0.000
L2.CO(GT)          -0.237686         0.146959           -1.617           0.106
L2.NO2(GT)          0.010539         0.005448            1.935           0.053
L2.RH              -0.290694         0.024038          -12.093           0.000
L3.CO(GT)           0.154914         0.147492            1.050           0.294
L3.NO2(GT)         -0.002859         0.005450           -0.525           0.600
L3.RH               0.052192         0.024454            2.134           0.033
L4.CO(GT)           0.095980         0.147688            0.650           0.516
L4.NO2(GT)          0.003434         0.005426            0.633           0.527
L4.RH              -0.021893         0.024464           -0.895           0.371
L5.CO(GT)          -0.035423         0.147718           -0.240           0.810
L5.NO2(GT)         -0.002389         0.005426           -0.440           0.660
L5.RH               0.037941         0.024458            1.551           0.121
L6.CO(GT)          -0.040464         0.147815           -0.274           0.784
L6.NO2(GT)          0.001410         0.005429            0.260           0.795
L6.RH              -0.074252         0.024446           -3.037           0.002
L7.CO(GT)          -0.012767         0.147780           -0.086           0.931
L7.NO2(GT)          0.007474         0.005431            1.376           0.169
L7.RH               0.040664         0.024456            1.663           0.096
L8.CO(GT)           0.080673         0.147703            0.546           0.585
L8.NO2(GT)          0.001860         0.005439            0.342           0.732
L8.RH              -0.076619         0.024474           -3.131           0.002
L9.CO(GT)           0.047686         0.147745            0.323           0.747
L9.NO2(GT)         -0.004802         0.005442           -0.882           0.378
L9.RH               0.063403         0.024505            2.587           0.010
L10.CO(GT)          0.050348         0.147707            0.341           0.733
L10.NO2(GT)         0.004929         0.005445            0.905           0.365
L10.RH             -0.044706         0.024530           -1.822           0.068
L11.CO(GT)          0.117202         0.147691            0.794           0.427
L11.NO2(GT)        -0.002450         0.005446           -0.450           0.653
L11.RH              0.050771         0.024539            2.069           0.039
L12.CO(GT)          0.081156         0.147700            0.549           0.583
L12.NO2(GT)        -0.002690         0.005454           -0.493           0.622
L12.RH             -0.040349         0.024547           -1.644           0.100
L13.CO(GT)         -0.028788         0.147695           -0.195           0.845
L13.NO2(GT)         0.000768         0.005458            0.141           0.888
L13.RH              0.020414         0.024553            0.831           0.406
L14.CO(GT)         -0.078942         0.147729           -0.534           0.593
L14.NO2(GT)        -0.002734         0.005458           -0.501           0.616
L14.RH             -0.000282         0.024558           -0.011           0.991
L15.CO(GT)         -0.144161         0.147874           -0.975           0.330
L15.NO2(GT)         0.002442         0.005457            0.448           0.654
L15.RH              0.030373         0.024552            1.237           0.216
L16.CO(GT)         -0.001777         0.147633           -0.012           0.990
L16.NO2(GT)         0.002061         0.005445            0.379           0.705
L16.RH             -0.049313         0.024546           -2.009           0.045
L17.CO(GT)          0.046799         0.147569            0.317           0.751
L17.NO2(GT)        -0.007569         0.005443           -1.391           0.164
L17.RH              0.033906         0.024547            1.381           0.167
L18.CO(GT)         -0.080688         0.147444           -0.547           0.584
L18.NO2(GT)         0.006073         0.005440            1.116           0.264
L18.RH              0.006116         0.024533            0.249           0.803
L19.CO(GT)          0.269110         0.147207            1.828           0.068
L19.NO2(GT)        -0.004253         0.005429           -0.784           0.433
L19.RH             -0.011621         0.024510           -0.474           0.635
L20.CO(GT)         -0.324511         0.147284           -2.203           0.028
L20.NO2(GT)         0.000789         0.005420            0.145           0.884
L20.RH              0.041952         0.024512            1.711           0.087
L21.CO(GT)          0.177418         0.147394            1.204           0.229
L21.NO2(GT)         0.005398         0.005418            0.996           0.319
L21.RH             -0.037378         0.024499           -1.526           0.127
L22.CO(GT)          0.124107         0.147323            0.842           0.400
L22.NO2(GT)        -0.006135         0.005416           -1.133           0.257
L22.RH              0.041857         0.024500            1.708           0.088
L23.CO(GT)         -0.041076         0.147356           -0.279           0.780
L23.NO2(GT)        -0.005205         0.005416           -0.961           0.337
L23.RH              0.047960         0.024505            1.957           0.050
L24.CO(GT)          0.171936         0.146984            1.170           0.242
L24.NO2(GT)         0.003821         0.005437            0.703           0.482
L24.RH              0.010637         0.024503            0.434           0.664
L25.CO(GT)         -0.302108         0.146326           -2.065           0.039
L25.NO2(GT)         0.007458         0.005431            1.373           0.170
L25.RH             -0.034121         0.024104           -1.416           0.157
L26.CO(GT)          0.150202         0.107800            1.393           0.164
L26.NO2(GT)        -0.008683         0.003991           -2.176           0.030
L26.RH             -0.041247         0.015431           -2.673           0.008
==============================================================================

Correlation matrix of residuals
             CO(GT)   NO2(GT)        RH
CO(GT)     1.000000  0.606004  0.150771
NO2(GT)    0.606004  1.000000  0.078143
RH         0.150771  0.078143  1.000000

5. Generate forecast

forecast_var = pd.DataFrame(var_model.forecast(train_df.values, 
                                              steps=forecast_length), 
                           columns=train_df.columns, 
                           index=test_df.index)
forecast_var = forecast_var.rename(columns={c: c+'-VAR' for c in forecast_var.columns})
for c in train_df.columns:
    fig, ax = plt.subplots(figsize=[15, 2])
    pd.concat([train_df[[c]], forecast_var[[c+'-VAR']]], axis=1).plot(ax=ax)
    plt.xlim(left=pd.to_datetime('2005-03-01'))
    plt.xlabel('')
#     plt.tight_layout()
    plt.show()
../_images/03_VectorAutoregressiveMethods_77_0.png ../_images/03_VectorAutoregressiveMethods_77_1.png ../_images/03_VectorAutoregressiveMethods_77_2.png

6. Evaluate model performance: Comparison with ARIMA model

When using ARIMA, we treat each variable as a univariate time series and fit one model per variable: one for CO, one for NO2, and one for RH.

# For model order selection, refer to Chapter 1
selected_order = {'CO(GT)': [(0, 1, 0)],
                  'NO2(GT)': [(0, 1, 0)],
                  'RH': [(3, 1, 1)]}
%%time
forecast_arima = {}
for c in cols:
    forecast_arima[c+'-ARIMA'] = utils.forecast_arima(train_df[c].values, 
                                                      test_df[c].values, 
                                                      order=selected_order[c][0])
forecast_arima = pd.DataFrame(forecast_arima, index=forecast_var.index)
forecast_arima.head()
CPU times: user 4.32 s, sys: 268 ms, total: 4.58 s
Wall time: 1.48 s
CO(GT)-ARIMA NO2(GT)-ARIMA RH-ARIMA
Date_Time
2005-04-03 15:00:00 0.999865 87.002935 14.502893
2005-04-03 16:00:00 0.999729 87.005870 16.743239
2005-04-03 17:00:00 0.999594 87.008806 19.253414
2005-04-03 18:00:00 0.999458 87.011741 21.715709
2005-04-03 19:00:00 0.999323 87.014676 23.974483
forecasts = pd.concat([forecast_arima, forecast_var], axis=1)
for c in cols:
    fig, ax = utils.plot_forecasts_static(train_df=train_df,
                                          test_df=test_df, 
                                          forecast_df=forecasts, 
                                          column_name=c,
                                          min_train_date='2005-04-01', 
                                          suffix=['-VAR', '-ARIMA'],
                                          title=c)
../_images/03_VectorAutoregressiveMethods_82_0.png ../_images/03_VectorAutoregressiveMethods_82_1.png ../_images/03_VectorAutoregressiveMethods_82_2.png

Performance Metrics

pd.concat([utils.test_performance_metrics(test_df, forecast_var, suffix='-VAR'),
           utils.test_performance_metrics(test_df, forecast_arima, suffix='-ARIMA')], axis=1)
CO(GT)-VAR NO2(GT)-VAR RH-VAR CO(GT)-ARIMA NO2(GT)-ARIMA RH-ARIMA
MAE 0.685005 31.687612 9.748883 1.057097 59.283376 16.239017
MSE 1.180781 1227.046604 111.996306 2.163021 4404.904459 333.622140
MAPE 43.508500 29.811066 35.565987 56.960617 46.190671 51.199023

Observation/s:

  • MAE: VAR forecasts have lower errors than ARIMA forecasts for all variables (CO, NO2 and RH).

  • MSE: VAR forecasts have lower errors for all variables (CO, NO2 and RH).

  • MAPE: VAR forecasts have lower errors for all variables (CO, NO2 and RH).

  • Training time is also significantly lower for VAR than for ARIMA (~0.2 s wall time for the single VAR fit versus ~1.5 s for the three ARIMA fits above).

Structural VAR Analysis

In addition to forecasting, VAR models are also used for structural inference and policy analysis. In macroeconomics, this structural analysis has been extensively employed to investigate the transmission mechanisms of macroeconomic shocks (e.g., monetary shocks, financial shocks) and to test economic theories. Particular assumptions are imposed about the causal structure of the dataset, and the resulting causal impacts of unexpected shocks (also called innovations or perturbations) to a specific variable on the different variables in the model are summarized. In this section, we cover two of the common methods for summarizing these causal impacts: (1) impulse response functions, and (2) forecast error variance decompositions.

Impulse Response Function (IRF)

  • Coefficients of VAR models are often difficult to interpret, so practitioners frequently estimate the impulse response function instead.

  • IRFs trace out the time path of the effects of an exogenous shock to one (or more) of the endogenous variables on some or all of the other variables in a VAR system.

  • The IRF traces out the response of the dependent variables of the VAR system to shocks (also called innovations or impulses) in the error terms; a minimal sketch of the computation follows this list.
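To connect the IRF to the VMA representation above, here is a minimal sketch for a hypothetical bivariate VAR(1): the MA coefficient matrices are \(\boldsymbol\Theta_i = \mathbf A_1^i\), and the orthogonalized responses are obtained by post-multiplying with the lower Cholesky factor of the residual covariance (the Cholesky convention that the orth=True option uses). The coefficient matrix and covariance below are arbitrary illustrative values.

import numpy as np

# Hypothetical VAR(1) coefficients and residual covariance
A1 = np.array([[0.6, 0.2],
               [0.1, 0.4]])
sigma_u = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
P = np.linalg.cholesky(sigma_u)      # lower Cholesky factor of Sigma_u

irfs = []
theta = np.eye(2)                    # Theta_0 = I
for i in range(8):
    irfs.append(theta @ P)           # orthogonalized response at horizon i
    theta = A1 @ theta               # Theta_{i+1} = A1 @ Theta_i
print(np.round(irfs[1], 3))          # (j, k): response of y_j to a shock in u_k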

IRF in the VAR system for Air Quality

Let \(y_{1,t}\), \(y_{2,t}\) and \(y_{3,t}\) be the time series corresponding to the CO signal, NO2 signal, and RH signal, respectively. Consider the moving average representation of the system:

\[\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} + \sum_{i=0}^{\infty} \begin{bmatrix} \theta_{1,1}^{(i)} & \theta_{1,2}^{(i)} & \theta_{1,3}^{(i)} \\ \theta_{2,1}^{(i)} & \theta_{2,2}^{(i)} & \theta_{2,3}^{(i)} \\ \theta_{3,1}^{(i)} & \theta_{3,2}^{(i)} & \theta_{3,3}^{(i)} \end{bmatrix} \begin{bmatrix} u_{1,t-i} \\ u_{2,t-i} \\ u_{3,t-i} \end{bmatrix}\]

where \(\boldsymbol\Theta_0 = \mathbf I_3\) and \(\boldsymbol\Theta_i = \left[\theta_{j,k}^{(i)}\right]\) are the VMA coefficient matrices defined earlier.

Suppose \(u_1\) in the first equation increases by a value of one standard deviation.

  • This shock will change \(y_1\) in the current as well as the future periods.

  • This shock will also have an impact on \(y_2\) and \(y_3\).

Suppose \(u_2\) in the second equation increases by a value of one standard deviation.

  • This shock will change \(y_2\) in the current as well as the future periods.

  • This shock will also have an impact on \(y_1\) and \(y_3\).

Suppose \(u_3\) in the third equation increases by a value of one standard deviation.

  • This shock will change \(y_3\) in the current as well as the future periods.

  • This shock will also have an impact on \(y_1\) and \(y_2\).

irf = var_model.irf(periods=8)
ax = irf.plot(orth=True, 
              subplot_params={'fontsize': 10})
../_images/03_VectorAutoregressiveMethods_91_0.png

Observation/s:

Effects of exogenous perturbation/shocks (1SD) of a variable on itself:

  • CO \(\rightarrow\) CO: A shock to CO has a large effect on CO in the early hours, but this decays over time.

  • NO2 \(\rightarrow\) NO2: A shock to NO2 has a large effect on NO2 in the early hours, but this decays over time.

  • RH \(\rightarrow\) RH: A shock to RH has its largest effect on RH after 1 hour, and this effect decays over time.

Effects of exogenous perturbation/shocks of a variable on another:

  • CO \(\rightarrow\) NO2: A shock to CO has its largest effect on NO2 after 1 hour, and this effect decays over time.

  • CO \(\rightarrow\) RH: A shock to CO has an immediate effect on the value of RH. However, the effect decreases right after the first hour and then seems to settle at around 0.2.

  • NO2 \(\rightarrow\) CO: A shock to NO2 causes only a small effect on the values of CO. There seems to be a delayed effect, peaking after 3 hours, but the magnitude remains small.

  • NO2 \(\rightarrow\) RH: A shock to NO2 causes a small (negative) effect on the values of RH. The magnitude declines further after 6 hours, and the IRF reaches zero in about 7 hours.

  • RH \(\rightarrow\) CO: A shock to RH causes only a small effect on the values of CO.

  • RH \(\rightarrow\) NO2: A shock to RH has its largest effect on NO2 after 1 hour, and this effect decays over time. The IRF reaches zero after 6 hours.

Forecast Error Variance Decomposition (FEVD)

  • FEVD indicates the amount of information each variable contributes to the other variables in the autoregression.

  • While impulse response functions trace the effects of a shock to one endogenous variable onto the other variables in the VAR, variance decomposition separates the variation in an endogenous variable into the component shocks to the VAR.

  • It determines how much of the forecast error variance of each variable can be explained by exogenous shocks to the other variables (see the formula below).
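
In terms of the orthogonalized MA matrices \(\Theta_i\) from the IRF section, the share of the \(h\)-step-ahead forecast error variance of variable \(j\) attributable to shocks in variable \(k\) is (following Lütkepohl, 2005):

\[
\omega_{jk}(h) = \frac{\sum_{i=0}^{h-1} \left( e_j' \Theta_i e_k \right)^2}{\sum_{m=1}^{K} \sum_{i=0}^{h-1} \left( e_j' \Theta_i e_m \right)^2},
\]

where \(e_j\) is the \(j\)-th column of \(I_K\) (here \(K = 3\)); for each \(j\) and \(h\), the shares sum to one across \(k\).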

fevd = var_model.fevd(8)
ax = fevd.plot(figsize=(10, 8))
plt.show()
../_images/03_VectorAutoregressiveMethods_94_0.png

Observation/s:

  • For CO, the variance is mostly explained by exogenous shocks to CO. This decreases over time but only by a small amount.

  • For NO2, the variance is mostly explained by exogenous shocks to NO2 and CO.

  • For RH, the variance is mostly explained by exogenous shocks to RH. Over time, the contribution of the exogenous shocks to CO increases.

Example 3: Forecasting the Jena climate data

We try to forecast the Jena climate data using the method outlined above. We will train the VAR model on hourly weather measurements from January 1, 2009 (00:10) up to December 29, 2014 (18:10). The performance of the model will be evaluated on the test set, which contains data from December 29, 2014 (19:10) to December 31, 2016 (23:20), equivalent to 17,523 data points for each of the variables.

Load dataset

train_df = pd.read_csv('../data/train_series_datetime.csv',index_col=0).set_index('Date Time')
val_df = pd.read_csv('../data/val_series_datetime.csv',index_col=0).set_index('Date Time')
test_df = pd.read_csv('../data/test_series_datetime.csv',index_col=0).set_index('Date Time')
train_df.index = pd.to_datetime(train_df.index)
val_df.index = pd.to_datetime(val_df.index)
test_df.index = pd.to_datetime(test_df.index)
train_val_df = pd.concat([train_df, val_df])

Check stationarity of each variable using ADF test

%%time
# Run the ADF test on each variable; the null hypothesis is non-stationarity
# (presence of a unit root).
test_stat, p_val = [], []
cv_1pct, cv_5pct, cv_10pct = [], [], []
for c in train_val_df.columns: 
    adf_res = adfuller(train_val_df[c].dropna())
    test_stat.append(adf_res[0])
    p_val.append(adf_res[1])
    cv_1pct.append(adf_res[4]['1%'])
    cv_5pct.append(adf_res[4]['5%'])
    cv_10pct.append(adf_res[4]['10%'])
adf_res_df = pd.DataFrame({'Test statistic': test_stat, 
                           'p-value': p_val, 
                           'Critical value - 1%': cv_1pct,
                           'Critical value - 5%': cv_5pct,
                           'Critical value - 10%': cv_10pct}, 
                          index=train_val_df.columns).T
adf_res_df.round(4)
CPU times: user 3min 11s, sys: 25.8 s, total: 3min 37s
Wall time: 57.2 s
                      p (mbar)  T (degC)  Tpot (K)  Tdew (degC)   rh (%)  VPmax (mbar)  VPact (mbar)
Test statistic        -15.5867   -7.9586   -8.3354      -8.5750 -17.7069       -9.1945       -9.0103
p-value                 0.0000    0.0000    0.0000       0.0000   0.0000        0.0000        0.0000
Critical value - 1%    -3.4305   -3.4305   -3.4305      -3.4305  -3.4305       -3.4305       -3.4305
Critical value - 5%    -2.8616   -2.8616   -2.8616      -2.8616  -2.8616       -2.8616       -2.8616
Critical value - 10%   -2.5668   -2.5668   -2.5668      -2.5668  -2.5668       -2.5668       -2.5668

                      VPdef (mbar)  sh (g/kg)  H2OC (mmol/mol)  rho (g/m**3)  wv (m/s)  max. wv (m/s)  wd (deg)
Test statistic            -13.5492    -9.0827          -9.0709       -9.3980  -24.2424       -24.3052  -19.8394
p-value                     0.0000     0.0000           0.0000        0.0000    0.0000         0.0000    0.0000
Critical value - 1%        -3.4305    -3.4305          -3.4305       -3.4305   -3.4305        -3.4305   -3.4305
Critical value - 5%        -2.8616    -2.8616          -2.8616       -2.8616   -2.8616        -2.8616   -2.8616
Critical value - 10%       -2.5668    -2.5668          -2.5668       -2.5668   -2.5668        -2.5668   -2.5668
# The test statistic is below all three critical values for every variable,
# so the unit-root null is rejected even at the 1% level.
((adf_res_df.loc['Test statistic'] < adf_res_df.loc['Critical value - 1%']) & 
 (adf_res_df.loc['Test statistic'] < adf_res_df.loc['Critical value - 5%']) &
 (adf_res_df.loc['Test statistic'] < adf_res_df.loc['Critical value - 10%']))
p (mbar)           True
T (degC)           True
Tpot (K)           True
Tdew (degC)        True
rh (%)             True
VPmax (mbar)       True
VPact (mbar)       True
VPdef (mbar)       True
sh (g/kg)          True
H2OC (mmol/mol)    True
rho (g/m**3)       True
wv (m/s)           True
max. wv (m/s)      True
wd (deg)           True
dtype: bool

Observation/s:

From the values above, all components of the Jena climate data are stationary (the unit-root null is rejected for every variable), so we'll use all the variables in our VAR model.
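
Since the ADF null is a unit root while the KPSS null is stationarity, agreement between the two tests strengthens this conclusion. As a sanity check, one could reuse the kpss function imported earlier; a minimal sketch, assuming the default level-stationarity test:

# KPSS complements ADF: here, *large* p-values are consistent with stationarity.
kpss_pvals = {c: kpss(train_val_df[c].dropna(), nlags='auto')[1]
              for c in train_val_df.columns}
pd.Series(kpss_pvals).round(4)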

Select order p

%%time
aic, bic, fpe, hqic = [], [], [], []
model = VAR(train_val_df)
p = np.arange(1,60)
for i in p:
    result = model.fit(i)
    aic.append(result.aic)
    bic.append(result.bic)
    fpe.append(result.fpe)
    hqic.append(result.hqic)
lags_metrics_df = pd.DataFrame({'AIC': aic, 
                                'BIC': bic, 
                                'HQIC': hqic,
                                'FPE': fpe}, 
                               index=p)    
fig, ax = plt.subplots(1, 4, figsize=(15, 3), sharex=True)
lags_metrics_df.plot(subplots=True, ax=ax, marker='o')
plt.tight_layout()
CPU times: user 2min 57s, sys: 14.3 s, total: 3min 11s
Wall time: 1min
../_images/03_VectorAutoregressiveMethods_106_1.png
lags_metrics_df.idxmin()
AIC     51
BIC      5
HQIC    26
FPE     51
dtype: int64
lags_metrics_df[25:].idxmin()
AIC     51
BIC     26
HQIC    26
FPE     51
dtype: int64

Observation/s:

The model order that minimizes each information criterion varies across criteria, with no clear global minimum. We see an elbow at \(p=5\), but AIC and HQIC show another elbow/local minimum at \(p=26\). So, we choose \(p=26\) as our lag length.
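
Alternatively, statsmodels can select the order automatically by minimizing a chosen information criterion over a range of lags. A short sketch using BIC, which tends to pick more parsimonious models:

# Let statsmodels search up to maxlags and keep the BIC-minimizing order.
result = model.fit(maxlags=30, ic='bic')
print(result.k_ar)  # lag order selected by BIC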

Train VAR model using the training and validation data

%%time
var_model = model.fit(26)
CPU times: user 1.38 s, sys: 151 ms, total: 1.53 s
Wall time: 631 ms
# Split the test period into consecutive, non-overlapping 24-hour blocks
# (dropping the leftover hours at the end).
test_index = np.arange(len(test_df) - len(test_df) % 24).reshape((-1, 24))
# fit_index[n] holds the indices of all test observations that precede block n,
# so each 24-hour forecast conditions on everything observed so far.
fit_index = [test_index[:i].flatten() for i in range(len(test_index))]

Forecast 24-hour weather measurements and evaluate performance on test set

forecasts_df = []
for n in range(len(fit_index)):
    # Rolling-origin evaluation: forecast the next 24 hours conditioning on
    # train+val plus all test data observed before this block.
    forecast_var = pd.DataFrame(var_model.forecast(
        pd.concat([train_val_df, test_df.iloc[fit_index[n]]]).values, steps=24), 
                                columns=train_val_df.columns, 
                                index=test_df.iloc[test_index[n]].index)
    forecasts_df.append(forecast_var)
forecasts_df = pd.concat(forecasts_df)
forecasts_df.columns = forecasts_df.columns + '-VAR'
for c in train_val_df.columns:
    fig, ax = plt.subplots(figsize=[14, 2])
    train_val_df[c].plot(ax=ax)
    forecasts_df[[c+'-VAR']].plot(ax=ax)
    plt.ylim(train_val_df[c].min(), train_val_df[c].max())
    plt.legend(loc=2)
    plt.tight_layout()
    plt.show()
../_images/03_VectorAutoregressiveMethods_115_0.png ../_images/03_VectorAutoregressiveMethods_115_1.png ../_images/03_VectorAutoregressiveMethods_115_2.png ../_images/03_VectorAutoregressiveMethods_115_3.png ../_images/03_VectorAutoregressiveMethods_115_4.png ../_images/03_VectorAutoregressiveMethods_115_5.png ../_images/03_VectorAutoregressiveMethods_115_6.png ../_images/03_VectorAutoregressiveMethods_115_7.png ../_images/03_VectorAutoregressiveMethods_115_8.png ../_images/03_VectorAutoregressiveMethods_115_9.png ../_images/03_VectorAutoregressiveMethods_115_10.png ../_images/03_VectorAutoregressiveMethods_115_11.png ../_images/03_VectorAutoregressiveMethods_115_12.png ../_images/03_VectorAutoregressiveMethods_115_13.png

Evaluate forecasts

utils.test_performance_metrics(test_df.loc[forecasts_df.index], forecasts_df, suffix='-VAR').loc[['MAE', 'MSE']]
     p (mbar)-VAR  T (degC)-VAR  Tpot (K)-VAR  Tdew (degC)-VAR   rh (%)-VAR  VPmax (mbar)-VAR  VPact (mbar)-VAR
MAE      2.681284      2.542025      2.603187         1.797102    10.899944          2.526237          1.171625
MSE    238.321753    652.733627    618.458262       102.257346  9581.665059        540.666240         39.837241

     VPdef (mbar)-VAR  sh (g/kg)-VAR  H2OC (mmol/mol)-VAR  rho (g/m**3)-VAR  wv (m/s)-VAR  max. wv (m/s)-VAR  wd (deg)-VAR
MAE          2.379343       0.744665             1.186948         12.164486      3.136091           4.414089     66.420597
MSE        490.874048      15.612470            39.678736       9986.126326  17352.964659       23404.318476  87581.698748
T_MAEs = []
for n in range(len(test_index)):
    index = forecasts_df.iloc[test_index[n]].index
    T_MAEs.append(utils.mean_absolute_error(test_df.loc[index, 'T (degC)'].values, 
                                            forecasts_df.loc[index, 'T (degC)-VAR'].values))
print(f'VAR(26) MAE: {np.mean(T_MAEs)}')
VAR(26) MAE: 2.5420250169753875

Observation/s: On temperature, the VAR(26) model outperformed the naive (MAE = 3.18), seasonal naive (MAE = 2.61) and ARIMA (MAE = 3.19) models.
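
The baseline figures quoted above come from the earlier chapters. For intuition, comparable baselines can be sketched directly from the data; this simplified version scores one-step naive and 24-hour seasonal-naive forecasts of temperature, which may differ from the exact setup used before:

# Hypothetical baseline sketch: naive repeats the previous observed value,
# seasonal naive repeats the value observed 24 hours earlier.
y = test_df['T (degC)']
naive_mae = (y - y.shift(1)).abs().mean()
seasonal_naive_mae = (y - y.shift(24)).abs().mean()
print(f'Naive MAE: {naive_mae:.2f}, Seasonal naive MAE: {seasonal_naive_mae:.2f}')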

Forecast 24 hours beyond test set

all_data = pd.concat([train_df, val_df, test_df])
future_index = [all_data.index[-1]+pd.Timedelta(f'{h} hour') for h in np.arange(24)+1]
forecast_var_future = pd.DataFrame(var_model.forecast(all_data.values, 
                                                      steps=24), 
                                   columns=all_data.columns, 
                                   index=future_index)

forecast_var_future.columns = forecast_var_future.columns + '-VAR'
for c in train_val_df.columns:
    fig, ax = plt.subplots(figsize=[14, 2])
    all_data.iloc[-500:][c].plot(ax=ax)
    forecast_var_future[[c+'-VAR']].plot(ax=ax)
    plt.ylim(train_val_df[c].min(), train_val_df[c].max())
    plt.legend(loc=2)
    plt.tight_layout()
    plt.show()
../_images/03_VectorAutoregressiveMethods_125_0.png ../_images/03_VectorAutoregressiveMethods_125_1.png ../_images/03_VectorAutoregressiveMethods_125_2.png ../_images/03_VectorAutoregressiveMethods_125_3.png ../_images/03_VectorAutoregressiveMethods_125_4.png ../_images/03_VectorAutoregressiveMethods_125_5.png ../_images/03_VectorAutoregressiveMethods_125_6.png ../_images/03_VectorAutoregressiveMethods_125_7.png ../_images/03_VectorAutoregressiveMethods_125_8.png ../_images/03_VectorAutoregressiveMethods_125_9.png ../_images/03_VectorAutoregressiveMethods_125_10.png ../_images/03_VectorAutoregressiveMethods_125_11.png ../_images/03_VectorAutoregressiveMethods_125_12.png ../_images/03_VectorAutoregressiveMethods_125_13.png

Summary

  • VAR methods are useful when dealing with multivariate time series, as they allow us to exploit the relationships between the different variables when forecasting.

  • These models allow us to forecast the different variables simultaneously, with the added benefit of easy (only one hyperparameter, the lag order \(p\)) and fast training.

  • Using the fitted VAR model, we can also explain the relationships between variables, and how a perturbation in one variable affects the others, via the impulse response functions and the forecast error variance decomposition.

  • However, the application of these models is limited by the stationarity requirement on ALL the variables in the multivariate time series: the method won't work well if even one variable is non-stationary. When dealing with non-stationary multivariate time series, one can explore vector error correction models (VECM), sketched briefly below.
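
As a pointer, statsmodels ships a VECM implementation. A minimal, hypothetical sketch for a dataframe df of non-stationary (I(1)) series; the variable df and the parameter choices here are illustrative assumptions:

from statsmodels.tsa.vector_ar.vecm import VECM, select_coint_rank

# Hypothetical sketch: df holds non-stationary (I(1)) series.
rank = select_coint_rank(df, det_order=0, k_ar_diff=1)  # Johansen trace test
vecm_res = VECM(df, k_ar_diff=1, coint_rank=rank.rank).fit()
vecm_forecast = vecm_res.predict(steps=24)  # 24-step-ahead forecasts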

Preview to the Next Chapter

In the next chapter, we further extend the use of VAR models to explain the relationships between variables in a multivariate time series using Granger causality, which is one of the most common ways to describe causal mechanisms in time series data.

References

Main references

  • Lütkepohl, H. (2005). New introduction to multiple time series analysis. Berlin: Springer.

  • Kilian, L., & Lütkepohl, H. (2018). Structural vector autoregressive analysis. Cambridge: Cambridge University Press.
