class: center, middle, inverse, title-slide

# Vector autoregressive models

### Kevin Kotzé

---
layout: true
background-image: url(image/tsm-letter.svg)
background-position: 3% 96%
background-size: 4%

---

# Contents

1. Introduction
1. Basic VAR model
1. Stability of VAR model
1. Moving average representation
1. Moments of the VAR process
1. Estimating a VAR model
1. VAR forecasts
1. Granger causality
1. Conclusion

---

# Introduction

- VAR models are widely used in time series research:
  - Examine the dynamic relationships that exist between variables
  - Important forecasting tools that are used by economic & policy-making institutions

- Most of the concepts in this lecture are multivariate extensions of the tools and concepts that apply to autoregressive models

- This lecture introduces some of the key ideas and methods used in VAR analysis, where we discuss:
  - stability properties and the moving average representation
  - issues related to specification, estimation and forecasting
  - Granger causality

---

# Notation

- To describe the use of multivariate techniques, we need to introduce new notation

- Small (bold) letters denote a `\((K \times 1)\)` vector of random variables, where

`\begin{eqnarray} \boldsymbol{y}_{t}=(y_{1,t}, \ldots ,y_{K,t})^{\prime } \end{eqnarray}`

- The VAR model of order `\(p\)` can then be written as,

`\begin{eqnarray} \boldsymbol{y}_t = A_1 \boldsymbol{y}_{t-1} + \ldots + A_p \boldsymbol{y}_{t-p} + CD_t + \boldsymbol{u}_t \end{eqnarray}`

- where `\(A_j\)` is a `\((K\times K)\)` coefficient matrix, for `\(j=\{ 1, \ldots , p\}\)`
- `\(C\)` is the coefficient matrix for the deterministic regressors
- `\(D_t\)` contains the deterministic regressors (e.g. a constant, trend or seasonal dummies)
- `\(\boldsymbol{u}_t\)` is a `\((K\times 1)\)` vector of error terms

---

# Notation

- The vector of error terms is assumed to be white noise

`\begin{eqnarray} \mathbb{E} \left[ \boldsymbol{u}_t \right] &=&0 \\ \mathbb{E} \left[ \boldsymbol{u}_t \boldsymbol{u}_t^\prime \right] &=& \Sigma_{\boldsymbol{u}} \; \text{which is positive definite} \end{eqnarray}`

- This VAR is termed a reduced-form representation, which differs from the structural VAR (SVAR) that is discussed later

- The model relates the `\(k\)`'th variable in the vector `\(\boldsymbol{y}_t\)` to past values of itself and all other variables in the system

---

# Basic VAR model

- For simplicity, assume `\(K=2\)` and `\(p=1\)`,

`\begin{eqnarray} \boldsymbol{y}_{t}= A_{1} \boldsymbol{y}_{t-1} + \boldsymbol{u}_{t} \end{eqnarray}`

- where `\(\boldsymbol{y}_{t}\)`, `\(A_{1}\)` and `\(\boldsymbol{u}_{t}\)` are given as,

`\begin{eqnarray} \boldsymbol{y}_{t}=\left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] , A_{1}=\left[ \begin{array}{cc} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{array} \right] \textrm{ and } \boldsymbol{u}_{t}=\left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}`

- For example, assume the elements of `\(A_{1}\)` are given as,

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = \left[ \begin{array}{cc} 0.5 & 0 \\ 1 & 0.2 \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] +\left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}`

- where after some matrix manipulations,

`\begin{eqnarray} y_{1,t} &=& 0.5y_{1,t-1}+u_{1,t} \\ y_{2,t} &=& 1y_{1,t-1}+0.2y_{2,t-1}+u_{2,t} \end{eqnarray}`

---

# Basic VAR model

- The above model suggests:
  - `\(y_{2,t}\)` depends on past values of itself and past values of `\(y_{1,t}\)`
  - `\(y_{1,t}\)` only depends on past values of itself

- The variables that are to be included will typically depend on the purpose of the study

- We usually include variables that may have various dynamic interactions or a perceived causal relationship
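---

# Basic VAR model: a quick numerical check

- The example `\(VAR(1)\)` above is easy to verify numerically. The following is a minimal Python/NumPy sketch (not part of the original notes); the values of `y_lag` and `u_t` are arbitrary illustrative numbers

```python
import numpy as np

# coefficient matrix from the example above
A1 = np.array([[0.5, 0.0],
               [1.0, 0.2]])

y_lag = np.array([1.0, 2.0])    # hypothetical y_{t-1}
u_t = np.array([0.1, -0.3])     # hypothetical shock u_t

# matrix form: y_t = A1 y_{t-1} + u_t
y_t = A1 @ y_lag + u_t

# equation-by-equation form
y1 = 0.5 * y_lag[0] + u_t[0]
y2 = 1.0 * y_lag[0] + 0.2 * y_lag[1] + u_t[1]

print(y_t, np.allclose(y_t, [y1, y2]))  # the two forms coincide
```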
---

# The companion form

- Useful to express the `\(VAR(p)\)` (here written with an intercept vector `\(\mu\)`) as a `\(VAR(1)\)` in the companion form,

`\begin{eqnarray} Z_{t}=\Gamma_{0}+\Gamma_{1}Z_{t-1}+\Upsilon_{t} \end{eqnarray}`

- where we have,

`\begin{eqnarray} Z_{t}=\left[ \begin{array}{c} \boldsymbol{y}_{t} \\ \boldsymbol{y}_{t-1} \\ \vdots \\ \boldsymbol{y}_{t-p+1} \end{array} \right] , \hspace{1cm} \Gamma_0=\left[ \begin{array}{c} \mu \\ 0 \\ \vdots \\ 0 \end{array} \right] , \hspace{1cm} \Upsilon_{t} =\left[ \begin{array}{c} \boldsymbol{u}_{t} \\ 0 \\ \vdots \\ 0 \end{array} \right] \end{eqnarray}`

---

# The companion form

- So that the matrix notation is

`\begin{eqnarray} \left[ \begin{array}{c} \boldsymbol{y}_{t} \\ \boldsymbol{y}_{t-1} \\ \boldsymbol{y}_{t-2} \\ \vdots \\ \boldsymbol{y}_{t-p+1} \end{array} \right] =\left[ \begin{array}{c} \mu \\ 0 \\ 0 \\ \vdots \\ 0 \end{array} \right] +\left[ \begin{array}{ccccc} A_{1} & A_{2} & \cdots & A_{p-1} & A_{p} \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I & 0 \end{array} \right] \left[ \begin{array}{c} \boldsymbol{y}_{t-1} \\ \boldsymbol{y}_{t-2} \\ \boldsymbol{y}_{t-3} \\ \vdots \\ \boldsymbol{y}_{t-p} \end{array} \right] + \left[ \begin{array}{c} \boldsymbol{u}_{t} \\ 0 \\ 0 \\ \vdots \\ 0 \end{array} \right] \end{eqnarray}`

- where the vectors `\(Z_{t}\)`, `\(\Gamma_{0}\)` and `\(\Upsilon_{t}\)` are `\(Kp\times 1\)`
- `\(A_{j}\)` for `\(j=1,\ldots , p\)` is `\(K\times K\)`, and
- `\(\Gamma_{1}\)` is `\(Kp\times Kp\)`

- In this case `\(\Gamma_{1}\)` is called the companion-form matrix

---

# Stability of VAR model

- The VAR is covariance-stationary when the effect of the shocks, `\(\boldsymbol{u}_t\)`, dissipates over time

- This occurs when the eigenvalues of the companion-form matrix are all less than one in absolute value

- The eigenvalues of the matrix `\(\Gamma_1\)` are represented by `\(\lambda\)` in the expression,

`\begin{eqnarray} |\Gamma_{1}-\lambda I|=0 \end{eqnarray}`

- To derive the eigenvalues in our bivariate `\(VAR(1)\)` example,

`\begin{eqnarray} \det \left[ \left[ \begin{array}{cc} 0.5 & 0 \\ 1 & 0.2 \end{array} \right] -\lambda \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \right] &=&\det \left[ \left[ \begin{array}{cc} 0.5-\lambda & 0 \\ 1 & 0.2-\lambda \end{array} \right] \right] \\ (0.5-\lambda )(0.2-\lambda ) &=&0 \end{eqnarray}`

- Hence,

`\begin{eqnarray} \lambda_{1} &=& 0.5, \hspace{2ex} \lambda_{2}=0.2 \end{eqnarray}`

---

# Stability of VAR model

- Certain researchers consider the values of the *characteristic roots*, which may be defined as `\(z\)` in the expression

`\begin{eqnarray} |I-\Gamma_{1}z|=0 \end{eqnarray}`

- where the interpretation is reversed, as a stable stochastic process has characteristic roots that lie outside the unit circle

- The interested reader may wish to consult Hamilton (1994)

---

# Simulating stable VAR processes

- We can simulate the above bivariate `\(VAR(1)\)` with starting values `\(y_{k,0}=0\)`, intercepts `\(\mu_{k}=1\)` for `\(k=\{1,2\}\)` and

`\begin{eqnarray} \boldsymbol{u}_t \sim \mathcal{N}\left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] ,\left[ \begin{array}{cc} 1 & 0.2 \\ 0.2 & 1 \end{array} \right] \right) \end{eqnarray}`

- Note that the simulated processes (shown in the figure that follows) fluctuate around a constant mean & their variability does not appear to change with time
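---

# Simulating stable VAR processes: a sketch

- A minimal Python/NumPy sketch (not part of the original notes) that checks the stability condition and simulates a comparable realisation of the process shown in the figure that follows; the seed and sample size are arbitrary choices

```python
import numpy as np

rng = np.random.default_rng(42)

A1 = np.array([[0.5, 0.0], [1.0, 0.2]])
mu = np.array([1.0, 1.0])
Sigma_u = np.array([[1.0, 0.2], [0.2, 1.0]])

# stability: eigenvalues of the companion matrix (= A1 when p = 1)
# are 0.5 and 0.2, both less than one in absolute value
print(np.abs(np.linalg.eigvals(A1)))

# simulate T observations with y_0 = 0
T = 250
y = np.zeros((T + 1, 2))
u = rng.multivariate_normal(np.zeros(2), Sigma_u, size=T)
for t in range(1, T + 1):
    y[t] = mu + A1 @ y[t - 1] + u[t - 1]
```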
---
background-image: url(image/sim_var.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Simulated VAR processes

---

# Wold representation

- Just as the stable `\(AR(p)\)` model has an MA representation, the stable `\(VAR(p)\)` has a VMA representation - termed the Wold decomposition

- The theorem states that every covariance-stationary time series can be written as the sum of two uncorrelated processes:
  - a deterministic component, `\(\kappa_{t}\)`, (which could be the mean)
  - an infinite moving average representation, `\(\sum_{j=0}^{\infty }\theta_{j} \varepsilon_{t-j}\)`

- Hence,

`\begin{eqnarray} y_{t}=\sum_{j=0}^{\infty }\theta_{j} \varepsilon_{t-j}+\kappa_{t} \end{eqnarray}`

- where we assume
  - `\(\theta_{0}=1\)`
  - `\(\sum_{j=0}^{\infty }|\theta_{j}|<\infty\)`
  - `\(\varepsilon_{t}\)` is white noise

---

# Wold representation

- This would involve fitting an infinite number of parameters `\(( \theta_{1}, \theta_{2}, \theta_{3}, \ldots )\)` to the data

- With a finite number of observations, this is not possible

- One can approximate `\(\theta (L)\)` by using models that have a finite number of parameters

- Since we can write a `\(VAR(p)\)` as a `\(VAR(1)\)` model using the companion form, consider the example,

`\begin{eqnarray} \boldsymbol{y}_{t}=\mu +A_{1}\boldsymbol{y}_{t-1}+\boldsymbol{u}_{t} \end{eqnarray}`

- Using the lag operator,

`\begin{eqnarray} (I-A_{1}L)\boldsymbol{y}_{t}=\mu +\boldsymbol{u}_{t} \end{eqnarray}`

---

# Wold representation

- Using the expression `\((I-A_{1}L)=A(L)\)` we can write,

`\begin{eqnarray} A(L)\boldsymbol{y}_{t}=\mu +\boldsymbol{u}_{t} \end{eqnarray}`

- Multiplying by `\(A(L)^{-1}\)` we get the VMA representation,

`\begin{eqnarray} \boldsymbol{y}_{t} &=&A(L)^{-1}\mu +A(L)^{-1}\boldsymbol{u}_{t} \\ &=&B(L)\mu +B(L) \boldsymbol{u}_t \\ &=&\varphi+\sum_{j=0}^{\infty }B_{j}\boldsymbol{u}_{t-j} \end{eqnarray}`

---

# Wold representation

- Where we have used the geometric rule

`\begin{eqnarray} A(L)^{-1}=(I-A_{1}L)^{-1}=\sum_{j=0}^{\infty }A_{1}^{j}L^{j}\equiv B(L)=\sum_{j=0}^{\infty }B_{j}L^{j} \end{eqnarray}`

- with `\(B_{0}=I\)` and `\(\varphi=\left( \sum\limits_{j=0}^{\infty }B_{j}\right) \mu\)`
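---

# Wold representation: computing the weights

- For the `\(VAR(1)\)` the geometric rule gives `\(B_{j}=A_{1}^{j}\)`, so the VMA weights can be computed directly
- A minimal Python/NumPy sketch (not part of the original notes), reusing the example `A1` and `mu` from the simulation above:

```python
import numpy as np

A1 = np.array([[0.5, 0.0], [1.0, 0.2]])
mu = np.array([1.0, 1.0])

# VMA (Wold) weights of the VAR(1): B_j = A1^j
B = [np.linalg.matrix_power(A1, j) for j in range(30)]

# for a stable VAR the weights die out ...
print(np.round(B[10], 4))

# ... and the partial sums converge, so phi = (sum_j B_j) mu
# matches (I - A1)^{-1} mu
phi = sum(B) @ mu
print(phi, np.linalg.inv(np.eye(2) - A1) @ mu)
```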
---

# Finding the MA coefficients

- The MA coefficients, `\(B_{j}\)`, are derived from the relationship `\(I=B(L)A(L)\)`, since `\(A(L)^{-1}A(L)=I\)` and `\(A(L)^{-1}= B(L)\)`

- Therefore,

`\begin{eqnarray} I&=&B(L)A(L) \\ I &=&(B_{0}+B_{1}L+B_{2}L^{2}+ \ldots )(I-A_{1}L-A_{2}L^{2}- \ldots - A_{p}L^{p}) \\ &=&[B_{0}+B_{1}L+B_{2}L^{2}+ \ldots ] \\ && -[B_{0}A_{1}L+B_{1}A_{1}L^{2}+B_{2}A_{1}L^{3}+ \ldots ] \\ && -[B_{0}A_{2}L^{2}+B_{1}A_{2}L^{3}+B_{2}A_{2}L^{4}+ \ldots] - \ldots \\ &=&B_{0}+(B_{1}-B_{0}A_{1})L+(B_{2}-B_{1}A_{1}-B_{0}A_{2})L^{2}+ \ldots \\ && +\left(B_{p}-\sum_{j=1}^{p}B_{p-j}A_{j}\right) L^{p}+ \ldots \end{eqnarray}`

---

# Finding the MA coefficients

- Solving for the relevant lags (noting that `\(A_{j}=0\)` for `\(j>p\)`), we get,

`\begin{eqnarray} B_{0} &=&I \\ B_{1} &=&B_{0}A_{1} \\ B_{2} &=&B_{1}A_{1}+B_{0}A_{2} \\ \vdots & & \vdots \\ B_{i} &=&\sum_{j=1}^{i}B_{i-j}A_{j}\hspace{1cm}\text{for }i=1,2, \ldots \end{eqnarray}`

- Hence, the `\(B_{j}\)` parameters can be computed recursively

---

# Mean, variance & autocovariance

- The first two moments of the VAR can be derived from the MA representation

`\begin{eqnarray} \boldsymbol{y}_{t} &=& \varphi+\sum_{j=0}^{\infty }B_{j} \boldsymbol{u}_{t-j} \end{eqnarray}`

- where `\(\varphi=\left( \sum\limits_{j=0}^{\infty }B_{j}\right) \mu=(I-A_{1})^{-1}\mu\)`

- Since the error terms are assumed to be Gaussian white noise, the expected mean value is,

`\begin{eqnarray} \mathbb{E}[\boldsymbol{y}_{t}]=\varphi=(I-A_{1})^{-1}\mu \end{eqnarray}`

- This mean may be termed the steady-state of the system

---

# Mean, variance & autocovariance

- The covariance and autocovariances, denoted `\(\Psi\)`, may then be derived with the aid of the Yule-Walker equations, where we write the process in mean-adjusted form

`\begin{eqnarray} \boldsymbol{y}_{t}-\varphi = A_{1}(\boldsymbol{y}_{t-1}-\varphi)+\boldsymbol{u}_{t} \end{eqnarray}`

- Postmultiplying by `\((\boldsymbol{y}_{t-s}-\varphi)^{\prime }\)` and taking expectations gives,

`\begin{eqnarray} \mathbb{E} \left[ \left(\boldsymbol{y}_{t}-\varphi \right) \left( \boldsymbol{y}_{t-s}-\varphi \right)^\prime \right] &=& A_{1} \mathbb{E} \left[ \left(\boldsymbol{y}_{t-1}-\varphi \right) \left( \boldsymbol{y}_{t-s}-\varphi \right)^\prime \right] \\ &&+ \; \mathbb{E} \left[\boldsymbol{u}_{t} \left(\boldsymbol{y}_{t-s}-\varphi \right)^{\prime }\right] \end{eqnarray}`

- where the last term equals `\(\Sigma_{\boldsymbol{u}}\)` for `\(s=0\)` and zero for `\(s>0\)`

---

# Mean, variance & autocovariance

- Thus, for `\(s=0\)`,

`\begin{eqnarray} \Psi_{0}=A_{1} \Psi_{-1} + \Sigma_{\boldsymbol{u}}= A_{1} \Psi_{1}^{\prime}+ \Sigma_{\boldsymbol{u}} \end{eqnarray}`

- where after the second equality sign, we used the fact that `\(\Psi_{-1}=\Psi_{1}^{\prime }\)`

- Hence, for `\(s>0\)`, we have

`\begin{eqnarray} \Psi_{s}=A_{1}\Psi_{s-1} \end{eqnarray}`

- When `\(A_{1}\)` and `\(\Sigma_{\boldsymbol{u}}\)` are known, we can compute the autocovariances for `\(s=\{0,\ldots, S\}\)` using the above two expressions

---

# Mean, variance & autocovariance

- Hence, for `\(s=1\)`, we have `\(\Psi_{1}=A_{1}\Psi_{0}\)`

- Substituting into the expression for `\(\Psi_{0}\)` and noting that `\([A_{1}\Psi_{0}]^{\prime }=\Psi_{0}^{\prime }A_{1}^{\prime }\)`

- Using the rules of matrix algebra, we get,

`\begin{eqnarray} \Psi_{0}=A_{1}\Psi_{0} A_{1}^{\prime} + \Sigma_{\boldsymbol{u}} \end{eqnarray}`

- Solve for `\(\Psi_{0}\)` with the Kronecker product and the `\(vec\)` operator to get [see Lutkepohl (2005)]

`\begin{eqnarray} vec \Psi_{0}=(I-A_{1}\otimes A_{1})^{-1}vec\Sigma_{u} \end{eqnarray}`

- Once `\(\Psi_{0}\)` has been derived we can obtain the autocovariances for `\(s>0\)` by recursive substitution
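---

# Mean, variance & autocovariance: a sketch

- A minimal Python/NumPy sketch (not part of the original notes) that computes `\(\Psi_{0}\)` via the `\(vec\)` formula and the higher-order autocovariances recursively for the running example:

```python
import numpy as np

A1 = np.array([[0.5, 0.0], [1.0, 0.2]])
Sigma_u = np.array([[1.0, 0.2], [0.2, 1.0]])
K = A1.shape[0]

# vec(Psi_0) = (I - A1 kron A1)^{-1} vec(Sigma_u)
# (vec stacks columns, hence order="F")
vec_Psi0 = np.linalg.solve(np.eye(K * K) - np.kron(A1, A1),
                           Sigma_u.flatten(order="F"))
Psi0 = vec_Psi0.reshape((K, K), order="F")

# autocovariances for s > 0: Psi_s = A1 Psi_{s-1}
Psi = [Psi0]
for s in range(1, 6):
    Psi.append(A1 @ Psi[s - 1])

print(np.round(Psi0, 3))
```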
---

# Mean, variance & autocovariance

- Then, to get the autocorrelation function, we need to normalize the autocovariances so that they have ones on the diagonal at `\(s=0\)`

- Thus, we define the diagonal matrix `\(\vartheta\)`, whose diagonal elements are the square roots of the diagonal elements of `\(\Psi_{0}\)`

- Then the autocorrelation function for the VAR is simply,

`\begin{eqnarray} R_s=\vartheta^{-1}\Psi_{s} \vartheta^{-1} \end{eqnarray}`

- Note that while we only considered the case of a `\(VAR(1)\)`, we could have expressed a `\(VAR(p)\)` as a `\(VAR(1)\)` using the companion form

- In this case one would need to specify a selection matrix to extract the values of interest [see Lutkepohl (2005)]

---

# Estimating VAR parameters

- The VAR system can be estimated equation-by-equation using OLS

- This would be consistent and, with normality of the errors, efficient

- Assume that we have a sample of size `\(T\)`, with `\(\{y_{1}, \ldots ,y_{T}\}\)`, for each of the `\(K\)` variables

- It can be shown that the estimator has the same efficiency as the generalized LS (GLS) estimator

- Following Lutkepohl (2005), we define `\(Y=[\boldsymbol{y}_{1}, \ldots, \boldsymbol{y}_{T}]\)`, `\(A=[A_{1}, \ldots , A_{p}]\)`, `\(U=[\boldsymbol{u}_{1}, \ldots, \boldsymbol{u}_{T}]\)` and `\(Z=[Z_{0}, \ldots, Z_{T-1}]\)`, where,

`\begin{eqnarray} Z_{t-1}=\left[ \begin{array}{c} \boldsymbol{y}_{t-1} \\ \boldsymbol{y}_{t-2} \\ \vdots \\ \boldsymbol{y}_{t-p} \end{array} \right] \end{eqnarray}`

---

# Estimating VAR parameters

- The VAR model can then be written as,

`\begin{eqnarray} Y=AZ+U \end{eqnarray}`

- And the OLS estimator of `\(A\)` is,

`\begin{eqnarray} \hat{A}=[\hat{A}_{1}, \ldots, \hat{A}_{p}]= YZ^{\prime }(ZZ^{\prime })^{-1} \end{eqnarray}`

- This OLS estimator is consistent and asymptotically normally distributed,

`\begin{eqnarray} \sqrt{T}vec(\hat{A}-A)\overset{d}{\longrightarrow }\mathcal{N}(0,\Gamma ^{-1}\otimes \Sigma_{\boldsymbol{u}}) \end{eqnarray}`

- where `\(\overset{d}{\longrightarrow }\)` implies convergence in distribution

- `\(vec\)` denotes the column stacking operator and `\(ZZ^{\prime }/T\overset{p}{\longrightarrow }\Gamma\)`
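---

# Estimating VAR parameters: a sketch

- A minimal Python/NumPy sketch (not part of the original notes) of equation-by-equation OLS; `var_ols` is a hypothetical helper that returns the estimated intercepts and coefficient matrices

```python
import numpy as np

def var_ols(y, p):
    """OLS for a VAR(p) with intercept on a (T x K) data matrix y."""
    T, K = y.shape
    # regressors: a constant and p lags of every variable
    Z = np.hstack([np.ones((T - p, 1))] +
                  [y[p - j:T - j] for j in range(1, p + 1)])
    Y = y[p:]
    # least-squares solution; equivalent to the slide's Y Z'(Z Z')^{-1}
    # up to the ordering/transposition of the data matrices
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    mu_hat = coef[0]
    A_hat = [coef[1 + (j - 1) * K:1 + j * K].T for j in range(1, p + 1)]
    return mu_hat, A_hat
```

- Applied to the simulated series from earlier, `var_ols(y, p=1)` should return estimates close to the true `mu` and `A1`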
---

# Choice of variables and lags

- When deciding on variables and lags, note that these models quickly become heavily parameterised, which could result in degrees-of-freedom problems

- For example, with 6 variables and 8 lags, each equation will contain `\((6\times 8)+1=49\)` coefficients (including a constant)

- Hence, with limited observations the parameter estimates may be imprecise

- The choice of variables is usually determined by economic theory or *a priori* ideas

- The number of lags can be determined by information criteria (e.g. AIC or BIC)

---

# VAR forecasts

- VAR models are popular tools for forecasting variables

- VAR forecasts are derived as per the AR forecasts

- The observed `\(h\)`-step value of the `\(\boldsymbol{y}_t\)` process would be,

`\begin{eqnarray} \boldsymbol{y}_{t+h}=A_{1}^{h} \boldsymbol{y}_{t}+\sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i} \end{eqnarray}`

- Adding a constant results in the following:

`\begin{eqnarray} \boldsymbol{y}_{t+h}=(I+A_{1}+ \ldots +A_{1}^{h-1})\mu +A_{1}^{h}\boldsymbol{y}_{t}+\sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i} \end{eqnarray}`

---

# VAR forecasts

- By employing the conditional expectation, we get the VAR point forecast,

`\begin{eqnarray} \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right] = (I+A_{1}+ \ldots +A_{1}^{h-1})\mu +A_{1}^{h}\boldsymbol{y}_{t} \end{eqnarray}`

- Under the assumption of stationarity, this converges to the unconditional mean of the process when `\(h\rightarrow \infty\)`,

`\begin{eqnarray} \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]=(I-A_{1})^{-1}\mu \hspace{1cm}\text{ when }h\rightarrow \infty \end{eqnarray}`

- Again, these equations are just the multivariate extensions of the previous formulas used for the AR models

---

# Mean Squared Forecasting Error

- Where `\(\mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]\)` is the predictor, the VAR forecast error at horizon `\(h\)` is,

`\begin{eqnarray} \boldsymbol{y}_{t+h} - \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right] =\sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i} \end{eqnarray}`

- The expectation of this provides the expected forecast error

- With the assumption that `\(\mathbb{E}[\boldsymbol{u}_t]=0\)`

`\begin{eqnarray} \mathbb{E}\left[\boldsymbol{y}_{t+h} - \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right] \right] = \mathbb{E}\left[\boldsymbol{y}_{t+h}\right] - \mathbb{E}\left[\mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right] \right]=0 \end{eqnarray}`

- Thus the predictor `\(\mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]\)` is unbiased and the MSFE is simply the forecast error variance

---

# Mean Squared Forecasting Error

- In the multivariate VAR setting the MSFE is,

`\begin{eqnarray} \boldsymbol{\sigma}_{t+h}^f&=& \mathbb{E}\left[\left(\boldsymbol{y}_{t+h} - \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]\right)\left(\boldsymbol{y}_{t+h} - \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]\right)^{\prime} \right] \\ &=& \mathbb{E} \left[ \left( \sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i}\right) \left( \sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i}\right)^{\prime} \right] \end{eqnarray}`

- where we can move the `\(A_{1}\)` terms outside of the expectation, the cross-products vanish (the errors are serially uncorrelated) and `\(\mathbb{E} [\boldsymbol{u}_{t+h-i}\boldsymbol{u}_{t+h-i}^{\prime}]=\Sigma_{\boldsymbol{u}}\)` for all `\(i\)`

- Hence,

`\begin{eqnarray} \boldsymbol{\sigma}_{t+h}^f=\sum_{i=0}^{h-1} A_{1}^{i}\Sigma_{\boldsymbol{u}}\left(A_{1}^{i}\right)^{\prime} \end{eqnarray}`
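---

# VAR forecasts: a sketch

- A minimal Python/NumPy sketch (not part of the original notes); `var1_forecast` is a hypothetical helper for the running `\(VAR(1)\)` example, and the conditioning value of `y_t` is arbitrary

```python
import numpy as np

A1 = np.array([[0.5, 0.0], [1.0, 0.2]])
mu = np.array([1.0, 1.0])
Sigma_u = np.array([[1.0, 0.2], [0.2, 1.0]])

def var1_forecast(y_t, h):
    """Point forecast E[y_{t+h}|y_t] and MSFE matrix for the VAR(1) example."""
    powers = [np.linalg.matrix_power(A1, i) for i in range(h)]
    # (I + A1 + ... + A1^{h-1}) mu + A1^h y_t
    point = sum(powers) @ mu + np.linalg.matrix_power(A1, h) @ y_t
    # sum_i A1^i Sigma_u (A1^i)'
    msfe = sum(Ai @ Sigma_u @ Ai.T for Ai in powers)
    return point, msfe

point, msfe = var1_forecast(np.array([2.0, 4.0]), h=4)
se = np.sqrt(np.diag(msfe))   # forecast standard errors per variable
```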
---

# Uncertainty

- Assume the errors of the VAR model are Gaussian, `\(\boldsymbol{u}_{t}\sim \mathsf{i.i.d.} \; \mathcal{N}(0,\Sigma_{\boldsymbol{u}})\)`, and independent across time

- The forecast errors are then normally distributed,

`\begin{eqnarray} \frac{y_{k,t+h} - \mathbb{E}\left[y_{k,t+h}|\boldsymbol{y}_{t}\right]}{\boldsymbol{\sigma}_{k,t+h}^f}\sim \mathcal{N}(0,1) \end{eqnarray}`

- where `\(y_{k,t+h}\)` and `\(\mathbb{E}\left[y_{k,t+h}|\boldsymbol{y}_{t}\right]\)` are the `\(k\)`'th elements of the actual and predicted values

- and `\(\boldsymbol{\sigma}_{k,t+h}^f\)` is the square root of the forecast error variance for the `\(k\)`'th equation (i.e. the square root of the `\(k\)`'th element on the diagonal of the `\(\boldsymbol{\sigma}_{t+h}^f\)` matrix)

---

# Uncertainty

- Forecast intervals can then be generated around the VAR point forecasts using,

`\begin{eqnarray} \big[ \mathbb{E}\left[y_{k,t+h}|\boldsymbol{y}_{t}\right] - z_{\alpha /2} \; \boldsymbol{\sigma}_{k,t+h}^f\; , \; \mathbb{E}\left[y_{k,t+h}|\boldsymbol{y}_{t}\right] + z_{\alpha /2} \; \boldsymbol{\sigma}_{k,t+h}^f \big] \end{eqnarray}`

- which is equivalent to what was derived in the previous discussion on forecasts

---

# Forecast failure in macroeconomics

- Since Sims (1980), examples of VARs that are used to forecast key economic variables such as output, prices, and interest rates have been numerous

- However, some recent work suggests that VAR models may be prone to instabilities

- To improve the accuracy of forecasts from a VAR, researchers use intercept corrections, time-varying parameters, differenced data (which may help with mean shifts), model averaging, endogenous structural breaks, etc.

- See Clements and Hendry (2011), Clark and McCracken (2008), Allen and Fildes (2005), and others for interesting discussions

---

# Granger causality

- The idea is that a cause must precede the effect

- Hence, if variable `\(y_{2,t}\)` Granger-causes behaviour in variable `\(y_{1,t}\)`, then `\(y_{2,t}\)` should improve upon the predictions of `\(y_{1,t}\)`

- In the previous example,

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] =\left[ \begin{array}{c} \mu_{1} \\ \mu_{2} \end{array} \right] +\left[ \begin{array}{cc} 0.5 & 0 \\ 1 & 0.2 \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] +\left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}`

- Note, `\(y_{2,t}\)` does not influence future values of `\(y_{1,t}\)`

- However, `\(y_{1,t}\)` does influence future values of `\(y_{2,t}\)`

---

# Granger causality

- The test for Granger causality is formally implemented by means of a joint hypothesis test

- For each equation in the VAR we compute `\(K-1\)` restricted versions of the VAR model, each of which is compared with the unrestricted version

- The test considers whether all the lags of the `\(k\)`'th variable in the system are jointly significantly different from zero

- This is simply a standard `\(F\)`-test, where the null hypothesis is no Granger causality

---

# Granger causality

- In a `\(VAR(2)\)` model, with `\(K=3\)`, the first equation is,

`\begin{eqnarray} y_{1,t}&=&\mu_{1}+\alpha_{11}y_{1,t-1}+\alpha_{12}y_{2,t-1}+\alpha_{13}y_{3,t-1} \\ && +\;\alpha_{14}y_{1,t-2}+\alpha_{15}y_{2,t-2}+\alpha_{16}y_{3,t-2}+e_{1,t} \end{eqnarray}`

- A test for no Granger causality from `\(y_{2,t}\)` to `\(y_{1,t}\)` would be an `\(F\)`-test with null, `\(\alpha_{12}=\alpha_{15}=0\)`

- A test for no Granger causality from `\(y_{3,t}\)` to `\(y_{1,t}\)` would be an `\(F\)`-test with null, `\(\alpha_{13}=\alpha_{16}=0\)`

- Conducting similar tests on the other equations of the VAR would give a complete test for no Granger causality

- Rejection of any of the null hypotheses indicates Granger causality

- This does not say anything about true causality; it is only used to infer a predictive relationship
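---

# Granger causality: a sketch

- A minimal Python sketch (not part of the original notes) of the single-equation `\(F\)`-test; `granger_ftest` is a hypothetical helper that compares the restricted and unrestricted regressions

```python
import numpy as np
from scipy import stats

def granger_ftest(y, caused, causing, p):
    """F-test that lags of y[:, causing] do not help predict y[:, caused] in a VAR(p)."""
    T, K = y.shape
    target = y[p:, caused]
    lags = np.hstack([y[p - j:T - j] for j in range(1, p + 1)])
    Z_u = np.hstack([np.ones((T - p, 1)), lags])                   # unrestricted
    keep = [c for c in range(lags.shape[1]) if c % K != causing]
    Z_r = np.hstack([np.ones((T - p, 1)), lags[:, keep]])          # restricted

    rss = lambda Z: np.sum((target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]) ** 2)
    q = p                                  # number of restrictions
    df = T - p - Z_u.shape[1]              # residual d.o.f. of unrestricted model
    F = ((rss(Z_r) - rss(Z_u)) / q) / (rss(Z_u) / df)
    return F, stats.f.sf(F, q, df)
```

- For the simulated example, `granger_ftest(y, caused=1, causing=0, p=1)` should typically reject the null (so `\(y_{1}\)` Granger-causes `\(y_{2}\)`), while `granger_ftest(y, caused=0, causing=1, p=1)` should not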
---

# Summary

- The VAR is a multivariate version of the univariate AR

- A `\(VAR(p)\)` can be written as a `\(VAR(1)\)` model by writing it in the companion form

- If all eigenvalues of the companion-form matrix are less than 1 in absolute value, the VAR is stable

- A stable `\(VAR(p)\)` model can be inverted and written as an infinite-order vector moving average model

- The VAR can be estimated by OLS equation-by-equation. Under standard assumptions, the OLS estimator will be asymptotically equivalent to the maximum likelihood estimator of the whole system

---

# Summary

- Forecasts from a stable VAR will converge towards the unconditional mean of the model, and the MSFE matrix of the forecast errors can be derived with relative ease

- Density forecasts and forecast intervals can be constructed in a similar way to those for the AR models

- Granger causality tests involve testing whether or not lagged values of a given variable in the VAR system help predict one of the other endogenous variables in the system. Such a test can simply be conducted using an `\(F\)`-test, where the null hypothesis is no Granger causality