class: center, middle, inverse, title-slide

# Vector autoregressive models

### Kevin Kotzé

---
layout: true
background-image: url(image/tsm-letter.svg)
background-position: 3% 96%
background-size: 4%

---

# Contents

1. Introduction
1. Basic VAR model
1. Stability of VAR model
1. Moving average representation
1. Moments of the VAR process
1. Estimating a VAR model
1. VAR forecasts
1. Granger causality
1. Conclusion

---

# Introduction

- VAR models are widely used in time series research:
  - Examine the dynamic relationships that exist between variables
  - Important forecasting tools that are used by economic & policy-making institutions

- Most of the concepts in this lecture are multivariate extensions of the tools and concepts that apply to autoregressive models

- This lecture introduces some of the key ideas and methods used in VAR analysis, where we discuss:
  - stability properties and the moving average representation
  - issues related to specification, estimation and forecasting
  - Granger causality

---

# Notation

- To describe the use of multivariate techniques, we need to introduce new notation

- Small (bold) letters denote a `\((K \times 1)\)` vector of random variables, where

`\begin{eqnarray} \boldsymbol{y}_{t}=(y_{1,t}, \ldots ,y_{K,t})^{\prime } \end{eqnarray}`

- The VAR model of order `\(p\)` can then be written as,

`\begin{eqnarray} \boldsymbol{y}_t = A_1 \boldsymbol{y}_{t-1} + \ldots + A_p \boldsymbol{y}_{t-p} + CD_t + \boldsymbol{u}_t \end{eqnarray}`

- where `\(A_j\)` is a `\((K\times K)\)` coefficient matrix, for `\(j=\{ 1, \ldots , p\}\)`
- `\(C\)` is the coefficient matrix for the deterministic regressors
- `\(D_t\)` contains the deterministic regressors (e.g. a constant, trend or seasonal dummies)
- `\(\boldsymbol{u}_t\)` is a `\((K\times 1)\)` vector of error terms

---

# Notation

- The vector of error terms is assumed to be white noise

`\begin{eqnarray} \mathbb{E} \left[ \boldsymbol{u}_t \right] &=&0 \\ \mathbb{E} \left[ \boldsymbol{u}_t \boldsymbol{u}_t^\prime \right] &=& \Sigma_{\boldsymbol{u}} \; \text{which is positive definite} \end{eqnarray}`

- This VAR is termed a reduced-form representation, which differs from the structural VAR (SVAR) that is discussed later

- The model relates the `\(k\)`'th variable in the vector `\(\boldsymbol{y}_t\)` to past values of itself and all other variables in the system

---

# Basic VAR model

- For simplicity, assume `\(K=2\)` and `\(p=1\)`,

`\begin{eqnarray} \boldsymbol{y}_{t}= A_{1} \boldsymbol{y}_{t-1} + \boldsymbol{u}_{t} \end{eqnarray}`

- where `\(\boldsymbol{y}_{t}\)`, `\(A_{1}\)` and `\(\boldsymbol{u}_{t}\)` are given as,

`\begin{eqnarray} \boldsymbol{y}_{t}=\left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] , A_{1}=\left[ \begin{array}{cc} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{array} \right] \textrm{ and } \boldsymbol{u}_{t}=\left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}`

- For example, assume the elements of `\(A_{1}\)` are given as,

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = \left[ \begin{array}{cc} 0.5 & 0 \\ 1 & 0.2 \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] +\left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}`

- where after some matrix manipulations,

`\begin{eqnarray} y_{1,t} &=& 0.5y_{1,t-1}+u_{1,t} \\ y_{2,t} &=& 1y_{1,t-1}+0.2y_{2,t-1}+u_{2,t} \end{eqnarray}`

---

# Basic VAR model

- The above model suggests:
  - `\(y_{2,t}\)` depends on past values of itself and past values of `\(y_{1,t}\)`
  - `\(y_{1,t}\)` only depends on past values of itself

- The variables that are to be included will typically depend on the purpose of the study

- We usually include variables that may have various dynamic interactions or a perceived causal relationship
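---

# Basic VAR model: a quick numerical check

- The example `\(VAR(1)\)` above is easy to verify numerically. The following is a minimal Python/NumPy sketch (not part of the original notes); the values of `y_lag` and `u_t` are arbitrary illustrative numbers

```python
import numpy as np

# coefficient matrix from the example above
A1 = np.array([[0.5, 0.0],
               [1.0, 0.2]])

y_lag = np.array([1.0, 2.0])    # hypothetical y_{t-1}
u_t = np.array([0.1, -0.3])     # hypothetical shock u_t

# matrix form: y_t = A1 y_{t-1} + u_t
y_t = A1 @ y_lag + u_t

# equation-by-equation form
y1 = 0.5 * y_lag[0] + u_t[0]
y2 = 1.0 * y_lag[0] + 0.2 * y_lag[1] + u_t[1]

print(y_t, np.allclose(y_t, [y1, y2]))  # the two forms coincide
```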
---

# The companion form

- Useful to express the `\(VAR(p)\)` (here written with an intercept vector `\(\mu\)`) as a `\(VAR(1)\)` in the companion form,

`\begin{eqnarray} Z_{t}=\Gamma_{0}+\Gamma_{1}Z_{t-1}+\Upsilon_{t} \end{eqnarray}`

- where we have,

`\begin{eqnarray} Z_{t}=\left[ \begin{array}{c} \boldsymbol{y}_{t} \\ \boldsymbol{y}_{t-1} \\ \vdots \\ \boldsymbol{y}_{t-p+1} \end{array} \right] , \hspace{1cm} \Gamma_0=\left[ \begin{array}{c} \mu \\ 0 \\ \vdots \\ 0 \end{array} \right] , \hspace{1cm} \Upsilon_{t} =\left[ \begin{array}{c} \boldsymbol{u}_{t} \\ 0 \\ \vdots \\ 0 \end{array} \right] \end{eqnarray}`

---

# The companion form

- So that the matrix notation is

`\begin{eqnarray} \left[ \begin{array}{c} \boldsymbol{y}_{t} \\ \boldsymbol{y}_{t-1} \\ \boldsymbol{y}_{t-2} \\ \vdots \\ \boldsymbol{y}_{t-p+1} \end{array} \right] =\left[ \begin{array}{c} \mu \\ 0 \\ 0 \\ \vdots \\ 0 \end{array} \right] +\left[ \begin{array}{ccccc} A_{1} & A_{2} & \cdots & A_{p-1} & A_{p} \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & I & 0 \end{array} \right] \left[ \begin{array}{c} \boldsymbol{y}_{t-1} \\ \boldsymbol{y}_{t-2} \\ \boldsymbol{y}_{t-3} \\ \vdots \\ \boldsymbol{y}_{t-p} \end{array} \right] + \left[ \begin{array}{c} \boldsymbol{u}_{t} \\ 0 \\ 0 \\ \vdots \\ 0 \end{array} \right] \end{eqnarray}`

- where the vectors `\(Z_{t}\)`, `\(\Gamma_{0}\)` and `\(\Upsilon_{t}\)` are `\(Kp\times 1\)`
- `\(A_{j}\)` for `\(j=1,\ldots , p\)` is `\(K\times K\)`, and
- `\(\Gamma_{1}\)` is `\(Kp\times Kp\)`

- In this case `\(\Gamma_{1}\)` is called the companion-form matrix

---

# Stability of VAR model

- The VAR is covariance-stationary when the effect of the shocks, `\(\boldsymbol{u}_t\)`, dissipates over time

- This occurs when the eigenvalues of the companion-form matrix are all less than one in absolute value

- The eigenvalues of the matrix `\(\Gamma_1\)` are represented by `\(\lambda\)` in the expression,

`\begin{eqnarray} |\Gamma_{1}-\lambda I|=0 \end{eqnarray}`

- To derive the eigenvalues in our bivariate `\(VAR(1)\)` example,

`\begin{eqnarray} \det \left[ \left[ \begin{array}{cc} 0.5 & 0 \\ 1 & 0.2 \end{array} \right] -\lambda \left[ \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right] \right] &=&\det \left[ \left[ \begin{array}{cc} 0.5-\lambda & 0 \\ 1 & 0.2-\lambda \end{array} \right] \right] \\ (0.5-\lambda )(0.2-\lambda ) &=&0 \end{eqnarray}`

- Hence,

`\begin{eqnarray} \lambda_{1} &=& 0.5, \hspace{2ex} \lambda_{2}=0.2 \end{eqnarray}`

---

# Stability of VAR model

- Certain researchers consider the values of the *characteristic roots*, which may be defined as `\(z\)` in the expression

`\begin{eqnarray} |I-\Gamma_{1}z|=0 \end{eqnarray}`

- where the interpretation is reversed, as a stable stochastic process has characteristic roots that lie outside the unit circle

- The interested reader may wish to consult Hamilton (1994)

---

# Simulating stable VAR processes

- We can simulate the above bivariate `\(VAR(1)\)` with starting values `\(y_{k,0}=0\)`, intercepts `\(\mu_{k}=1\)` for `\(k=\{1,2\}\)` and

`\begin{eqnarray} \boldsymbol{u}_t \sim \mathcal{N}\left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] ,\left[ \begin{array}{cc} 1 & 0.2 \\ 0.2 & 1 \end{array} \right] \right) \end{eqnarray}`

- Note that the simulated processes (shown in the figure that follows) fluctuate around a constant mean & their variability does not appear to change with time
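---

# Simulating stable VAR processes: a sketch

- A minimal Python/NumPy sketch (not part of the original notes) that checks the stability condition and simulates a comparable realisation of the process shown in the figure that follows; the seed and sample size are arbitrary choices

```python
import numpy as np

rng = np.random.default_rng(42)

A1 = np.array([[0.5, 0.0], [1.0, 0.2]])
mu = np.array([1.0, 1.0])
Sigma_u = np.array([[1.0, 0.2], [0.2, 1.0]])

# stability: eigenvalues of the companion matrix (= A1 when p = 1)
# are 0.5 and 0.2, both less than one in absolute value
print(np.abs(np.linalg.eigvals(A1)))

# simulate T observations with y_0 = 0
T = 250
y = np.zeros((T + 1, 2))
u = rng.multivariate_normal(np.zeros(2), Sigma_u, size=T)
for t in range(1, T + 1):
    y[t] = mu + A1 @ y[t - 1] + u[t - 1]
```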
---
background-image: url(image/sim_var.svg)
background-position: top
background-size: 90% 90%
class: clear, center, bottom

Figure: Simulated VAR processes

---

# Wold representation

- Just as the stable `\(AR(p)\)` model has an MA representation, the stable `\(VAR(p)\)` has a VMA representation - termed the Wold decomposition

- The theorem states that every covariance-stationary time series can be written as the sum of two uncorrelated processes:
  - a deterministic component, `\(\kappa_{t}\)`, (which could be the mean)
  - an infinite moving average representation, `\(\sum_{j=0}^{\infty }\theta_{j} \varepsilon_{t-j}\)`

- Hence,

`\begin{eqnarray} y_{t}=\sum_{j=0}^{\infty }\theta_{j} \varepsilon_{t-j}+\kappa_{t} \end{eqnarray}`

- where we assume
  - `\(\theta_{0}=1\)`
  - `\(\sum_{j=0}^{\infty }|\theta_{j}|<\infty\)`
  - `\(\varepsilon_{t}\)` is white noise

---

# Wold representation

- This would involve fitting an infinite number of parameters `\(( \theta_{1}, \theta_{2}, \theta_{3}, \ldots )\)` to the data

- With a finite number of observations, this is not possible

- One can approximate `\(\theta (L)\)` by using models that have a finite number of parameters

- Since we can write a `\(VAR(p)\)` as a `\(VAR(1)\)` model using the companion form, consider the example,

`\begin{eqnarray} \boldsymbol{y}_{t}=\mu +A_{1}\boldsymbol{y}_{t-1}+\boldsymbol{u}_{t} \end{eqnarray}`

- Using the lag operator,

`\begin{eqnarray} (I-A_{1}L)\boldsymbol{y}_{t}=\mu +\boldsymbol{u}_{t} \end{eqnarray}`

---

# Wold representation

- Using the expression `\((I-A_{1}L)=A(L)\)` we can write,

`\begin{eqnarray} A(L)\boldsymbol{y}_{t}=\mu +\boldsymbol{u}_{t} \end{eqnarray}`

- Multiplying by `\(A(L)^{-1}\)` we get the VMA representation,

`\begin{eqnarray} \boldsymbol{y}_{t} &=&A(L)^{-1}\mu +A(L)^{-1}\boldsymbol{u}_{t} \\ &=&B(L)\mu +B(L) \boldsymbol{u}_t \\ &=&\varphi+\sum_{j=0}^{\infty }B_{j}\boldsymbol{u}_{t-j} \end{eqnarray}`

---

# Wold representation

- Where we have used the geometric rule

`\begin{eqnarray} A(L)^{-1}=(I-A_{1}L)^{-1}=\sum_{j=0}^{\infty }A_{1}^{j}L^{j}\equiv B(L)=\sum_{j=0}^{\infty }B_{j}L^{j} \end{eqnarray}`

- with `\(B_{0}=I\)` and `\(\varphi=\left( \sum\limits_{j=0}^{\infty }B_{j}\right) \mu\)`
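---

# Wold representation: computing the weights

- For the `\(VAR(1)\)` the geometric rule gives `\(B_{j}=A_{1}^{j}\)`, so the VMA weights can be computed directly
- A minimal Python/NumPy sketch (not part of the original notes), reusing the example `A1` and `mu` from the simulation above:

```python
import numpy as np

A1 = np.array([[0.5, 0.0], [1.0, 0.2]])
mu = np.array([1.0, 1.0])

# VMA (Wold) weights of the VAR(1): B_j = A1^j
B = [np.linalg.matrix_power(A1, j) for j in range(30)]

# for a stable VAR the weights die out ...
print(np.round(B[10], 4))

# ... and the partial sums converge, so phi = (sum_j B_j) mu
# matches (I - A1)^{-1} mu
phi = sum(B) @ mu
print(phi, np.linalg.inv(np.eye(2) - A1) @ mu)
```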
---

# Finding the MA coefficients

- The MA coefficients, `\(B_{j}\)`, are derived from the relationship `\(I=B(L)A(L)\)`, since `\(A(L)^{-1}A(L)=I\)` and `\(A(L)^{-1}= B(L)\)`

- Therefore,

`\begin{eqnarray} I&=&B(L)A(L) \\ I &=&(B_{0}+B_{1}L+B_{2}L^{2}+ \ldots )(I-A_{1}L-A_{2}L^{2}- \ldots - A_{p}L^{p}) \\ &=&[B_{0}+B_{1}L+B_{2}L^{2}+ \ldots ] \\ && -[B_{0}A_{1}L+B_{1}A_{1}L^{2}+B_{2}A_{1}L^{3}+ \ldots ] \\ && -[B_{0}A_{2}L^{2}+B_{1}A_{2}L^{3}+B_{2}A_{2}L^{4}+ \ldots] - \ldots \\ &=&B_{0}+(B_{1}-B_{0}A_{1})L+(B_{2}-B_{1}A_{1}-B_{0}A_{2})L^{2}+ \ldots \\ && +\left(B_{p}-\sum_{j=1}^{p}B_{p-j}A_{j}\right) L^{p}+ \ldots \end{eqnarray}`

---

# Finding the MA coefficients

- Solving for the relevant lags (noting that `\(A_{j}=0\)` for `\(j>p\)`), we get,

`\begin{eqnarray} B_{0} &=&I \\ B_{1} &=&B_{0}A_{1} \\ B_{2} &=&B_{1}A_{1}+B_{0}A_{2} \\ \vdots & & \vdots \\ B_{i} &=&\sum_{j=1}^{i}B_{i-j}A_{j}\hspace{1cm}\text{for }i=1,2, \ldots \end{eqnarray}`

- Hence, the `\(B_{j}\)` parameters can be computed recursively

---

# Mean, variance & autocovariance

- The first two moments of the VAR can be derived from the MA representation

`\begin{eqnarray} \boldsymbol{y}_{t} &=& \varphi+\sum_{j=0}^{\infty }B_{j} \boldsymbol{u}_{t-j} \end{eqnarray}`

- where `\(\varphi=\left( \sum\limits_{j=0}^{\infty }B_{j}\right) \mu=(I-A_{1})^{-1}\mu\)`

- Since the error terms are assumed to be Gaussian white noise, the expected mean value is,

`\begin{eqnarray} \mathbb{E}[\boldsymbol{y}_{t}]=\varphi=(I-A_{1})^{-1}\mu \end{eqnarray}`

- This mean may be termed the steady-state of the system

---

# Mean, variance & autocovariance

- The covariance and autocovariances, denoted `\(\Psi\)`, may then be derived with the aid of the Yule-Walker equations, where we write the process in mean-adjusted form

`\begin{eqnarray} \boldsymbol{y}_{t}-\varphi = A_{1}(\boldsymbol{y}_{t-1}-\varphi)+\boldsymbol{u}_{t} \end{eqnarray}`

- Postmultiplying by `\((\boldsymbol{y}_{t-s}-\varphi)^{\prime }\)` and taking expectations gives,

`\begin{eqnarray} \mathbb{E} \left[ \left(\boldsymbol{y}_{t}-\varphi \right) \left( \boldsymbol{y}_{t-s}-\varphi \right)^\prime \right] &=& A_{1} \mathbb{E} \left[ \left(\boldsymbol{y}_{t-1}-\varphi \right) \left( \boldsymbol{y}_{t-s}-\varphi \right)^\prime \right] \\ &&+ \; \mathbb{E} \left[\boldsymbol{u}_{t} \left(\boldsymbol{y}_{t-s}-\varphi \right)^{\prime }\right] \end{eqnarray}`

- where the last term equals `\(\Sigma_{\boldsymbol{u}}\)` for `\(s=0\)` and zero for `\(s>0\)`

---

# Mean, variance & autocovariance

- Thus, for `\(s=0\)`,

`\begin{eqnarray} \Psi_{0}=A_{1} \Psi_{-1} + \Sigma_{\boldsymbol{u}}= A_{1} \Psi_{1}^{\prime}+ \Sigma_{\boldsymbol{u}} \end{eqnarray}`

- where after the second equality sign, we used the fact that `\(\Psi_{-1}=\Psi_{1}^{\prime }\)`

- Hence, for `\(s>0\)`, we have

`\begin{eqnarray} \Psi_{s}=A_{1}\Psi_{s-1} \end{eqnarray}`

- When `\(A_{1}\)` and `\(\Sigma_{\boldsymbol{u}}\)` are known, we can compute the autocovariances for `\(s=\{0,\ldots, S\}\)` using the above two expressions

---

# Mean, variance & autocovariance

- Hence, for `\(s=1\)`, we have `\(\Psi_{1}=A_{1}\Psi_{0}\)`

- Substituting into the expression for `\(\Psi_{0}\)` and noting that `\([A_{1}\Psi_{0}]^{\prime }=\Psi_{0}^{\prime }A_{1}^{\prime }\)`

- Using the rules of matrix algebra, we get,

`\begin{eqnarray} \Psi_{0}=A_{1}\Psi_{0} A_{1}^{\prime} + \Sigma_{\boldsymbol{u}} \end{eqnarray}`

- Solve for `\(\Psi_{0}\)` with the Kronecker product and the `\(vec\)` operator to get [see Lutkepohl (2005)]

`\begin{eqnarray} vec \Psi_{0}=(I-A_{1}\otimes A_{1})^{-1}vec\Sigma_{u} \end{eqnarray}`

- Once `\(\Psi_{0}\)` has been derived we can obtain the autocovariances for `\(s>0\)` by recursive substitution
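---

# Mean, variance & autocovariance: a sketch

- A minimal Python/NumPy sketch (not part of the original notes) that computes `\(\Psi_{0}\)` via the `\(vec\)` formula and the higher-order autocovariances recursively for the running example:

```python
import numpy as np

A1 = np.array([[0.5, 0.0], [1.0, 0.2]])
Sigma_u = np.array([[1.0, 0.2], [0.2, 1.0]])
K = A1.shape[0]

# vec(Psi_0) = (I - A1 kron A1)^{-1} vec(Sigma_u)
# (vec stacks columns, hence order="F")
vec_Psi0 = np.linalg.solve(np.eye(K * K) - np.kron(A1, A1),
                           Sigma_u.flatten(order="F"))
Psi0 = vec_Psi0.reshape((K, K), order="F")

# autocovariances for s > 0: Psi_s = A1 Psi_{s-1}
Psi = [Psi0]
for s in range(1, 6):
    Psi.append(A1 @ Psi[s - 1])

print(np.round(Psi0, 3))
```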
---

# Mean, variance & autocovariance

- Then, to get the autocorrelation function, we need to normalize the autocovariances so that they have ones on the diagonal at `\(s=0\)`

- Thus, we define the diagonal matrix `\(\vartheta\)`, whose diagonal elements are the square roots of the diagonal elements of `\(\Psi_{0}\)`

- Then the autocorrelation function for the VAR is simply,

`\begin{eqnarray} R_s=\vartheta^{-1}\Psi_{s} \vartheta^{-1} \end{eqnarray}`

- Note that while we only considered the case of a `\(VAR(1)\)`, we could have expressed a `\(VAR(p)\)` as a `\(VAR(1)\)` using the companion form

- In this case one would need to specify a selection matrix to extract the values of interest [see Lutkepohl (2005)]

---

# Estimating VAR parameters

- The VAR system can be estimated equation-by-equation using OLS

- This would be consistent and, with normality of the errors, efficient

- Assume that we have a sample of size `\(T\)`, with `\(\{y_{1}, \ldots ,y_{T}\}\)`, for each of the `\(K\)` variables

- It can be shown that the estimator has the same efficiency as the generalized LS (GLS) estimator

- Following Lutkepohl (2005), we define `\(Y=[\boldsymbol{y}_{1}, \ldots, \boldsymbol{y}_{T}]\)`, `\(A=[A_{1}, \ldots , A_{p}]\)`, `\(U=[\boldsymbol{u}_{1}, \ldots, \boldsymbol{u}_{T}]\)` and `\(Z=[Z_{0}, \ldots, Z_{T-1}]\)`, where,

`\begin{eqnarray} Z_{t-1}=\left[ \begin{array}{c} \boldsymbol{y}_{t-1} \\ \boldsymbol{y}_{t-2} \\ \vdots \\ \boldsymbol{y}_{t-p} \end{array} \right] \end{eqnarray}`

---

# Estimating VAR parameters

- The VAR model can then be written as,

`\begin{eqnarray} Y=AZ+U \end{eqnarray}`

- And the OLS estimator of `\(A\)` is,

`\begin{eqnarray} \hat{A}=[\hat{A}_{1}, \ldots, \hat{A}_{p}]= YZ^{\prime }(ZZ^{\prime })^{-1} \end{eqnarray}`

- This OLS estimator is consistent and asymptotically normally distributed,

`\begin{eqnarray} \sqrt{T}vec(\hat{A}-A)\overset{d}{\longrightarrow }\mathcal{N}(0,\Gamma ^{-1}\otimes \Sigma_{\boldsymbol{u}}) \end{eqnarray}`

- where `\(\overset{d}{\longrightarrow }\)` implies convergence in distribution

- `\(vec\)` denotes the column stacking operator and `\(ZZ^{\prime }/T\overset{p}{\longrightarrow }\Gamma\)`
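---

# Estimating VAR parameters: a sketch

- A minimal Python/NumPy sketch (not part of the original notes) of equation-by-equation OLS; `var_ols` is a hypothetical helper that returns the estimated intercepts and coefficient matrices

```python
import numpy as np

def var_ols(y, p):
    """OLS for a VAR(p) with intercept on a (T x K) data matrix y."""
    T, K = y.shape
    # regressors: a constant and p lags of every variable
    Z = np.hstack([np.ones((T - p, 1))] +
                  [y[p - j:T - j] for j in range(1, p + 1)])
    Y = y[p:]
    # least-squares solution; equivalent to the slide's Y Z'(Z Z')^{-1}
    # up to the ordering/transposition of the data matrices
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    mu_hat = coef[0]
    A_hat = [coef[1 + (j - 1) * K:1 + j * K].T for j in range(1, p + 1)]
    return mu_hat, A_hat
```

- Applied to the simulated series from earlier, `var_ols(y, p=1)` should return estimates close to the true `mu` and `A1`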
---

# Choice of variables and lags

- When deciding on variables and lags, note that these models quickly become heavily parameterised, which could result in degrees-of-freedom problems

- For example, with 6 variables and 8 lags, each equation will contain `\((6\times 8)+1=49\)` coefficients (including a constant)

- Hence, with limited observations the parameter estimates may be imprecise

- The choice of variables is usually determined by economic theory or *a priori* ideas

- The number of lags can be determined by information criteria (e.g. AIC or BIC)

---

# VAR forecasts

- VAR models are popular tools for forecasting variables

- VAR forecasts are derived as per the AR forecasts

- The observed `\(h\)`-step value of the `\(\boldsymbol{y}_t\)` process would be,

`\begin{eqnarray} \boldsymbol{y}_{t+h}=A_{1}^{h} \boldsymbol{y}_{t}+\sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i} \end{eqnarray}`

- Adding a constant results in the following:

`\begin{eqnarray} \boldsymbol{y}_{t+h}=(I+A_{1}+ \ldots +A_{1}^{h-1})\mu +A_{1}^{h}\boldsymbol{y}_{t}+\sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i} \end{eqnarray}`

---

# VAR forecasts

- By employing the conditional expectation, we get the VAR point forecast,

`\begin{eqnarray} \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right] = (I+A_{1}+ \ldots +A_{1}^{h-1})\mu +A_{1}^{h}\boldsymbol{y}_{t} \end{eqnarray}`

- Under the assumption of stationarity, this converges to the unconditional mean of the process when `\(h\rightarrow \infty\)`,

`\begin{eqnarray} \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]=(I-A_{1})^{-1}\mu \hspace{1cm}\text{ when }h\rightarrow \infty \end{eqnarray}`

- Again, these equations are just the multivariate extensions of the previous formulas used for the AR models

---

# Mean Squared Forecasting Error

- Where `\(\mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]\)` is the predictor, the VAR forecast error at horizon `\(h\)` is,

`\begin{eqnarray} \boldsymbol{y}_{t+h} - \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right] =\sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i} \end{eqnarray}`

- The expectation of this provides the expected forecast error

- With the assumption that `\(\mathbb{E}[\boldsymbol{u}_t]=0\)`

`\begin{eqnarray} \mathbb{E}\left[\boldsymbol{y}_{t+h} - \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right] \right] = \mathbb{E}\left[\boldsymbol{y}_{t+h}\right] - \mathbb{E}\left[\mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right] \right]=0 \end{eqnarray}`

- Thus the predictor `\(\mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]\)` is unbiased and the MSFE is simply the forecast error variance

---

# Mean Squared Forecasting Error

- In the multivariate VAR setting the MSFE is,

`\begin{eqnarray} \boldsymbol{\sigma}_{t+h}^f&=& \mathbb{E}\left[\left(\boldsymbol{y}_{t+h} - \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]\right)\left(\boldsymbol{y}_{t+h} - \mathbb{E}\left[\boldsymbol{y}_{t+h}|\boldsymbol{y}_{t}\right]\right)^{\prime} \right] \\ &=& \mathbb{E} \left[ \left( \sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i}\right) \left( \sum_{i=0}^{h-1}A_{1}^{i} \boldsymbol{u}_{t+h-i}\right)^{\prime} \right] \end{eqnarray}`

- where we can move the `\(A_{1}\)` terms outside of the expectation, the cross-products vanish (the errors are serially uncorrelated) and `\(\mathbb{E} [\boldsymbol{u}_{t+h-i}\boldsymbol{u}_{t+h-i}^{\prime}]=\Sigma_{\boldsymbol{u}}\)` for all `\(i\)`

- Hence,

`\begin{eqnarray} \boldsymbol{\sigma}_{t+h}^f=\sum_{i=0}^{h-1} A_{1}^{i}\Sigma_{\boldsymbol{u}}\left(A_{1}^{i}\right)^{\prime} \end{eqnarray}`
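---

# VAR forecasts: a sketch

- A minimal Python/NumPy sketch (not part of the original notes); `var1_forecast` is a hypothetical helper for the running `\(VAR(1)\)` example, and the conditioning value of `y_t` is arbitrary

```python
import numpy as np

A1 = np.array([[0.5, 0.0], [1.0, 0.2]])
mu = np.array([1.0, 1.0])
Sigma_u = np.array([[1.0, 0.2], [0.2, 1.0]])

def var1_forecast(y_t, h):
    """Point forecast E[y_{t+h}|y_t] and MSFE matrix for the VAR(1) example."""
    powers = [np.linalg.matrix_power(A1, i) for i in range(h)]
    # (I + A1 + ... + A1^{h-1}) mu + A1^h y_t
    point = sum(powers) @ mu + np.linalg.matrix_power(A1, h) @ y_t
    # sum_i A1^i Sigma_u (A1^i)'
    msfe = sum(Ai @ Sigma_u @ Ai.T for Ai in powers)
    return point, msfe

point, msfe = var1_forecast(np.array([2.0, 4.0]), h=4)
se = np.sqrt(np.diag(msfe))   # forecast standard errors per variable
```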
---

# Uncertainty

- Assume the errors of the VAR model are Gaussian, `\(\boldsymbol{u}_{t}\sim \mathsf{i.i.d.} \; \mathcal{N}(0,\Sigma_{\boldsymbol{u}})\)`, and independent across time

- The forecast errors are then normally distributed,

`\begin{eqnarray} \frac{y_{k,t+h} - \mathbb{E}\left[y_{k,t+h}|\boldsymbol{y}_{t}\right]}{\boldsymbol{\sigma}_{k,t+h}^f}\sim \mathcal{N}(0,1) \end{eqnarray}`

- where `\(y_{k,t+h}\)` and `\(\mathbb{E}\left[y_{k,t+h}|\boldsymbol{y}_{t}\right]\)` are the `\(k\)`'th elements of the actual and predicted values

- and `\(\boldsymbol{\sigma}_{k,t+h}^f\)` is the square root of the forecast error variance for the `\(k\)`'th equation (i.e. the square root of the `\(k\)`'th element on the diagonal of the `\(\boldsymbol{\sigma}_{t+h}^f\)` matrix)

---

# Uncertainty

- Forecast intervals can then be generated around the VAR point forecasts using,

`\begin{eqnarray} \big[ \mathbb{E}\left[y_{k,t+h}|\boldsymbol{y}_{t}\right] - z_{\alpha /2} \; \boldsymbol{\sigma}_{k,t+h}^f\; , \; \mathbb{E}\left[y_{k,t+h}|\boldsymbol{y}_{t}\right] + z_{\alpha /2} \; \boldsymbol{\sigma}_{k,t+h}^f \big] \end{eqnarray}`

- which is equivalent to what was derived in the previous discussion on forecasts

---

# Forecast failure in macroeconomics

- Since Sims (1980), examples of VARs that are used to forecast key economic variables such as output, prices, and interest rates have been numerous

- However, some recent work suggests that VAR models may be prone to instabilities

- To improve the accuracy of forecasts from a VAR, researchers use intercept corrections, time-varying parameters, differenced data (which may help with mean shifts), model averaging, endogenous structural breaks, etc.

- See Clements and Hendry (2011), Clark and McCracken (2008), Allen and Fildes (2005), and others for interesting discussions

---

# Granger causality

- The idea is that a cause must precede the effect

- Hence, if variable `\(y_{2,t}\)` Granger-causes behaviour in variable `\(y_{1,t}\)`, then `\(y_{2,t}\)` should improve upon the predictions of `\(y_{1,t}\)`

- In the previous example,

`\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] =\left[ \begin{array}{c} \mu_{1} \\ \mu_{2} \end{array} \right] +\left[ \begin{array}{cc} 0.5 & 0 \\ 1 & 0.2 \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] +\left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}`

- Note, `\(y_{2,t}\)` does not influence future values of `\(y_{1,t}\)`

- However, `\(y_{1,t}\)` does influence future values of `\(y_{2,t}\)`

---

# Granger causality

- The test for Granger causality is formally implemented by means of a joint hypothesis test

- For each equation in the VAR we compute `\(K-1\)` restricted versions of the VAR model, each of which is compared with the unrestricted version

- The test considers whether all the lags of the `\(k\)`'th variable in the system are jointly significantly different from zero

- This is simply a standard `\(F\)`-test, where the null hypothesis is no Granger causality

---

# Granger causality

- In a `\(VAR(2)\)` model, with `\(K=3\)`, the first equation is,

`\begin{eqnarray} y_{1,t}&=&\mu_{1}+\alpha_{11}y_{1,t-1}+\alpha_{12}y_{2,t-1}+\alpha_{13}y_{3,t-1} \\ && +\;\alpha_{14}y_{1,t-2}+\alpha_{15}y_{2,t-2}+\alpha_{16}y_{3,t-2}+e_{1,t} \end{eqnarray}`

- A test for no Granger causality from `\(y_{2,t}\)` to `\(y_{1,t}\)` would be an `\(F\)`-test with null, `\(\alpha_{12}=\alpha_{15}=0\)`

- A test for no Granger causality from `\(y_{3,t}\)` to `\(y_{1,t}\)` would be an `\(F\)`-test with null, `\(\alpha_{13}=\alpha_{16}=0\)`

- Conducting similar tests on the other equations of the VAR would give a complete test for no Granger causality

- Rejection of any of the null hypotheses indicates Granger causality

- This does not say anything about true causality; it is only used to infer a predictive relationship
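---

# Granger causality: a sketch

- A minimal Python sketch (not part of the original notes) of the single-equation `\(F\)`-test; `granger_ftest` is a hypothetical helper that compares the restricted and unrestricted regressions

```python
import numpy as np
from scipy import stats

def granger_ftest(y, caused, causing, p):
    """F-test that lags of y[:, causing] do not help predict y[:, caused] in a VAR(p)."""
    T, K = y.shape
    target = y[p:, caused]
    lags = np.hstack([y[p - j:T - j] for j in range(1, p + 1)])
    Z_u = np.hstack([np.ones((T - p, 1)), lags])                   # unrestricted
    keep = [c for c in range(lags.shape[1]) if c % K != causing]
    Z_r = np.hstack([np.ones((T - p, 1)), lags[:, keep]])          # restricted

    rss = lambda Z: np.sum((target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]) ** 2)
    q = p                                  # number of restrictions
    df = T - p - Z_u.shape[1]              # residual d.o.f. of unrestricted model
    F = ((rss(Z_r) - rss(Z_u)) / q) / (rss(Z_u) / df)
    return F, stats.f.sf(F, q, df)
```

- For the simulated example, `granger_ftest(y, caused=1, causing=0, p=1)` should typically reject the null (so `\(y_{1}\)` Granger-causes `\(y_{2}\)`), while `granger_ftest(y, caused=0, causing=1, p=1)` should not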
---

# Summary

- The VAR is a multivariate version of the univariate AR

- A `\(VAR(p)\)` can be written as a `\(VAR(1)\)` model by writing it in the companion form

- If all eigenvalues of the companion-form matrix are less than 1 in absolute value, the VAR is stable

- A stable `\(VAR(p)\)` model can be inverted and written as an infinite-order vector moving average model

- The VAR can be estimated by OLS equation-by-equation. Under standard assumptions, the OLS estimator will be asymptotically equivalent to the maximum likelihood estimator of the whole system

---

# Summary

- Forecasts from a stable VAR will converge towards the unconditional mean of the model, and the MSFE matrix of the forecast errors can be derived with relative ease

- Density forecasts and forecast intervals can be constructed in a similar way to those for the AR models

- Granger causality tests involve testing whether or not lagged values of a given variable in the VAR system help predict one of the other endogenous variables in the system. Such a test can simply be conducted using an `\(F\)`-test, where the null hypothesis is no Granger causality