state_space_gdp: Using Bayesian State Space Models to Reconcile Historical GDP Estimates

The object of this exercise is to try to reconcile historical GDP estimates by optimally combining them using state space models.

Given a series of measures of GDP such as , , and , for the income, expenditure, and output approaches to measuring GDP, we might like to optimally combine them to get a series that reflects the measurement efforts of multiple scholars. Many attempts to combine GDP measurements exist, and here we follow the approach of Aruoba et. al. as they designed the state-space estimator for GDPPlus as implemented by the Philadelphia Federal Reserve.

We stack our GDP measures into a vector $Y_{t}$ measured over time periods to . We assume that our measurements of GDP are generated by true GDP, but are corrupted by noise:

$Y_t = [1, 1, 1]' x_t + \epsilon_t.$

True GDP itself is latent but moves autoregressively:

$x_t = \mu (1 - \rho) + \rho x_{t-1} + \eta_t.$

An important consideration is the covariance structure of our measurements of GDP. It is reasonable to think that they are likely to be correlated, thus we model the covariance of our system as block-diagonal in the signals, so that the 4 $\times$ 4 matrix $\Sigma$ has zeros in rows 2:4 of column 1, and in columns 2:4 of row 1.

$\Sigma = \begin{bmatrix} \sigma_{gg} & 0 & 0 & 0 \\ 0 & \sigma_{ii} & \sigma_{ie} & \sigma_{io} \\ 0 & \sigma_{ei} & \sigma_{ee} & \sigma_{eo} \\ 0 & \sigma_{oi} & \sigma_{oe} & \sigma_{oo} \end{bmatrix}$

We are unfortunately constrained initially to a block-diagonal formulation so that the system remains identifiable. Aruoba et. al. suggest two approaches to relaxing this assumption which I will try to incorporate later.

As a result of this formulation, we need to estimate 9 parameters: $\mu$ , $\rho$ , the 4 diagonal variance terms $sigma_{xx} \forall x \in {g,i,e,o}$ , and the off-diagonal variance terms in the lower or upper triangle of $\Sigma$ .

The model is implemented in Stan. I test the implementation against the GDPPlus model which provides published inputs and outputs. It computes GDPPlus using two methods of measuring GDP: Income and Production. I pick priors that match the discussion in the Aruoba et. al. paper.

First, we need to load Stan and some associated libraries, load the data, format it for our program and compile the stan model.

library(readxl)
library(tidyverse)
library(tidybayes)
library(cmdstanr)
library(bayesplot)

## helper function to format data
source("code/format_data.R")

## compile model
ss_block_diagonal <- cmdstan_model("code/ss_block_diagonal.stan", include_paths = ".")

df <- read_xlsx("data/gdpplus.xlsx")
df %>% dim()
#> [1] 240   7

y_mat <- df %>% filter(OBS_YEAR < 2012) %>% select(GRGDP_DATA, GRGDI_DATA)

data <- format_data(y_mat)

Now we are in a position to sample from the model. I draw 2000 samples with 1000 warm-up samples across 4 parallel chains. The model samples quickly and this should take under a minute.

## Fitting the model

fit_stan_gdp <- ss_block_diagonal$sample(data = data, 
                                         parallel_chains = 4, 
                                         iter_sampling = 2000, 
                                         show_messages = F, refresh = 1000)
#> Running MCMC with 4 parallel chains...
#> 
#> Chain 1 Iteration:    1 / 3000 [  0%]  (Warmup) 
#> Chain 2 Iteration:    1 / 3000 [  0%]  (Warmup) 
#> Chain 3 Iteration:    1 / 3000 [  0%]  (Warmup) 
#> Chain 4 Iteration:    1 / 3000 [  0%]  (Warmup) 
#> Chain 4 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
#> Chain 4 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
#> Chain 3 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
#> Chain 3 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
#> Chain 2 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
#> Chain 2 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
#> Chain 1 Iteration: 1000 / 3000 [ 33%]  (Warmup) 
#> Chain 1 Iteration: 1001 / 3000 [ 33%]  (Sampling) 
#> Chain 4 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
#> Chain 3 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
#> Chain 2 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
#> Chain 1 Iteration: 2000 / 3000 [ 66%]  (Sampling) 
#> Chain 4 Iteration: 3000 / 3000 [100%]  (Sampling) 
#> Chain 4 finished in 24.1 seconds.
#> Chain 3 Iteration: 3000 / 3000 [100%]  (Sampling) 
#> Chain 3 finished in 24.8 seconds.
#> Chain 2 Iteration: 3000 / 3000 [100%]  (Sampling) 
#> Chain 2 finished in 25.2 seconds.
#> Chain 1 Iteration: 3000 / 3000 [100%]  (Sampling) 
#> Chain 1 finished in 25.6 seconds.
#> 
#> All 4 chains finished successfully.
#> Mean chain execution time: 24.9 seconds.
#> Total execution time: 25.8 seconds.


# check against their point estimates
aruoba <- tibble(mean = c(3.06, .62, 5.17, 
                          3.86, 1.43, 2.7),
                 `25%` = c(2.77, .57, 4.39,
                           3.34, .96, 2.25),
                 `75%` = c(3.34, .68, 5.95,
                           4.48, 1.95, 3.22),
                 Study = "Aruoba et. al.",
                 params = c("mu", "rho", "sigma_gg", 
                            "sigma_ee", "sigma_ei", "sigma_ii"))

param_comp <- fit_stan_gdp$summary(variables = c("gamma", "theta", "sigma_state", "Sigma"),
                     mean, ~quantile(.x, c(.25, .75))) %>% 
  filter(variable != "Sigma[1,2]") %>% 
  mutate(Study = "Mesevage",
         params = c("mu", "rho", "sigma_gg", 
                    "sigma_ee", "sigma_ei", "sigma_ii")) %>% 
  select(-variable) %>% 
  bind_rows(aruoba)

param_comp[param_comp$Study == "Mesevage" & param_comp$params == "sigma_gg", 1:3] <-
  param_comp[param_comp$Study == "Mesevage" & param_comp$params == "sigma_gg", 1:3]^2 %>% 
  round(2)

# plot parameter comparisons

param_comp %>% 
  ggplot(aes(Study, mean, ymin = `25%`, ymax = `75%`, color = Study)) + 
  facet_wrap(~params, scales = "free") +
  geom_pointrange() +
  hrbrthemes::theme_ipsum() +
  scale_color_viridis_d(option = "plasma", end = .7) +
  ggtitle("Comparison of parameter estimates and inter-quartile ranges",
          subtitle = "Stan implementation vs Aruoba et. al.")

We also want to compare our overall estimate of GDPPlus against the Fed’s estimation:

# Plot the estimated state against the true

mcmc_recover_scatter(fit_stan_gdp$draws(variables = "xhat"), 
                     true = df %>% filter(OBS_YEAR < 2012) %>% .$GDPPLUS_DATA) +
  geom_abline(slope = 1)

Finally, we want to compare our predicted GDPPlus values against the true values with our associated uncertainty. You can see our Stan code is performing very closely to whatever system the Fed is using.

# extract median estimates
sum_xhat <- fit_stan_gdp$summary(variables = "xhat", 
                               ~quantile(.x, c(.025, .5, .975)))

sum_xhat$true_state <- df %>% filter(OBS_YEAR < 2012) %>% .$GDPPLUS_DATA
sum_xhat <- bind_cols(sum_xhat, y_mat)

sum_xhat %>% 
  ggplot(aes(x = 1:nrow(sum_xhat), y = `50%`)) + 
  geom_line(aes(color = "Estimated State")) + 
  geom_line(aes(y = true_state, color = "GDPPlus Fed")) + 
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`), alpha=.4, fill = "grey") +
  hrbrthemes::theme_ipsum() + 
  labs(title = "Estimated State and 95% Credible Interval", 
       color = "Legend",
       x = "Time", y = "Posterior Median + 95% CI") +
  scale_color_manual(values = c("GDPPlus Fed" = "coral", 
                                "Estimated State" = "black")) +
  theme(legend.position = "bottom")

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
README_files/figure-gfm		README_files/figure-gfm
code		code
data		data
draft		draft
figures		figures
.gitignore		.gitignore
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

state_space_gdp: Using Bayesian State Space Models to Reconcile Historical GDP Estimates

About

Releases

Packages

Languages

gabrielfgm/state_space_gdp

Folders and files

Latest commit

History

Repository files navigation

state_space_gdp: Using Bayesian State Space Models to Reconcile Historical GDP Estimates

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages