Prior Predictive Check for the Multi-level ERC Model

Part two of the code for Section 3: Heterogeneity in earnings response coefficients

Harm Schuett https://hschuett.github.io/ (Tilburg University)https://www.tilburguniversity.edu
2021-10-03
library(tidyverse)
library(cmdstanr)
library(posterior)
kable <- knitr::kable
source("00-utils.R")

1. Introduction and preliminaries

This markdown file contains all the code necessary to replicate the prior predictive check in Section 3: Heterogeneity in earnings response coefficients of the Paper What Can Bayesian Inference Do for Accounting Research?. All the code can also be found in the repo. It contains 00-utils.R which contains a few helper functions for graphs and tables.

Note: I used the newer cmdstanr package instead of the older rstan package because it likely is the future of the R based Stan ecosystem. I also really like its api, which is very close to the api of the pystan package. An additional advantage (I hope) is thus that most model fitting code should be more or less directly transferable to pystan for those that want to work in python. Installing cmdstanr used to be tricky at times because one needs a working c++ toolchain. But it is much smoother now. Please see the cmdstanr doc for installation instructions

2. Loading the data

Same as in the main part of the ERC example. This is just de-meaning the y- and x-variable.

ea_data <- arrow::read_parquet("../data/ea-event-returns.pqt") 
ea_data <- 
  ea_data |> 
  mutate(
    ret_dm = AbEvRet - mean(AbEvRet),
    earn_surp_dm = earn_surp - mean(earn_surp)
    )
head(ea_data)
# A tibble: 6 x 15
  ticker permno fpend_date ea_date    actual_eps median_fcast_eps
  <chr>   <dbl> <date>     <date>          <dbl>            <dbl>
1 001N    14504 2014-12-31 2015-02-09       0.04            0.03 
2 001N    14504 2015-03-31 2015-05-05       0.06           -0.045
3 001N    14504 2015-06-30 2015-08-05       0.02           -0.015
4 001N    14504 2015-09-30 2015-11-04      -0.02            0.04 
5 002T    14503 2014-06-30 2014-08-07       0.19            0.235
6 002T    14503 2014-09-30 2014-11-04       0.18            0.34 
# ... with 9 more variables: num_forecasts <int>,
#   two_days_bef_ea <date>, Price <dbl>, earn_surp <dbl>,
#   ea_match_date <date>, AbEvRet <dbl>, firm_id <int>, ret_dm <dbl>,
#   earn_surp_dm <dbl>

3. Prior predictive check

Before fitting the actual model, I want to check whether the chosen priors are sort of sensible. To this end I wrote a Stan model that does not actually fit the model but simulates data from the chosen priors. It might be easier to skip this for now, continue to the actual model fitting part and then come back to review the prior predictive checking afterwards.

cat(read_lines("../Stan/erc-wkinfo-priorpred.stan"), sep = "\n")
data{
  int<lower=1> N;                   // num obs
  int<lower=1> J;                   // num groups
  int<lower=1> K;                   // num coefficients
  int<lower=1, upper=J> GroupID[N]; // GroupID for obs, e.g. FirmID or Industry-YearID
  // vector[N] y;                      // Response
  matrix[N, K] x;                   // Predictors (incl. Intercept)
}
generated quantities {
  matrix[K, J] z;                  // standard normal sampler
  cholesky_factor_corr[K] L_Omega; // hypprior coefficient correlation
  vector<lower=0>[K] tau;          // hypprior coefficient scales
  vector[K] mu_b;                  // hypprior mean coefficients
  real<lower=0> sigma;             // error-term scale
  matrix[J, K] b;                  // coefficient vector
  array[N] real y_pred;

  for (k in 1:K) {
    for (j in 1:J){
      z[k, j] = normal_rng(0, 1);
    }
  }
  L_Omega = lkj_corr_cholesky_rng(K, 2);
  mu_b[1] = normal_rng(0, 0.1);
  mu_b[2] = normal_rng(0, 40);
  sigma = exponential_rng(1.0 / 0.08);   // exp: 0.08 (std (abnormal returns))
  tau[1] = exponential_rng(1.0 / 0.1);   // exp: 0.1
  tau[2] = exponential_rng(1.0 / 40);    // exp: 40
  b = (rep_matrix(mu_b, J) + diag_pre_multiply(tau,L_Omega) * z)';
  y_pred = normal_rng(rows_dot_product(b[GroupID] , x), sigma);
}
model_priorpred <- cmdstan_model("../Stan/erc-wkinfo-priorpred.stan")
prio_check <- model_priorpred$sample(
  data = list(
    N = nrow(ea_data),
    J = max(ea_data$firm_id),
    K = 2,
    GroupID = ea_data$firm_id,
    x = as.matrix(data.frame(int = 1, esurp = ea_data$earn_surp))
  ),
  iter_sampling = 1000,
  iter_warmup = 0,
  seed = 1234,
  refresh = 1000,
  fixed_param = TRUE  
)
Running MCMC with 1 chain...

Chain 1 Iteration:   1 / 1000 [  0%]  (Sampling) 
Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
Chain 1 finished in 94.5 seconds.

The next piece collects the draws for key parameters and computes the descriptive stats for Panel B of Table 1

mu_a1_prip <- prio_check$draws(variables= c("mu_b[2]"), format = "matrix")
sig_a1_prip <- prio_check$draws(variables= c("tau[2]"), format = "matrix")
a_prip <- prio_check$draws(variables= c("b"), format = "matrix")
a1_prip <- as.vector(a_prip[, grepl(",2\\]", colnames(a_prip))])
y_prip <- 
  prio_check$draws(variables= c("y_pred"), format = "list") %>% 
  unlist()
tab1.B <- rbind(
  desc_row(mu_a1_prip, "mu_a1_prip"),
  desc_row(sig_a1_prip, "sig_a1_prip"),
  desc_row(a1_prip, "a_1_prip"),
  desc_row((y_prip + mean(ea_data$AbEvRet)), "Ret_prip")
  )

write_csv(tab1.B, "../out/results/tab1-panB.csv")
kable(tab1.B)
var mean sd q5 q25 q50 q75 q95
5% mu_a1_prip 1.0676335 39.283343 -63.1713500 -23.7874250 1.1804850 28.5151750 62.1924700
5%1 sig_a1_prip 38.4404197 38.405202 1.9138375 10.8653250 27.3052500 52.8953250 117.8279000
5%2 a_1_prip 1.0642062 67.046823 -101.4470000 -35.1546250 1.8402750 37.2140000 100.4770000
5%3 Ret_prip 0.0011037 0.363066 -0.4706719 -0.1396009 0.0017968 0.1422191 0.4687342

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hschuett/BayesForAccountingResearch, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".