A Primer on Bayesian Inference for Accounting Research: Prior Predictive Check for the Multi-level ERC Model

library(tidyverse)
library(cmdstanr)
library(posterior)
kable <- knitr::kable
source("00-utils.R")

1. Introduction and preliminaries

This markdown file contains all the code necessary to replicate the prior predictive check in Section 3: Heterogeneity in earnings response coefficients of the Paper What Can Bayesian Inference Do for Accounting Research?. All the code can also be found in the repo. It contains 00-utils.R which contains a few helper functions for graphs and tables.

Note: I used the newer cmdstanr package instead of the older rstan package because it likely is the future of the R based Stan ecosystem. I also really like its api, which is very close to the api of the pystan package. An additional advantage (I hope) is thus that most model fitting code should be more or less directly transferable to pystan for those that want to work in python. Installing cmdstanr used to be tricky at times because one needs a working c++ toolchain. But it is much smoother now. Please see the cmdstanr doc for installation instructions

2. Loading the data

Same as in the main part of the ERC example. This is just de-meaning the y- and x-variable.

ea_data <- arrow::read_parquet("../data/ea-event-returns.pqt") 
ea_data <- 
  ea_data |> 
  mutate(
    ret_dm = AbEvRet - mean(AbEvRet),
    earn_surp_dm = earn_surp - mean(earn_surp)
    )
head(ea_data)

# A tibble: 6 x 15
  ticker permno fpend_date ea_date    actual_eps median_fcast_eps
  <chr>   <dbl> <date>     <date>          <dbl>            <dbl>
1 001N    14504 2014-12-31 2015-02-09       0.04            0.03 
2 001N    14504 2015-03-31 2015-05-05       0.06           -0.045
3 001N    14504 2015-06-30 2015-08-05       0.02           -0.015
4 001N    14504 2015-09-30 2015-11-04      -0.02            0.04 
5 002T    14503 2014-06-30 2014-08-07       0.19            0.235
6 002T    14503 2014-09-30 2014-11-04       0.18            0.34 
# ... with 9 more variables: num_forecasts <int>,
#   two_days_bef_ea <date>, Price <dbl>, earn_surp <dbl>,
#   ea_match_date <date>, AbEvRet <dbl>, firm_id <int>, ret_dm <dbl>,
#   earn_surp_dm <dbl>

3. Prior predictive check

Before fitting the actual model, I want to check whether the chosen priors are sort of sensible. To this end I wrote a Stan model that does not actually fit the model but simulates data from the chosen priors. It might be easier to skip this for now, continue to the actual model fitting part and then come back to review the prior predictive checking afterwards.

cat(read_lines("../Stan/erc-wkinfo-priorpred.stan"), sep = "\n")

data{
  int<lower=1> N;                   // num obs
  int<lower=1> J;                   // num groups
  int<lower=1> K;                   // num coefficients
  int<lower=1, upper=J> GroupID[N]; // GroupID for obs, e.g. FirmID or Industry-YearID
  // vector[N] y;                      // Response
  matrix[N, K] x;                   // Predictors (incl. Intercept)
}
generated quantities {
  matrix[K, J] z;                  // standard normal sampler
  cholesky_factor_corr[K] L_Omega; // hypprior coefficient correlation
  vector<lower=0>[K] tau;          // hypprior coefficient scales
  vector[K] mu_b;                  // hypprior mean coefficients
  real<lower=0> sigma;             // error-term scale
  matrix[J, K] b;                  // coefficient vector
  array[N] real y_pred;

  for (k in 1:K) {
    for (j in 1:J){
      z[k, j] = normal_rng(0, 1);
    }
  }
  L_Omega = lkj_corr_cholesky_rng(K, 2);
  mu_b[1] = normal_rng(0, 0.1);
  mu_b[2] = normal_rng(0, 40);
  sigma = exponential_rng(1.0 / 0.08);   // exp: 0.08 (std (abnormal returns))
  tau[1] = exponential_rng(1.0 / 0.1);   // exp: 0.1
  tau[2] = exponential_rng(1.0 / 40);    // exp: 40
  b = (rep_matrix(mu_b, J) + diag_pre_multiply(tau,L_Omega) * z)';
  y_pred = normal_rng(rows_dot_product(b[GroupID] , x), sigma);
}

model_priorpred <- cmdstan_model("../Stan/erc-wkinfo-priorpred.stan")
prio_check <- model_priorpred$sample(
  data = list(
    N = nrow(ea_data),
    J = max(ea_data$firm_id),
    K = 2,
    GroupID = ea_data$firm_id,
    x = as.matrix(data.frame(int = 1, esurp = ea_data$earn_surp))
  ),
  iter_sampling = 1000,
  iter_warmup = 0,
  seed = 1234,
  refresh = 1000,
  fixed_param = TRUE  
)

Running MCMC with 1 chain...

Chain 1 Iteration:   1 / 1000 [  0%]  (Sampling) 
Chain 1 Iteration: 1000 / 1000 [100%]  (Sampling) 
Chain 1 finished in 94.5 seconds.

The next piece collects the draws for key parameters and computes the descriptive stats for Panel B of Table 1

mu_a1_prip <- prio_check$draws(variables= c("mu_b[2]"), format = "matrix")
sig_a1_prip <- prio_check$draws(variables= c("tau[2]"), format = "matrix")
a_prip <- prio_check$draws(variables= c("b"), format = "matrix")
a1_prip <- as.vector(a_prip[, grepl(",2\\]", colnames(a_prip))])
y_prip <- 
  prio_check$draws(variables= c("y_pred"), format = "list") %>% 
  unlist()

tab1.B <- rbind(
  desc_row(mu_a1_prip, "mu_a1_prip"),
  desc_row(sig_a1_prip, "sig_a1_prip"),
  desc_row(a1_prip, "a_1_prip"),
  desc_row((y_prip + mean(ea_data$AbEvRet)), "Ret_prip")
  )

write_csv(tab1.B, "../out/results/tab1-panB.csv")

kable(tab1.B)

	var	mean	sd	q5	q25	q50	q75	q95
5%	mu_a1_prip	1.0676335	39.283343	-63.1713500	-23.7874250	1.1804850	28.5151750	62.1924700
5%1	sig_a1_prip	38.4404197	38.405202	1.9138375	10.8653250	27.3052500	52.8953250	117.8279000
5%2	a_1_prip	1.0642062	67.046823	-101.4470000	-35.1546250	1.8402750	37.2140000	100.4770000
5%3	Ret_prip	0.0011037	0.363066	-0.4706719	-0.1396009	0.0017968	0.1422191	0.4687342

Prior Predictive Check for the Multi-level ERC Model

1. Introduction and preliminaries

2. Loading the data

3. Prior predictive check

Corrections

Reuse