Part two of the code for Section 3: Heterogeneity in earnings response coefficients
This markdown file contains all the code necessary to replicate the prior predictive check in Section 3: Heterogeneity in earnings response coefficients of the Paper What Can Bayesian Inference Do for Accounting Research?. All the code can also be found in the repo. It contains 00-utils.R
which contains a few helper functions for graphs and tables.
Note: I used the newer cmdstanr package instead of the older rstan package because it likely is the future of the R based Stan ecosystem. I also really like its api, which is very close to the api of the pystan package. An additional advantage (I hope) is thus that most model fitting code should be more or less directly transferable to pystan for those that want to work in python. Installing cmdstanr used to be tricky at times because one needs a working c++ toolchain. But it is much smoother now. Please see the cmdstanr doc for installation instructions
Same as in the main part of the ERC example. This is just de-meaning the y- and x-variable.
ea_data <- arrow::read_parquet("../data/ea-event-returns.pqt")
ea_data <-
ea_data |>
mutate(
ret_dm = AbEvRet - mean(AbEvRet),
earn_surp_dm = earn_surp - mean(earn_surp)
)
head(ea_data)
# A tibble: 6 x 15
ticker permno fpend_date ea_date actual_eps median_fcast_eps
<chr> <dbl> <date> <date> <dbl> <dbl>
1 001N 14504 2014-12-31 2015-02-09 0.04 0.03
2 001N 14504 2015-03-31 2015-05-05 0.06 -0.045
3 001N 14504 2015-06-30 2015-08-05 0.02 -0.015
4 001N 14504 2015-09-30 2015-11-04 -0.02 0.04
5 002T 14503 2014-06-30 2014-08-07 0.19 0.235
6 002T 14503 2014-09-30 2014-11-04 0.18 0.34
# ... with 9 more variables: num_forecasts <int>,
# two_days_bef_ea <date>, Price <dbl>, earn_surp <dbl>,
# ea_match_date <date>, AbEvRet <dbl>, firm_id <int>, ret_dm <dbl>,
# earn_surp_dm <dbl>
Before fitting the actual model, I want to check whether the chosen priors are sort of sensible. To this end I wrote a Stan model that does not actually fit the model but simulates data from the chosen priors. It might be easier to skip this for now, continue to the actual model fitting part and then come back to review the prior predictive checking afterwards.
cat(read_lines("../Stan/erc-wkinfo-priorpred.stan"), sep = "\n")
data{
int<lower=1> N; // num obs
int<lower=1> J; // num groups
int<lower=1> K; // num coefficients
int<lower=1, upper=J> GroupID[N]; // GroupID for obs, e.g. FirmID or Industry-YearID
// vector[N] y; // Response
matrix[N, K] x; // Predictors (incl. Intercept)
}
generated quantities {
matrix[K, J] z; // standard normal sampler
cholesky_factor_corr[K] L_Omega; // hypprior coefficient correlation
vector<lower=0>[K] tau; // hypprior coefficient scales
vector[K] mu_b; // hypprior mean coefficients
real<lower=0> sigma; // error-term scale
matrix[J, K] b; // coefficient vector
array[N] real y_pred;
for (k in 1:K) {
for (j in 1:J){
z[k, j] = normal_rng(0, 1);
}
}
L_Omega = lkj_corr_cholesky_rng(K, 2);
mu_b[1] = normal_rng(0, 0.1);
mu_b[2] = normal_rng(0, 40);
sigma = exponential_rng(1.0 / 0.08); // exp: 0.08 (std (abnormal returns))
tau[1] = exponential_rng(1.0 / 0.1); // exp: 0.1
tau[2] = exponential_rng(1.0 / 40); // exp: 40
b = (rep_matrix(mu_b, J) + diag_pre_multiply(tau,L_Omega) * z)';
y_pred = normal_rng(rows_dot_product(b[GroupID] , x), sigma);
}
model_priorpred <- cmdstan_model("../Stan/erc-wkinfo-priorpred.stan")
prio_check <- model_priorpred$sample(
data = list(
N = nrow(ea_data),
J = max(ea_data$firm_id),
K = 2,
GroupID = ea_data$firm_id,
x = as.matrix(data.frame(int = 1, esurp = ea_data$earn_surp))
),
iter_sampling = 1000,
iter_warmup = 0,
seed = 1234,
refresh = 1000,
fixed_param = TRUE
)
Running MCMC with 1 chain...
Chain 1 Iteration: 1 / 1000 [ 0%] (Sampling)
Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1 finished in 94.5 seconds.
The next piece collects the draws for key parameters and computes the descriptive stats for Panel B of Table 1
mu_a1_prip <- prio_check$draws(variables= c("mu_b[2]"), format = "matrix")
sig_a1_prip <- prio_check$draws(variables= c("tau[2]"), format = "matrix")
a_prip <- prio_check$draws(variables= c("b"), format = "matrix")
a1_prip <- as.vector(a_prip[, grepl(",2\\]", colnames(a_prip))])
y_prip <-
prio_check$draws(variables= c("y_pred"), format = "list") %>%
unlist()
kable(tab1.B)
var | mean | sd | q5 | q25 | q50 | q75 | q95 | |
---|---|---|---|---|---|---|---|---|
5% | mu_a1_prip | 1.0676335 | 39.283343 | -63.1713500 | -23.7874250 | 1.1804850 | 28.5151750 | 62.1924700 |
5%1 | sig_a1_prip | 38.4404197 | 38.405202 | 1.9138375 | 10.8653250 | 27.3052500 | 52.8953250 | 117.8279000 |
5%2 | a_1_prip | 1.0642062 | 67.046823 | -101.4470000 | -35.1546250 | 1.8402750 | 37.2140000 | 100.4770000 |
5%3 | Ret_prip | 0.0011037 | 0.363066 | -0.4706719 | -0.1396009 | 0.0017968 | 0.1422191 | 0.4687342 |
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hschuett/BayesForAccountingResearch, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".