Julian Faraway 2023-07-04
See the introduction for an overview.
This example is discussed in more detail in my book Extending the Linear Model with R
Required libraries:
library(faraway)
library(ggplot2)
library(lme4)
library(pbkrtest)
library(RLRsim)
library(INLA)
library(knitr)
library(cmdstanr)
register_knitr_engine(override = FALSE)
library(brms)
library(mgcv)
Multilevel models is a term used for models for data with hierarchical structure. The term is most commonly used in the social sciences. We can use the methodology we have already developed to fit some of these models.
We take as our example some data from the Junior School Project collected from primary (U.S. term is elementary) schools in inner London. We math test score result from year two as the response and try to model this as a function of gender, social class and the Raven’s test score from the first year which might be taken as a measure of ability when entering the school. We subset the data to ignore the math scores from the first two years, we centre the Raven score and create a combined class-by-school label:
data(jsp, package="faraway")
jspr <- jsp[jsp$year==2,]
jspr$craven <- jspr$raven-mean(jspr$raven)
jspr$classch <- paste(jspr$school,jspr$class,sep=".")
We can plot the data
ggplot(jspr, aes(x=raven, y=math))+xlab("Raven Score")+ylab("Math Score")+geom_point(position = position_jitter())
ggplot(jspr, aes(x=social, y=math))+xlab("Social Class")+ylab("Math Score")+geom_boxplot()
Although the data supports a more complex model, we simplify to having the centred Raven score and the social class as fixed effects and the school and class nested within school as random effects. See Extending the Linear Model with R,
mmod <- lmer(math ~ craven + social+(1|school)+(1|school:class),jspr)
faraway::sumary(mmod)
Fixed Effects:
coef.est coef.se
(Intercept) 32.01 1.03
craven 0.58 0.03
social2 -0.36 1.09
social3 -0.78 1.16
social4 -2.12 1.04
social5 -1.36 1.16
social6 -2.37 1.23
social7 -3.05 1.27
social8 -3.55 1.70
social9 -0.89 1.10
Random Effects:
Groups Name Std.Dev.
school:class (Intercept) 1.02
school (Intercept) 1.80
Residual 5.25
---
number of obs: 953, groups: school:class, 90; school, 48
AIC = 5949.7, DIC = 5933
deviance = 5928.3
We can see the math score is strongly related to the entering Raven score. We see that the math score tends to be lower as social class goes down. We also see the most substantial variation at the individual level with smaller amounts of variation at the school and class level.
We test the random effects:
mmodc <- lmer(math ~ craven + social+(1|school:class),jspr)
mmods <- lmer(math ~ craven + social+(1|school),jspr)
exactRLRT(mmodc, mmod, mmods)
simulated finite sample distribution of RLRT.
(p-value based on 10000 simulated values)
data:
RLRT = 1.85, p-value = 0.076
exactRLRT(mmods, mmod, mmodc)
simulated finite sample distribution of RLRT.
(p-value based on 10000 simulated values)
data:
RLRT = 7.64, p-value = 0.0026
The first test is for the class effect which just fails to meet the 5% significance level. The second test is for the school effect and shows strong evidence of differences between schools.
We can test the social fixed effect:
mmodm <- lmer(math ~ craven + (1|school)+(1|school:class),jspr)
KRmodcomp(mmod, mmodm)
large : math ~ craven + social + (1 | school) + (1 | school:class)
small : math ~ craven + (1 | school) + (1 | school:class)
stat ndf ddf F.scaling p.value
Ftest 2.76 8.00 930.34 1 0.0052
We see the social effect is significant.
We can compute confidence intervals for the parameters:
confint(mmod, method="boot")
2.5 % 97.5 %
.sig01 0.00000 1.676670
.sig02 0.97299 2.371398
.sigma 4.97880 5.522415
(Intercept) 30.15211 33.980577
craven 0.52266 0.645947
social2 -2.60963 1.681417
social3 -3.25265 1.293725
social4 -4.15992 -0.062255
social5 -3.47189 0.843701
social6 -4.74764 -0.071777
social7 -5.39161 -0.714736
social8 -7.22784 -0.407057
social9 -3.04726 1.211476
The lower end of the class confidence interval is zero while the school random effect is clearly larger. This is consistent with the earlier tests.
Integrated nested Laplace approximation is a method of Bayesian computation which uses approximation rather than simulation. More can be found on this topic in Bayesian Regression Modeling with INLA and the chapter on GLMMs
Use the most recent computational methodology:
inla.setOption(inla.mode="experimental")
inla.setOption("short.summary",TRUE)
formula <- math ~ social+craven + f(school, model="iid") + f(classch, model="iid")
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 31.791 1.016 29.798 31.791 33.784 31.791 0
social2 -0.189 1.096 -2.338 -0.190 1.960 -0.190 0
social3 -0.527 1.166 -2.813 -0.527 1.760 -0.527 0
social4 -1.841 1.037 -3.875 -1.841 0.195 -1.841 0
social5 -1.175 1.157 -3.443 -1.175 1.096 -1.175 0
social6 -2.215 1.231 -4.630 -2.215 0.199 -2.215 0
social7 -2.932 1.268 -5.419 -2.932 -0.445 -2.932 0
social8 -3.361 1.709 -6.712 -3.361 -0.009 -3.361 0
social9 -0.653 1.099 -2.808 -0.653 1.503 -0.653 0
craven 0.586 0.032 0.522 0.586 0.649 0.586 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 3.70e-02 4.00e-03 0.035 0.037 4.00e-02 0.036
Precision for school 6.10e+06 2.15e+08 8.598 482.238 3.69e+06 3.539
Precision for classch 2.02e-01 5.20e-02 0.097 0.201 2.83e-01 0.253
is computed
As usual, the default priors result in precisions for the random effects which are unbelievably large and we need to change the default prior.
Now try more informative gamma priors for the random effect precisions.
Define it so the mean value of gamma prior is set to the inverse of the
variance of the residuals of the fixed-effects only model. We expect the
error variances to be lower than this variance so this is an
overestimate. The variance of the gamma prior (for the precision) is
controlled by the apar
shape parameter.
apar <- 0.5
lmod <- lm(math ~ social+craven, jspr)
bpar <- apar*var(residuals(lmod))
lgprior <- list(prec = list(prior="loggamma", param = c(apar,bpar)))
formula = math ~ social+craven+f(school, model="iid", hyper = lgprior)+f(classch, model="iid", hyper = lgprior)
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 32.002 1.062 29.921 32.002 34.085 32.002 0
social2 -0.396 1.099 -2.552 -0.396 1.760 -0.396 0
social3 -0.761 1.171 -3.058 -0.761 1.537 -0.761 0
social4 -2.099 1.045 -4.149 -2.099 -0.049 -2.099 0
social5 -1.424 1.164 -3.707 -1.424 0.860 -1.424 0
social6 -2.350 1.239 -4.779 -2.350 0.080 -2.350 0
social7 -3.056 1.277 -5.561 -3.056 -0.551 -3.056 0
social8 -3.554 1.709 -6.905 -3.554 -0.202 -3.554 0
social9 -0.888 1.108 -3.060 -0.888 1.285 -0.888 0
craven 0.586 0.032 0.522 0.586 0.650 0.586 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 0.037 0.002 0.033 0.037 0.040 0.037
Precision for school 0.266 0.087 0.137 0.252 0.478 0.241
Precision for classch 0.342 0.101 0.186 0.328 0.580 0.319
is computed
Results are more credible.
Compute the transforms to an SD scale for the random effect terms. Make a table of summary statistics for the posteriors:
sigmasch <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmacla <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmasch,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmacla,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c("mu",result$names.fixed[2:10],"school SD","class SD","epsilon")
data.frame(restab) |> kable()
mu | social2 | social3 | social4 | social5 | social6 | social7 | social8 | social9 | craven | school.SD | class.SD | epsilon | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mean | 32.002 | -0.39634 | -0.76086 | -2.099 | -1.4237 | -2.3496 | -3.0562 | -3.5537 | -0.88791 | 0.58599 | 2.0144 | 1.7639 | 5.2212 |
sd | 1.0611 | 1.0985 | 1.1706 | 1.0448 | 1.1635 | 1.238 | 1.2767 | 1.7079 | 1.1072 | 0.032406 | 0.31699 | 0.25456 | 0.12383 |
quant0.025 | 29.92 | -2.5524 | -3.0584 | -4.1495 | -3.7073 | -4.7795 | -5.562 | -6.9058 | -3.061 | 0.52238 | 1.451 | 1.3155 | 4.9814 |
quant0.25 | 31.284 | -1.1394 | -1.5527 | -2.8056 | -2.2107 | -3.187 | -3.9197 | -4.7089 | -1.6368 | 0.56407 | 1.7896 | 1.5835 | 5.1363 |
quant0.5 | 32 | -0.39875 | -0.7634 | -2.1012 | -1.4263 | -2.3523 | -3.0589 | -3.5574 | -0.8903 | 0.58593 | 1.9945 | 1.7457 | 5.2197 |
quant0.75 | 32.715 | 0.34202 | 0.025949 | -1.3967 | -0.64168 | -1.5175 | -2.198 | -2.4057 | -0.1437 | 0.60778 | 2.2156 | 1.9241 | 5.3042 |
quant0.975 | 34.081 | 1.7557 | 1.5323 | -0.052322 | 0.85567 | 0.07534 | -0.55533 | -0.20801 | 1.2811 | 0.64947 | 2.6929 | 2.3136 | 5.4678 |
Also construct a plot the SD posteriors:
ddf <- data.frame(rbind(sigmasch,sigmacla,sigmaepsilon),errterm=gl(3,nrow(sigmasch),labels = c("school","class","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("math")+ylab("density")
Posteriors look OK although no weight given to smaller values.
In Simpson et al (2015), penalized complexity priors are proposed. This requires that we specify a scaling for the SDs of the random effects. We use the SD of the residuals of the fixed effects only model (what might be called the base model in the paper) to provide this scaling.
lmod <- lm(math ~ craven + social, jspr)
sdres <- sd(residuals(lmod))
pcprior <- list(prec = list(prior="pc.prec", param = c(3*sdres,0.01)))
formula = math ~ social+craven+f(school, model="iid", hyper = pcprior)+f(classch, model="iid", hyper = pcprior)
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 31.921 1.027 29.907 31.921 33.935 31.921 0
social2 -0.303 1.093 -2.447 -0.303 1.843 -0.303 0
social3 -0.669 1.164 -2.952 -0.670 1.614 -0.670 0
social4 -2.003 1.037 -4.037 -2.003 0.032 -2.003 0
social5 -1.300 1.156 -3.567 -1.300 0.969 -1.300 0
social6 -2.306 1.230 -4.719 -2.306 0.107 -2.306 0
social7 -3.011 1.267 -5.496 -3.011 -0.524 -3.011 0
social8 -3.469 1.702 -6.807 -3.469 -0.130 -3.469 0
social9 -0.791 1.100 -2.947 -0.791 1.366 -0.791 0
craven 0.585 0.032 0.522 0.585 0.648 0.585 0
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 0.037 0.002 0.033 0.036 0.040 0.036
Precision for school 7.524 17.002 0.084 2.670 44.635 0.557
Precision for classch 0.279 0.097 0.150 0.260 0.525 0.233
is computed
Compute the summaries as before:
sigmasch <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmacla <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmasch,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmacla,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c("mu",result$names.fixed[2:10],"school SD","class SD","epsilon")
data.frame(restab) |> kable()
mu | social2 | social3 | social4 | social5 | social6 | social7 | social8 | social9 | craven | school.SD | class.SD | epsilon | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mean | 31.921 | -0.30258 | -0.66933 | -2.0033 | -1.2996 | -2.3064 | -3.0106 | -3.4689 | -0.79101 | 0.58504 | 0.88704 | 1.9696 | 5.2364 |
sd | 1.0262 | 1.0928 | 1.1632 | 1.0368 | 1.1556 | 1.2296 | 1.2667 | 1.7012 | 1.0991 | 0.032166 | 0.91583 | 0.30665 | 0.1252 |
quant0.025 | 29.907 | -2.4472 | -2.9522 | -4.0381 | -3.5673 | -4.7199 | -5.4969 | -6.8078 | -2.948 | 0.5219 | 0.1483 | 1.385 | 4.9923 |
quant0.25 | 31.227 | -1.0417 | -1.4561 | -2.7046 | -2.0813 | -3.138 | -3.8673 | -4.6195 | -1.5344 | 0.56329 | 0.35749 | 1.7537 | 5.1509 |
quant0.5 | 31.919 | -0.30507 | -0.67194 | -2.0056 | -1.3023 | -2.3091 | -3.0133 | -3.4726 | -0.79347 | 0.58498 | 0.59708 | 1.9684 | 5.2357 |
quant0.75 | 32.611 | 0.43179 | 0.11239 | -1.3065 | -0.52306 | -1.48 | -2.1592 | -2.3256 | -0.05239 | 0.60666 | 1.0595 | 2.1784 | 5.3207 |
quant0.975 | 33.931 | 1.8386 | 1.6097 | 0.028168 | 0.9647 | 0.10229 | -0.52906 | -0.13597 | 1.3624 | 0.64805 | 3.3753 | 2.5773 | 5.484 |
Make the plots:
ddf <- data.frame(rbind(sigmasch,sigmacla,sigmaepsilon),errterm=gl(3,nrow(sigmasch),labels = c("school","class","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("math")+ylab("density")
Posteriors put more weight on lower values compared to gamma prior.
STAN performs Bayesian inference using MCMC. I
use cmdstanr
to access Stan from R.
You see below the Stan code to fit our model. Rmarkdown allows the use of Stan chunks (elsewhere I have R chunks). The chunk header looks like this.
STAN chunk will be compiled to ‘mod’. Chunk header is:
cmdstan, output.var="mod", override = FALSE
data {
int<lower=0> Nobs;
int<lower=0> Npreds;
int<lower=0> Nlev1;
int<lower=0> Nlev2;
array[Nobs] real y;
matrix[Nobs,Npreds] x;
array[Nobs] int<lower=1,upper=Nlev1> levind1;
array[Nobs] int<lower=1,upper=Nlev2> levind2;
real<lower=0> sdscal;
}
parameters {
vector[Npreds] beta;
real<lower=0> sigmalev1;
real<lower=0> sigmalev2;
real<lower=0> sigmaeps;
vector[Nlev1] eta1;
vector[Nlev2] eta2;
}
transformed parameters {
vector[Nlev1] ran1;
vector[Nlev2] ran2;
vector[Nobs] yhat;
ran1 = sigmalev1 * eta1;
ran2 = sigmalev2 * eta2;
for (i in 1:Nobs)
yhat[i] = x[i]*beta+ran1[levind1[i]]+ran2[levind2[i]];
}
model {
eta1 ~ normal(0, 1);
eta2 ~ normal(0, 1);
sigmalev1 ~ cauchy(0, 2.5*sdscal);
sigmalev2 ~ cauchy(0, 2.5*sdscal);
sigmaeps ~ cauchy(0, 2.5*sdscal);
y ~ normal(yhat, sigmaeps);
}
We have used uninformative priors for the treatment effects but slightly informative half-cauchy priors for the three variances. All the fixed effects have been collected into a single design matrix. The school and class variables need to be renumbered into consecutive positive integers. Somewhat inconvenient since the schools are numbered up to 50 but have no data for two schools so only 48 schools are actually used.
lmod <- lm(math ~ craven + social, jspr)
sdscal <- sd(residuals(lmod))
Xmatrix <- model.matrix( ~ craven + social, jspr)
jspr$school <- factor(jspr$school)
jspr$classch <- factor(paste(jspr$school,jspr$class,sep="."))
jspdat <- list(Nobs=nrow(jspr),
Npreds=ncol(Xmatrix),
Nlev1=length(unique(jspr$school)),
Nlev2=length(unique(jspr$classch)),
y=jspr$math,
x=Xmatrix,
levind1=as.numeric(jspr$school),
levind2=as.numeric(jspr$classch),
sdscal=sdscal)
Do the MCMC sampling:
fit <- mod$sample(
data = jspdat,
seed = 123,
chains = 4,
parallel_chains = 4,
refresh = 500 # print update every 500 iters
)
Running MCMC with 4 parallel chains...
Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 4 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 3 finished in 5.8 seconds.
Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2 finished in 6.1 seconds.
Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1 finished in 6.4 seconds.
Chain 4 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 4 finished in 6.5 seconds.
All 4 chains finished successfully.
Mean chain execution time: 6.2 seconds.
Total execution time: 6.7 seconds.
Extract the draws into a convenient dataframe format:
draws_df <- fit$draws(format = "df")
For the School SD:
ggplot(draws_df,
aes(x=.iteration,y=sigmalev1,color=factor(.chain))) + geom_line() +
labs(color = 'Chain', x="Iteration")
For the class SD
ggplot(draws_df,
aes(x=.iteration,y=sigmalev2,color=factor(.chain))) + geom_line() +
labs(color = 'Chain', x="Iteration")
All these are satisfactory.
Display the parameters of interest:
fit$summary(c("beta","sigmalev1","sigmalev2","sigmaeps"))
# A tibble: 13 × 10
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1 beta[1] 32.0 32.0 1.05 1.06 30.3 33.7 1.00 831. 1334.
2 beta[2] 0.584 0.584 0.0332 0.0337 0.529 0.639 1.00 6034. 2922.
3 beta[3] -0.360 -0.357 1.10 1.10 -2.20 1.43 1.00 921. 1531.
4 beta[4] -0.756 -0.751 1.19 1.20 -2.75 1.20 1.00 1002. 1744.
5 beta[5] -2.12 -2.11 1.06 1.07 -3.85 -0.412 1.00 825. 1688.
6 beta[6] -1.38 -1.37 1.17 1.17 -3.31 0.518 1.00 986. 1842.
7 beta[7] -2.40 -2.37 1.24 1.25 -4.47 -0.323 1.00 1048. 2070.
8 beta[8] -3.05 -3.06 1.29 1.29 -5.11 -0.955 1.00 1101. 2230.
9 beta[9] -3.55 -3.54 1.69 1.69 -6.35 -0.811 1.00 1659. 2600.
10 beta[10] -0.899 -0.888 1.12 1.14 -2.76 0.926 1.00 954. 1466.
11 sigmalev1 1.79 1.81 0.409 0.379 1.06 2.40 1.00 616. 755.
12 sigmalev2 0.975 0.971 0.493 0.523 0.158 1.79 1.00 447. 713.
13 sigmaeps 5.27 5.26 0.127 0.128 5.06 5.47 1.00 5333. 3035.
Remember that the beta correspond to the following parameters:
colnames(Xmatrix)
[1] "(Intercept)" "craven" "social2" "social3" "social4" "social5" "social6" "social7"
[9] "social8" "social9"
The results are comparable to the REML fit. The effective sample sizes are sufficient.
We can use extract to get at various components of the STAN fit. First consider the SDs for random components:
sdf = stack(draws_df[,c("sigmaeps","sigmalev1","sigmalev2")])
colnames(sdf) = c("math","SD")
levels(sdf$SD) = c("individual","school","class")
ggplot(sdf, aes(x=math,color=SD)) + geom_density()
As usual the error SD distribution is a more concentrated. The school SD is more diffuse and smaller whereas the class SD is smaller still. Now the treatement effects, considering the social class parameters first:
sdf = stack(draws_df[,4:11])
colnames(sdf) = c("math","social")
levels(sdf$social) = 2:9
ggplot(sdf, aes(x=math,color=social)) + geom_density()
Now just the raven score parameter:
ggplot(draws_df, aes(x=.data[["beta[2]"]])) + geom_density() + xlab("Math per Raven")
Now for the schools:
sdf = stack(draws_df[,startsWith(colnames(draws_df),"ran1")])
colnames(sdf) = c("math","school")
levels(sdf$school) = 1:48
ggplot(sdf, aes(x=math,group=school)) + geom_density()
We can see the variation between schools. A league table might be used to rank the schools but the high overlap in these distributions show that such a ranking should not be interpreted too seriously.
BRMS stands for Bayesian
Regression Models with STAN. It provides a convenient wrapper to STAN
functionality. We specify the model as in lmer()
above. I have used
more than the standard number of iterations because this reduces some
problems and does not cost much computationally.
suppressMessages(bmod <- brm(math ~ craven + social+(1|school)+(1|school:class),data=jspr,iter=10000, cores=4, backend = "cmdstanr"))
We get some minor warnings. We can obtain some posterior densities and diagnostics with:
plot(bmod, variable = "^s", regex=TRUE)
We have chosen only the random effect hyperparameters since this is where problems will appear first. Looks OK. We can see some weight is given to values of the class effect SD close to zero.
We can look at the STAN code that brms
used with:
stancode(bmod)
// generated with brms 2.19.0
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
// data for group-level effects of ID 2
int<lower=1> N_2; // number of grouping levels
int<lower=1> M_2; // number of coefficients per level
array[N] int<lower=1> J_2; // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2 : K) {
means_X[i - 1] = mean(X[ : , i]);
Xc[ : , i - 1] = X[ : , i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
array[M_1] vector[N_1] z_1; // standardized group-level effects
vector<lower=0>[M_2] sd_2; // group-level standard deviations
array[M_2] vector[N_2] z_2; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
vector[N_2] r_2_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = sd_1[1] * z_1[1];
r_2_1 = sd_2[1] * z_2[1];
lprior += student_t_lpdf(Intercept | 3, 32, 5.9);
lprior += student_t_lpdf(sigma | 3, 0, 5.9)
- 1 * student_t_lccdf(0 | 3, 0, 5.9);
lprior += student_t_lpdf(sd_1 | 3, 0, 5.9)
- 1 * student_t_lccdf(0 | 3, 0, 5.9);
lprior += student_t_lpdf(sd_2 | 3, 0, 5.9)
- 1 * student_t_lccdf(0 | 3, 0, 5.9);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1 : N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
target += std_normal_lpdf(z_2[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
We see that brms
is using student t distributions with 3 degrees of
freedom for the priors. For the three error SDs, this will be truncated
at zero to form half-t distributions. You can get a more explicit
description of the priors with prior_summary(bmod)
. These are
qualitatively similar to the the PC prior used in the INLA fit.
We examine the fit:
summary(bmod)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: math ~ craven + social + (1 | school) + (1 | school:class)
Data: jspr (Number of observations: 953)
Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
total post-warmup draws = 20000
Group-Level Effects:
~school (Number of levels: 48)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.78 0.44 0.72 2.57 1.00 1601 911
~school:class (Number of levels: 90)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.01 0.50 0.09 2.02 1.00 1494 1523
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 32.01 1.05 29.93 34.05 1.00 4780 7898
craven 0.58 0.03 0.52 0.65 1.00 23115 14709
social2 -0.35 1.11 -2.55 1.83 1.00 5331 8891
social3 -0.77 1.18 -3.07 1.55 1.00 5468 9286
social4 -2.11 1.06 -4.17 -0.05 1.00 4549 7980
social5 -1.35 1.17 -3.68 0.92 1.00 5099 9617
social6 -2.36 1.25 -4.82 0.07 1.00 5737 9807
social7 -3.04 1.29 -5.59 -0.52 1.00 6085 9948
social8 -3.53 1.72 -6.90 -0.13 1.00 8258 10286
social9 -0.88 1.13 -3.08 1.34 1.00 5000 8754
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 5.26 0.13 5.02 5.51 1.00 18923 12386
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The results are consistent with those seen previously.
It is possible to fit some GLMMs within the GAM framework of the mgcv
package. An explanation of this can be found in this
blog
gmod = gam(math ~ craven + social+s(school,bs="re")+s(classch,bs="re"),data=jspr, method="REML")
and look at the summary output:
summary(gmod)
Family: gaussian
Link function: identity
Formula:
math ~ craven + social + s(school, bs = "re") + s(classch, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.0107 1.0350 30.93 <2e-16
craven 0.5841 0.0321 18.21 <2e-16
social2 -0.3611 1.0948 -0.33 0.742
social3 -0.7767 1.1649 -0.67 0.505
social4 -2.1196 1.0396 -2.04 0.042
social5 -1.3632 1.1585 -1.18 0.240
social6 -2.3703 1.2330 -1.92 0.055
social7 -3.0482 1.2703 -2.40 0.017
social8 -3.5473 1.7027 -2.08 0.038
social9 -0.8863 1.1031 -0.80 0.422
Approximate significance of smooth terms:
edf Ref.df F p-value
s(school) 27.4 47 2.67 <2e-16
s(classch) 15.6 89 0.33 0.052
R-sq.(adj) = 0.378 Deviance explained = 41.2%
-REML = 2961.8 Scale est. = 27.572 n = 953
We get the fixed effect estimates. We also get tests on the random effects (as described in this article. The hypothesis of no variation is rejected for the school but not for the class. This is consistent with earlier findings.
We can get an estimate of the operator and error SD:
gam.vcomp(gmod)
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(school) 1.7967 1.21058 2.6667
s(classch) 1.0162 0.41519 2.4875
scale 5.2509 5.00863 5.5049
Rank: 3/3
The point estimates are the same as the REML estimates from lmer
earlier. The confidence intervals are different. A bootstrap method was
used for the lmer
fit whereas gam
is using an asymptotic
approximation resulting in substantially different results. Given the
problems of parameters on the boundary present in this example, the
bootstrap results appear more trustworthy.
The fixed and random effect estimates can be found with:
coef(gmod)
(Intercept) craven social2 social3 social4 social5 social6 social7
32.010740 0.584117 -0.361080 -0.776732 -2.119649 -1.363206 -2.370314 -3.048249
social8 social9 s(school).1 s(school).2 s(school).3 s(school).4 s(school).5 s(school).6
-3.547252 -0.886345 -2.256262 -0.422953 0.836164 -1.256567 -0.677051 0.186343
s(school).7 s(school).8 s(school).9 s(school).10 s(school).11 s(school).12 s(school).13 s(school).14
1.323942 0.364974 -2.022321 0.558287 -0.505067 0.016585 -0.615911 0.421654
s(school).15 s(school).16 s(school).17 s(school).18 s(school).19 s(school).20 s(school).21 s(school).22
-0.219725 0.441527 -0.204674 0.621186 -0.304768 -2.509540 -1.069436 -0.182572
s(school).23 s(school).24 s(school).25 s(school).26 s(school).27 s(school).28 s(school).29 s(school).30
2.242250 1.135253 1.155773 0.375632 -2.487656 -2.501408 1.101826 2.361060
s(school).31 s(school).32 s(school).33 s(school).34 s(school).35 s(school).36 s(school).37 s(school).38
0.056199 -1.044550 2.557490 -0.981792 2.559601 0.588989 2.437473 -1.000318
s(school).39 s(school).40 s(school).41 s(school).42 s(school).43 s(school).44 s(school).45 s(school).46
-2.022936 1.655924 -0.279566 -0.092512 -2.221997 0.216297 1.662840 -0.657181
s(school).47 s(school).48 s(classch).1 s(classch).2 s(classch).3 s(classch).4 s(classch).5 s(classch).6
0.066572 0.592921 0.573442 -1.295266 -0.101570 0.280178 0.328046 0.171502
s(classch).7 s(classch).8 s(classch).9 s(classch).10 s(classch).11 s(classch).12 s(classch).13 s(classch).14
-0.222321 -0.438808 -0.666694 0.672000 -0.197043 -0.085737 0.220633 -0.432686
s(classch).15 s(classch).16 s(classch).17 s(classch).18 s(classch).19 s(classch).20 s(classch).21 s(classch).22
0.362391 0.280296 -0.139043 -0.065479 0.452835 -0.254105 -0.135311 -0.539712
s(classch).23 s(classch).24 s(classch).25 s(classch).26 s(classch).27 s(classch).28 s(classch).29 s(classch).30
0.442210 -0.204360 -0.598493 -0.164090 -0.178044 0.181223 -0.239631 0.717342
s(classch).31 s(classch).32 s(classch).33 s(classch).34 s(classch).35 s(classch).36 s(classch).37 s(classch).38
0.363191 0.266493 0.103262 -0.045999 0.166171 -0.692729 -0.103123 -0.800251
s(classch).39 s(classch).40 s(classch).41 s(classch).42 s(classch).43 s(classch).44 s(classch).45 s(classch).46
0.583420 -0.315914 -0.057735 0.410232 0.374896 0.436353 -0.055897 0.017979
s(classch).47 s(classch).48 s(classch).49 s(classch).50 s(classch).51 s(classch).52 s(classch).53 s(classch).54
-0.397021 0.062848 0.224533 0.880602 -0.286941 -0.047087 -0.267009 0.603827
s(classch).55 s(classch).56 s(classch).57 s(classch).58 s(classch).59 s(classch).60 s(classch).61 s(classch).62
-0.232530 0.447571 0.188430 0.779797 -0.243148 -0.076874 -0.402001 -0.647178
s(classch).63 s(classch).64 s(classch).65 s(classch).66 s(classch).67 s(classch).68 s(classch).69 s(classch).70
0.529764 -1.156667 0.131640 0.935589 -0.029596 -0.710862 0.069198 0.487627
s(classch).71 s(classch).72 s(classch).73 s(classch).74 s(classch).75 s(classch).76 s(classch).77 s(classch).78
0.044350 0.037317 0.073938 -0.053456 -0.268045 0.122984 -0.101686 -0.252028
s(classch).79 s(classch).80 s(classch).81 s(classch).82 s(classch).83 s(classch).84 s(classch).85 s(classch).86
0.035425 0.189688 0.258015 -0.198400 0.373615 0.049941 0.311116 0.069280
s(classch).87 s(classch).88 s(classch).89 s(classch).90
-0.263633 0.582019 -0.587131 -0.641870
In Wood (2019), a simplified
version of INLA is proposed. The first construct the GAM model without
fitting and then use the ginla()
function to perform the computation.
gmod = gam(math ~ craven + social+s(school,bs="re")+s(classch,bs="re"),
data=jspr, fit = FALSE)
gimod = ginla(gmod)
We get the posterior density for the intercept as:
plot(gimod$beta[1,],gimod$density[1,],type="l",xlab="math",ylab="density")
We get the posterior density for the raven effect as:
plot(gimod$beta[2,],gimod$density[2,],type="l",xlab="math per raven",ylab="density")
and for the social effects as:
xmat = t(gimod$beta[3:10,])
ymat = t(gimod$density[3:10,])
matplot(xmat, ymat,type="l",xlab="math",ylab="density")
legend("left",paste0("social",2:9),col=1:8,lty=1:8)
We can see some overlap between the effects, but strong evidence of a negative outcome relative to social class 1 for some classes.
It is not straightforward to obtain the posterior densities of the hyperparameters.
See the Discussion of the single random effect model for general comments.
-
As with the previous analyses, sometimes the INLA posteriors for the hyperparameters have densities which do not give weight to close-to-zero values where other analyses suggest this might be reasonable.
-
There is relatively little disagreement between the methods and much similarity.
-
There were no major computational issue with the analyses (in contrast with some of the other examples)
-
The
mgcv
analyses took a little longer than previous analyses because the sample size is larger (but still were quite fast).
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] mgcv_1.8-42 nlme_3.1-162 brms_2.19.0 Rcpp_1.0.10 cmdstanr_0.5.3 knitr_1.43 INLA_23.05.30-1
[8] sp_2.0-0 foreach_1.5.2 RLRsim_3.1-8 pbkrtest_0.5.2 lme4_1.1-33 Matrix_1.5-4.1 ggplot2_3.4.2
[15] faraway_1.0.8
loaded via a namespace (and not attached):
[1] gridExtra_2.3 inline_0.3.19 rlang_1.1.1 magrittr_2.0.3 matrixStats_1.0.0
[6] compiler_4.3.1 loo_2.6.0 systemfonts_1.0.4 callr_3.7.3 vctrs_0.6.3
[11] reshape2_1.4.4 stringr_1.5.0 crayon_1.5.2 pkgconfig_2.0.3 fastmap_1.1.1
[16] backports_1.4.1 ellipsis_0.3.2 labeling_0.4.2 utf8_1.2.3 threejs_0.3.3
[21] promises_1.2.0.1 rmarkdown_2.22 markdown_1.7 ps_1.7.5 nloptr_2.0.3
[26] MatrixModels_0.5-1 purrr_1.0.1 xfun_0.39 jsonlite_1.8.5 later_1.3.1
[31] Deriv_4.1.3 prettyunits_1.1.1 broom_1.0.5 R6_2.5.1 dygraphs_1.1.1.6
[36] StanHeaders_2.26.27 stringi_1.7.12 boot_1.3-28.1 rstan_2.21.8 iterators_1.0.14
[41] zoo_1.8-12 base64enc_0.1-3 bayesplot_1.10.0 httpuv_1.6.11 splines_4.3.1
[46] igraph_1.5.0 tidyselect_1.2.0 rstudioapi_0.14 abind_1.4-5 yaml_2.3.7
[51] codetools_0.2-19 miniUI_0.1.1.1 processx_3.8.1 pkgbuild_1.4.1 lattice_0.21-8
[56] tibble_3.2.1 plyr_1.8.8 shiny_1.7.4 withr_2.5.0 bridgesampling_1.1-2
[61] posterior_1.4.1 coda_0.19-4 evaluate_0.21 RcppParallel_5.1.7 xts_0.13.1
[66] pillar_1.9.0 tensorA_0.36.2 stats4_4.3.1 checkmate_2.2.0 DT_0.28
[71] shinyjs_2.1.0 distributional_0.3.2 generics_0.1.3 rstantools_2.3.1 munsell_0.5.0
[76] scales_1.2.1 minqa_1.2.5 gtools_3.9.4 xtable_1.8-4 glue_1.6.2
[81] tools_4.3.1 shinystan_2.6.0 data.table_1.14.8 colourpicker_1.2.0 mvtnorm_1.2-2
[86] grid_4.3.1 tidyr_1.3.0 crosstalk_1.2.0 colorspace_2.1-0 cli_3.6.1
[91] fansi_1.0.4 svglite_2.1.1 Brobdingnag_1.2-9 dplyr_1.1.2 gtable_0.3.3
[96] digest_0.6.31 htmlwidgets_1.6.2 farver_2.1.1 htmltools_0.5.5 lifecycle_1.0.3
[101] mime_0.12 shinythemes_1.2.0 MASS_7.3-60