Skip to content

Latest commit

 

History

History
1020 lines (823 loc) · 41.6 KB

jspmultilevel.md

File metadata and controls

1020 lines (823 loc) · 41.6 KB

Multilevel Design

Julian Faraway 2023-07-04

See the introduction for an overview.

This example is discussed in more detail in my book Extending the Linear Model with R

Required libraries:

library(faraway)
library(ggplot2)
library(lme4)
library(pbkrtest)
library(RLRsim)
library(INLA)
library(knitr)
library(cmdstanr)
register_knitr_engine(override = FALSE)
library(brms)
library(mgcv)

Data

Multilevel models is a term used for models for data with hierarchical structure. The term is most commonly used in the social sciences. We can use the methodology we have already developed to fit some of these models.

We take as our example some data from the Junior School Project collected from primary (U.S. term is elementary) schools in inner London. We math test score result from year two as the response and try to model this as a function of gender, social class and the Raven’s test score from the first year which might be taken as a measure of ability when entering the school. We subset the data to ignore the math scores from the first two years, we centre the Raven score and create a combined class-by-school label:

data(jsp, package="faraway")
jspr <- jsp[jsp$year==2,]
jspr$craven <- jspr$raven-mean(jspr$raven)
jspr$classch <- paste(jspr$school,jspr$class,sep=".")

We can plot the data

ggplot(jspr, aes(x=raven, y=math))+xlab("Raven Score")+ylab("Math Score")+geom_point(position = position_jitter())

ggplot(jspr, aes(x=social, y=math))+xlab("Social Class")+ylab("Math Score")+geom_boxplot()

Mixed Effect Model

Although the data supports a more complex model, we simplify to having the centred Raven score and the social class as fixed effects and the school and class nested within school as random effects. See Extending the Linear Model with R,

mmod <- lmer(math ~ craven + social+(1|school)+(1|school:class),jspr)
faraway::sumary(mmod)
Fixed Effects:
            coef.est coef.se
(Intercept) 32.01     1.03  
craven       0.58     0.03  
social2     -0.36     1.09  
social3     -0.78     1.16  
social4     -2.12     1.04  
social5     -1.36     1.16  
social6     -2.37     1.23  
social7     -3.05     1.27  
social8     -3.55     1.70  
social9     -0.89     1.10  

Random Effects:
 Groups       Name        Std.Dev.
 school:class (Intercept) 1.02    
 school       (Intercept) 1.80    
 Residual                 5.25    
---
number of obs: 953, groups: school:class, 90; school, 48
AIC = 5949.7, DIC = 5933
deviance = 5928.3 

We can see the math score is strongly related to the entering Raven score. We see that the math score tends to be lower as social class goes down. We also see the most substantial variation at the individual level with smaller amounts of variation at the school and class level.

We test the random effects:

mmodc <- lmer(math ~ craven + social+(1|school:class),jspr)
mmods <- lmer(math ~ craven + social+(1|school),jspr)
exactRLRT(mmodc, mmod, mmods)
    simulated finite sample distribution of RLRT.
    
    (p-value based on 10000 simulated values)

data:  
RLRT = 1.85, p-value = 0.076
exactRLRT(mmods, mmod, mmodc)
    simulated finite sample distribution of RLRT.
    
    (p-value based on 10000 simulated values)

data:  
RLRT = 7.64, p-value = 0.0026

The first test is for the class effect which just fails to meet the 5% significance level. The second test is for the school effect and shows strong evidence of differences between schools.

We can test the social fixed effect:

mmodm <- lmer(math ~ craven + (1|school)+(1|school:class),jspr)
KRmodcomp(mmod, mmodm)
large : math ~ craven + social + (1 | school) + (1 | school:class)
small : math ~ craven + (1 | school) + (1 | school:class)
        stat    ndf    ddf F.scaling p.value
Ftest   2.76   8.00 930.34         1  0.0052

We see the social effect is significant.

We can compute confidence intervals for the parameters:

confint(mmod, method="boot")
               2.5 %    97.5 %
.sig01       0.00000  1.676670
.sig02       0.97299  2.371398
.sigma       4.97880  5.522415
(Intercept) 30.15211 33.980577
craven       0.52266  0.645947
social2     -2.60963  1.681417
social3     -3.25265  1.293725
social4     -4.15992 -0.062255
social5     -3.47189  0.843701
social6     -4.74764 -0.071777
social7     -5.39161 -0.714736
social8     -7.22784 -0.407057
social9     -3.04726  1.211476

The lower end of the class confidence interval is zero while the school random effect is clearly larger. This is consistent with the earlier tests.

INLA

Integrated nested Laplace approximation is a method of Bayesian computation which uses approximation rather than simulation. More can be found on this topic in Bayesian Regression Modeling with INLA and the chapter on GLMMs

Use the most recent computational methodology:

inla.setOption(inla.mode="experimental")
inla.setOption("short.summary",TRUE)
formula <- math ~ social+craven + f(school, model="iid") + f(classch, model="iid")
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
              mean    sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) 31.791 1.016     29.798   31.791     33.784 31.791   0
social2     -0.189 1.096     -2.338   -0.190      1.960 -0.190   0
social3     -0.527 1.166     -2.813   -0.527      1.760 -0.527   0
social4     -1.841 1.037     -3.875   -1.841      0.195 -1.841   0
social5     -1.175 1.157     -3.443   -1.175      1.096 -1.175   0
social6     -2.215 1.231     -4.630   -2.215      0.199 -2.215   0
social7     -2.932 1.268     -5.419   -2.932     -0.445 -2.932   0
social8     -3.361 1.709     -6.712   -3.361     -0.009 -3.361   0
social9     -0.653 1.099     -2.808   -0.653      1.503 -0.653   0
craven       0.586 0.032      0.522    0.586      0.649  0.586   0

Model hyperparameters:
                                            mean       sd 0.025quant 0.5quant 0.975quant  mode
Precision for the Gaussian observations 3.70e-02 4.00e-03      0.035    0.037   4.00e-02 0.036
Precision for school                    6.10e+06 2.15e+08      8.598  482.238   3.69e+06 3.539
Precision for classch                   2.02e-01 5.20e-02      0.097    0.201   2.83e-01 0.253

 is computed 

As usual, the default priors result in precisions for the random effects which are unbelievably large and we need to change the default prior.

Informative Gamma priors on the precisions

Now try more informative gamma priors for the random effect precisions. Define it so the mean value of gamma prior is set to the inverse of the variance of the residuals of the fixed-effects only model. We expect the error variances to be lower than this variance so this is an overestimate. The variance of the gamma prior (for the precision) is controlled by the apar shape parameter.

apar <- 0.5
lmod <- lm(math ~ social+craven, jspr)
bpar <- apar*var(residuals(lmod))
lgprior <- list(prec = list(prior="loggamma", param = c(apar,bpar)))
formula = math ~ social+craven+f(school, model="iid", hyper = lgprior)+f(classch, model="iid", hyper = lgprior)
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
              mean    sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) 32.002 1.062     29.921   32.002     34.085 32.002   0
social2     -0.396 1.099     -2.552   -0.396      1.760 -0.396   0
social3     -0.761 1.171     -3.058   -0.761      1.537 -0.761   0
social4     -2.099 1.045     -4.149   -2.099     -0.049 -2.099   0
social5     -1.424 1.164     -3.707   -1.424      0.860 -1.424   0
social6     -2.350 1.239     -4.779   -2.350      0.080 -2.350   0
social7     -3.056 1.277     -5.561   -3.056     -0.551 -3.056   0
social8     -3.554 1.709     -6.905   -3.554     -0.202 -3.554   0
social9     -0.888 1.108     -3.060   -0.888      1.285 -0.888   0
craven       0.586 0.032      0.522    0.586      0.650  0.586   0

Model hyperparameters:
                                         mean    sd 0.025quant 0.5quant 0.975quant  mode
Precision for the Gaussian observations 0.037 0.002      0.033    0.037      0.040 0.037
Precision for school                    0.266 0.087      0.137    0.252      0.478 0.241
Precision for classch                   0.342 0.101      0.186    0.328      0.580 0.319

 is computed 

Results are more credible.

Compute the transforms to an SD scale for the random effect terms. Make a table of summary statistics for the posteriors:

sigmasch <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmacla <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmasch,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmacla,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c("mu",result$names.fixed[2:10],"school SD","class SD","epsilon")
data.frame(restab) |> kable()
mu social2 social3 social4 social5 social6 social7 social8 social9 craven school.SD class.SD epsilon
mean 32.002 -0.39634 -0.76086 -2.099 -1.4237 -2.3496 -3.0562 -3.5537 -0.88791 0.58599 2.0144 1.7639 5.2212
sd 1.0611 1.0985 1.1706 1.0448 1.1635 1.238 1.2767 1.7079 1.1072 0.032406 0.31699 0.25456 0.12383
quant0.025 29.92 -2.5524 -3.0584 -4.1495 -3.7073 -4.7795 -5.562 -6.9058 -3.061 0.52238 1.451 1.3155 4.9814
quant0.25 31.284 -1.1394 -1.5527 -2.8056 -2.2107 -3.187 -3.9197 -4.7089 -1.6368 0.56407 1.7896 1.5835 5.1363
quant0.5 32 -0.39875 -0.7634 -2.1012 -1.4263 -2.3523 -3.0589 -3.5574 -0.8903 0.58593 1.9945 1.7457 5.2197
quant0.75 32.715 0.34202 0.025949 -1.3967 -0.64168 -1.5175 -2.198 -2.4057 -0.1437 0.60778 2.2156 1.9241 5.3042
quant0.975 34.081 1.7557 1.5323 -0.052322 0.85567 0.07534 -0.55533 -0.20801 1.2811 0.64947 2.6929 2.3136 5.4678

Also construct a plot the SD posteriors:

ddf <- data.frame(rbind(sigmasch,sigmacla,sigmaepsilon),errterm=gl(3,nrow(sigmasch),labels = c("school","class","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("math")+ylab("density")

Posteriors look OK although no weight given to smaller values.

Penalized Complexity Prior

In Simpson et al (2015), penalized complexity priors are proposed. This requires that we specify a scaling for the SDs of the random effects. We use the SD of the residuals of the fixed effects only model (what might be called the base model in the paper) to provide this scaling.

lmod <- lm(math ~ craven + social, jspr)
sdres <- sd(residuals(lmod))
pcprior <- list(prec = list(prior="pc.prec", param = c(3*sdres,0.01)))
formula = math ~ social+craven+f(school, model="iid", hyper = pcprior)+f(classch, model="iid", hyper = pcprior)
result <- inla(formula, family="gaussian", data=jspr)
summary(result)
Fixed effects:
              mean    sd 0.025quant 0.5quant 0.975quant   mode kld
(Intercept) 31.921 1.027     29.907   31.921     33.935 31.921   0
social2     -0.303 1.093     -2.447   -0.303      1.843 -0.303   0
social3     -0.669 1.164     -2.952   -0.670      1.614 -0.670   0
social4     -2.003 1.037     -4.037   -2.003      0.032 -2.003   0
social5     -1.300 1.156     -3.567   -1.300      0.969 -1.300   0
social6     -2.306 1.230     -4.719   -2.306      0.107 -2.306   0
social7     -3.011 1.267     -5.496   -3.011     -0.524 -3.011   0
social8     -3.469 1.702     -6.807   -3.469     -0.130 -3.469   0
social9     -0.791 1.100     -2.947   -0.791      1.366 -0.791   0
craven       0.585 0.032      0.522    0.585      0.648  0.585   0

Model hyperparameters:
                                         mean     sd 0.025quant 0.5quant 0.975quant  mode
Precision for the Gaussian observations 0.037  0.002      0.033    0.036      0.040 0.036
Precision for school                    7.524 17.002      0.084    2.670     44.635 0.557
Precision for classch                   0.279  0.097      0.150    0.260      0.525 0.233

 is computed 

Compute the summaries as before:

sigmasch <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[2]])
sigmacla <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[3]])
sigmaepsilon <- inla.tmarginal(function(x) 1/sqrt(exp(x)),result$internal.marginals.hyperpar[[1]])
restab=sapply(result$marginals.fixed, function(x) inla.zmarginal(x,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmasch,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmacla,silent=TRUE))
restab=cbind(restab, inla.zmarginal(sigmaepsilon,silent=TRUE))
colnames(restab) = c("mu",result$names.fixed[2:10],"school SD","class SD","epsilon")
data.frame(restab) |> kable()
mu social2 social3 social4 social5 social6 social7 social8 social9 craven school.SD class.SD epsilon
mean 31.921 -0.30258 -0.66933 -2.0033 -1.2996 -2.3064 -3.0106 -3.4689 -0.79101 0.58504 0.88704 1.9696 5.2364
sd 1.0262 1.0928 1.1632 1.0368 1.1556 1.2296 1.2667 1.7012 1.0991 0.032166 0.91583 0.30665 0.1252
quant0.025 29.907 -2.4472 -2.9522 -4.0381 -3.5673 -4.7199 -5.4969 -6.8078 -2.948 0.5219 0.1483 1.385 4.9923
quant0.25 31.227 -1.0417 -1.4561 -2.7046 -2.0813 -3.138 -3.8673 -4.6195 -1.5344 0.56329 0.35749 1.7537 5.1509
quant0.5 31.919 -0.30507 -0.67194 -2.0056 -1.3023 -2.3091 -3.0133 -3.4726 -0.79347 0.58498 0.59708 1.9684 5.2357
quant0.75 32.611 0.43179 0.11239 -1.3065 -0.52306 -1.48 -2.1592 -2.3256 -0.05239 0.60666 1.0595 2.1784 5.3207
quant0.975 33.931 1.8386 1.6097 0.028168 0.9647 0.10229 -0.52906 -0.13597 1.3624 0.64805 3.3753 2.5773 5.484

Make the plots:

ddf <- data.frame(rbind(sigmasch,sigmacla,sigmaepsilon),errterm=gl(3,nrow(sigmasch),labels = c("school","class","epsilon")))
ggplot(ddf, aes(x,y, linetype=errterm))+geom_line()+xlab("math")+ylab("density")

Posteriors put more weight on lower values compared to gamma prior.

STAN

STAN performs Bayesian inference using MCMC. I use cmdstanr to access Stan from R.

You see below the Stan code to fit our model. Rmarkdown allows the use of Stan chunks (elsewhere I have R chunks). The chunk header looks like this.

STAN chunk will be compiled to ‘mod’. Chunk header is:

cmdstan, output.var="mod", override = FALSE
data {
     int<lower=0> Nobs;
     int<lower=0> Npreds;
     int<lower=0> Nlev1;
     int<lower=0> Nlev2;
     array[Nobs] real y;
     matrix[Nobs,Npreds] x;
     array[Nobs] int<lower=1,upper=Nlev1> levind1;
     array[Nobs] int<lower=1,upper=Nlev2> levind2;
     real<lower=0> sdscal;
}
parameters {
           vector[Npreds] beta;
           real<lower=0> sigmalev1;
           real<lower=0> sigmalev2;
           real<lower=0> sigmaeps;

           vector[Nlev1] eta1;
           vector[Nlev2] eta2;
}
transformed parameters {
  vector[Nlev1] ran1;
  vector[Nlev2] ran2;
  vector[Nobs] yhat;

  ran1  = sigmalev1 * eta1;
  ran2  = sigmalev2 * eta2;

  for (i in 1:Nobs)
    yhat[i] = x[i]*beta+ran1[levind1[i]]+ran2[levind2[i]];

}
model {
  eta1 ~ normal(0, 1);
  eta2 ~ normal(0, 1);
  sigmalev1 ~ cauchy(0, 2.5*sdscal);
  sigmalev2 ~ cauchy(0, 2.5*sdscal);
  sigmaeps ~ cauchy(0, 2.5*sdscal);
  y ~ normal(yhat, sigmaeps);
}

We have used uninformative priors for the treatment effects but slightly informative half-cauchy priors for the three variances. All the fixed effects have been collected into a single design matrix. The school and class variables need to be renumbered into consecutive positive integers. Somewhat inconvenient since the schools are numbered up to 50 but have no data for two schools so only 48 schools are actually used.

lmod <- lm(math ~ craven + social, jspr)
sdscal <- sd(residuals(lmod))
Xmatrix <- model.matrix( ~ craven + social, jspr)
jspr$school <- factor(jspr$school)
jspr$classch <- factor(paste(jspr$school,jspr$class,sep="."))
jspdat <- list(Nobs=nrow(jspr),
               Npreds=ncol(Xmatrix),
               Nlev1=length(unique(jspr$school)),
               Nlev2=length(unique(jspr$classch)),
               y=jspr$math,
               x=Xmatrix,
               levind1=as.numeric(jspr$school),
               levind2=as.numeric(jspr$classch),
               sdscal=sdscal)

Do the MCMC sampling:

fit <- mod$sample(
  data = jspdat, 
  seed = 123, 
  chains = 4, 
  parallel_chains = 4,
  refresh = 500 # print update every 500 iters
)
Running MCMC with 4 parallel chains...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 finished in 5.8 seconds.
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 finished in 6.1 seconds.
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 6.4 seconds.
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 finished in 6.5 seconds.

All 4 chains finished successfully.
Mean chain execution time: 6.2 seconds.
Total execution time: 6.7 seconds.

Diagnostics

Extract the draws into a convenient dataframe format:

draws_df <- fit$draws(format = "df")

For the School SD:

ggplot(draws_df,
       aes(x=.iteration,y=sigmalev1,color=factor(.chain))) + geom_line() +
  labs(color = 'Chain', x="Iteration")

For the class SD

ggplot(draws_df,
       aes(x=.iteration,y=sigmalev2,color=factor(.chain))) + geom_line() +
  labs(color = 'Chain', x="Iteration")

All these are satisfactory.

Output Summary

Display the parameters of interest:

fit$summary(c("beta","sigmalev1","sigmalev2","sigmaeps"))
# A tibble: 13 × 10
   variable    mean median     sd    mad     q5    q95  rhat ess_bulk ess_tail
   <chr>      <num>  <num>  <num>  <num>  <num>  <num> <num>    <num>    <num>
 1 beta[1]   32.0   32.0   1.05   1.06   30.3   33.7    1.00     831.    1334.
 2 beta[2]    0.584  0.584 0.0332 0.0337  0.529  0.639  1.00    6034.    2922.
 3 beta[3]   -0.360 -0.357 1.10   1.10   -2.20   1.43   1.00     921.    1531.
 4 beta[4]   -0.756 -0.751 1.19   1.20   -2.75   1.20   1.00    1002.    1744.
 5 beta[5]   -2.12  -2.11  1.06   1.07   -3.85  -0.412  1.00     825.    1688.
 6 beta[6]   -1.38  -1.37  1.17   1.17   -3.31   0.518  1.00     986.    1842.
 7 beta[7]   -2.40  -2.37  1.24   1.25   -4.47  -0.323  1.00    1048.    2070.
 8 beta[8]   -3.05  -3.06  1.29   1.29   -5.11  -0.955  1.00    1101.    2230.
 9 beta[9]   -3.55  -3.54  1.69   1.69   -6.35  -0.811  1.00    1659.    2600.
10 beta[10]  -0.899 -0.888 1.12   1.14   -2.76   0.926  1.00     954.    1466.
11 sigmalev1  1.79   1.81  0.409  0.379   1.06   2.40   1.00     616.     755.
12 sigmalev2  0.975  0.971 0.493  0.523   0.158  1.79   1.00     447.     713.
13 sigmaeps   5.27   5.26  0.127  0.128   5.06   5.47   1.00    5333.    3035.

Remember that the beta correspond to the following parameters:

colnames(Xmatrix)
 [1] "(Intercept)" "craven"      "social2"     "social3"     "social4"     "social5"     "social6"     "social7"    
 [9] "social8"     "social9"    

The results are comparable to the REML fit. The effective sample sizes are sufficient.

Posterior Distributions

We can use extract to get at various components of the STAN fit. First consider the SDs for random components:

sdf = stack(draws_df[,c("sigmaeps","sigmalev1","sigmalev2")])
colnames(sdf) = c("math","SD")
levels(sdf$SD) = c("individual","school","class")
ggplot(sdf, aes(x=math,color=SD)) + geom_density() 

As usual the error SD distribution is a more concentrated. The school SD is more diffuse and smaller whereas the class SD is smaller still. Now the treatement effects, considering the social class parameters first:

sdf = stack(draws_df[,4:11])
colnames(sdf) = c("math","social")
levels(sdf$social) = 2:9
ggplot(sdf, aes(x=math,color=social)) + geom_density() 

Now just the raven score parameter:

ggplot(draws_df, aes(x=.data[["beta[2]"]])) + geom_density() + xlab("Math per Raven")

Now for the schools:

sdf = stack(draws_df[,startsWith(colnames(draws_df),"ran1")])
colnames(sdf) = c("math","school")
levels(sdf$school) = 1:48
ggplot(sdf, aes(x=math,group=school)) + geom_density() 

We can see the variation between schools. A league table might be used to rank the schools but the high overlap in these distributions show that such a ranking should not be interpreted too seriously.

BRMS

BRMS stands for Bayesian Regression Models with STAN. It provides a convenient wrapper to STAN functionality. We specify the model as in lmer() above. I have used more than the standard number of iterations because this reduces some problems and does not cost much computationally.

suppressMessages(bmod <- brm(math ~ craven + social+(1|school)+(1|school:class),data=jspr,iter=10000, cores=4, backend = "cmdstanr"))

We get some minor warnings. We can obtain some posterior densities and diagnostics with:

plot(bmod, variable = "^s", regex=TRUE)

We have chosen only the random effect hyperparameters since this is where problems will appear first. Looks OK. We can see some weight is given to values of the class effect SD close to zero.

We can look at the STAN code that brms used with:

stancode(bmod)
// generated with brms 2.19.0
functions {
  
}
data {
  int<lower=1> N; // total number of observations
  vector[N] Y; // response variable
  int<lower=1> K; // number of population-level effects
  matrix[N, K] X; // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1; // number of grouping levels
  int<lower=1> M_1; // number of coefficients per level
  array[N] int<lower=1> J_1; // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2; // number of grouping levels
  int<lower=1> M_2; // number of coefficients per level
  array[N] int<lower=1> J_2; // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  int prior_only; // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc; // centered version of X without an intercept
  vector[Kc] means_X; // column means of X before centering
  for (i in 2 : K) {
    means_X[i - 1] = mean(X[ : , i]);
    Xc[ : , i - 1] = X[ : , i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b; // population-level effects
  real Intercept; // temporary intercept for centered predictors
  real<lower=0> sigma; // dispersion parameter
  vector<lower=0>[M_1] sd_1; // group-level standard deviations
  array[M_1] vector[N_1] z_1; // standardized group-level effects
  vector<lower=0>[M_2] sd_2; // group-level standard deviations
  array[M_2] vector[N_2] z_2; // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1; // actual group-level effects
  vector[N_2] r_2_1; // actual group-level effects
  real lprior = 0; // prior contributions to the log posterior
  r_1_1 = sd_1[1] * z_1[1];
  r_2_1 = sd_2[1] * z_2[1];
  lprior += student_t_lpdf(Intercept | 3, 32, 5.9);
  lprior += student_t_lpdf(sigma | 3, 0, 5.9)
            - 1 * student_t_lccdf(0 | 3, 0, 5.9);
  lprior += student_t_lpdf(sd_1 | 3, 0, 5.9)
            - 1 * student_t_lccdf(0 | 3, 0, 5.9);
  lprior += student_t_lpdf(sd_2 | 3, 0, 5.9)
            - 1 * student_t_lccdf(0 | 3, 0, 5.9);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept;
    for (n in 1 : N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
    }
    target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
  target += std_normal_lpdf(z_2[1]);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

We see that brms is using student t distributions with 3 degrees of freedom for the priors. For the three error SDs, this will be truncated at zero to form half-t distributions. You can get a more explicit description of the priors with prior_summary(bmod). These are qualitatively similar to the the PC prior used in the INLA fit.

We examine the fit:

summary(bmod)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: math ~ craven + social + (1 | school) + (1 | school:class) 
   Data: jspr (Number of observations: 953) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Group-Level Effects: 
~school (Number of levels: 48) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.78      0.44     0.72     2.57 1.00     1601      911

~school:class (Number of levels: 90) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     1.01      0.50     0.09     2.02 1.00     1494     1523

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    32.01      1.05    29.93    34.05 1.00     4780     7898
craven        0.58      0.03     0.52     0.65 1.00    23115    14709
social2      -0.35      1.11    -2.55     1.83 1.00     5331     8891
social3      -0.77      1.18    -3.07     1.55 1.00     5468     9286
social4      -2.11      1.06    -4.17    -0.05 1.00     4549     7980
social5      -1.35      1.17    -3.68     0.92 1.00     5099     9617
social6      -2.36      1.25    -4.82     0.07 1.00     5737     9807
social7      -3.04      1.29    -5.59    -0.52 1.00     6085     9948
social8      -3.53      1.72    -6.90    -0.13 1.00     8258    10286
social9      -0.88      1.13    -3.08     1.34 1.00     5000     8754

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     5.26      0.13     5.02     5.51 1.00    18923    12386

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The results are consistent with those seen previously.

MGCV

It is possible to fit some GLMMs within the GAM framework of the mgcv package. An explanation of this can be found in this blog

gmod = gam(math ~ craven + social+s(school,bs="re")+s(classch,bs="re"),data=jspr, method="REML")

and look at the summary output:

summary(gmod)
Family: gaussian 
Link function: identity 

Formula:
math ~ craven + social + s(school, bs = "re") + s(classch, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  32.0107     1.0350   30.93   <2e-16
craven        0.5841     0.0321   18.21   <2e-16
social2      -0.3611     1.0948   -0.33    0.742
social3      -0.7767     1.1649   -0.67    0.505
social4      -2.1196     1.0396   -2.04    0.042
social5      -1.3632     1.1585   -1.18    0.240
social6      -2.3703     1.2330   -1.92    0.055
social7      -3.0482     1.2703   -2.40    0.017
social8      -3.5473     1.7027   -2.08    0.038
social9      -0.8863     1.1031   -0.80    0.422

Approximate significance of smooth terms:
            edf Ref.df    F p-value
s(school)  27.4     47 2.67  <2e-16
s(classch) 15.6     89 0.33   0.052

R-sq.(adj) =  0.378   Deviance explained = 41.2%
-REML = 2961.8  Scale est. = 27.572    n = 953

We get the fixed effect estimates. We also get tests on the random effects (as described in this article. The hypothesis of no variation is rejected for the school but not for the class. This is consistent with earlier findings.

We can get an estimate of the operator and error SD:

gam.vcomp(gmod)
Standard deviations and 0.95 confidence intervals:

           std.dev   lower  upper
s(school)   1.7967 1.21058 2.6667
s(classch)  1.0162 0.41519 2.4875
scale       5.2509 5.00863 5.5049

Rank: 3/3

The point estimates are the same as the REML estimates from lmer earlier. The confidence intervals are different. A bootstrap method was used for the lmer fit whereas gam is using an asymptotic approximation resulting in substantially different results. Given the problems of parameters on the boundary present in this example, the bootstrap results appear more trustworthy.

The fixed and random effect estimates can be found with:

coef(gmod)
  (Intercept)        craven       social2       social3       social4       social5       social6       social7 
    32.010740      0.584117     -0.361080     -0.776732     -2.119649     -1.363206     -2.370314     -3.048249 
      social8       social9   s(school).1   s(school).2   s(school).3   s(school).4   s(school).5   s(school).6 
    -3.547252     -0.886345     -2.256262     -0.422953      0.836164     -1.256567     -0.677051      0.186343 
  s(school).7   s(school).8   s(school).9  s(school).10  s(school).11  s(school).12  s(school).13  s(school).14 
     1.323942      0.364974     -2.022321      0.558287     -0.505067      0.016585     -0.615911      0.421654 
 s(school).15  s(school).16  s(school).17  s(school).18  s(school).19  s(school).20  s(school).21  s(school).22 
    -0.219725      0.441527     -0.204674      0.621186     -0.304768     -2.509540     -1.069436     -0.182572 
 s(school).23  s(school).24  s(school).25  s(school).26  s(school).27  s(school).28  s(school).29  s(school).30 
     2.242250      1.135253      1.155773      0.375632     -2.487656     -2.501408      1.101826      2.361060 
 s(school).31  s(school).32  s(school).33  s(school).34  s(school).35  s(school).36  s(school).37  s(school).38 
     0.056199     -1.044550      2.557490     -0.981792      2.559601      0.588989      2.437473     -1.000318 
 s(school).39  s(school).40  s(school).41  s(school).42  s(school).43  s(school).44  s(school).45  s(school).46 
    -2.022936      1.655924     -0.279566     -0.092512     -2.221997      0.216297      1.662840     -0.657181 
 s(school).47  s(school).48  s(classch).1  s(classch).2  s(classch).3  s(classch).4  s(classch).5  s(classch).6 
     0.066572      0.592921      0.573442     -1.295266     -0.101570      0.280178      0.328046      0.171502 
 s(classch).7  s(classch).8  s(classch).9 s(classch).10 s(classch).11 s(classch).12 s(classch).13 s(classch).14 
    -0.222321     -0.438808     -0.666694      0.672000     -0.197043     -0.085737      0.220633     -0.432686 
s(classch).15 s(classch).16 s(classch).17 s(classch).18 s(classch).19 s(classch).20 s(classch).21 s(classch).22 
     0.362391      0.280296     -0.139043     -0.065479      0.452835     -0.254105     -0.135311     -0.539712 
s(classch).23 s(classch).24 s(classch).25 s(classch).26 s(classch).27 s(classch).28 s(classch).29 s(classch).30 
     0.442210     -0.204360     -0.598493     -0.164090     -0.178044      0.181223     -0.239631      0.717342 
s(classch).31 s(classch).32 s(classch).33 s(classch).34 s(classch).35 s(classch).36 s(classch).37 s(classch).38 
     0.363191      0.266493      0.103262     -0.045999      0.166171     -0.692729     -0.103123     -0.800251 
s(classch).39 s(classch).40 s(classch).41 s(classch).42 s(classch).43 s(classch).44 s(classch).45 s(classch).46 
     0.583420     -0.315914     -0.057735      0.410232      0.374896      0.436353     -0.055897      0.017979 
s(classch).47 s(classch).48 s(classch).49 s(classch).50 s(classch).51 s(classch).52 s(classch).53 s(classch).54 
    -0.397021      0.062848      0.224533      0.880602     -0.286941     -0.047087     -0.267009      0.603827 
s(classch).55 s(classch).56 s(classch).57 s(classch).58 s(classch).59 s(classch).60 s(classch).61 s(classch).62 
    -0.232530      0.447571      0.188430      0.779797     -0.243148     -0.076874     -0.402001     -0.647178 
s(classch).63 s(classch).64 s(classch).65 s(classch).66 s(classch).67 s(classch).68 s(classch).69 s(classch).70 
     0.529764     -1.156667      0.131640      0.935589     -0.029596     -0.710862      0.069198      0.487627 
s(classch).71 s(classch).72 s(classch).73 s(classch).74 s(classch).75 s(classch).76 s(classch).77 s(classch).78 
     0.044350      0.037317      0.073938     -0.053456     -0.268045      0.122984     -0.101686     -0.252028 
s(classch).79 s(classch).80 s(classch).81 s(classch).82 s(classch).83 s(classch).84 s(classch).85 s(classch).86 
     0.035425      0.189688      0.258015     -0.198400      0.373615      0.049941      0.311116      0.069280 
s(classch).87 s(classch).88 s(classch).89 s(classch).90 
    -0.263633      0.582019     -0.587131     -0.641870 

GINLA

In Wood (2019), a simplified version of INLA is proposed. The first construct the GAM model without fitting and then use the ginla() function to perform the computation.

gmod = gam(math ~ craven + social+s(school,bs="re")+s(classch,bs="re"),
           data=jspr, fit = FALSE)
gimod = ginla(gmod)

We get the posterior density for the intercept as:

plot(gimod$beta[1,],gimod$density[1,],type="l",xlab="math",ylab="density")

We get the posterior density for the raven effect as:

plot(gimod$beta[2,],gimod$density[2,],type="l",xlab="math per raven",ylab="density")

and for the social effects as:

xmat = t(gimod$beta[3:10,])
ymat = t(gimod$density[3:10,])
matplot(xmat, ymat,type="l",xlab="math",ylab="density")
legend("left",paste0("social",2:9),col=1:8,lty=1:8)

We can see some overlap between the effects, but strong evidence of a negative outcome relative to social class 1 for some classes.

It is not straightforward to obtain the posterior densities of the hyperparameters.

Discussion

See the Discussion of the single random effect model for general comments.

  • As with the previous analyses, sometimes the INLA posteriors for the hyperparameters have densities which do not give weight to close-to-zero values where other analyses suggest this might be reasonable.

  • There is relatively little disagreement between the methods and much similarity.

  • There were no major computational issue with the analyses (in contrast with some of the other examples)

  • The mgcv analyses took a little longer than previous analyses because the sample size is larger (but still were quite fast).

Package version info

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mgcv_1.8-42     nlme_3.1-162    brms_2.19.0     Rcpp_1.0.10     cmdstanr_0.5.3  knitr_1.43      INLA_23.05.30-1
 [8] sp_2.0-0        foreach_1.5.2   RLRsim_3.1-8    pbkrtest_0.5.2  lme4_1.1-33     Matrix_1.5-4.1  ggplot2_3.4.2  
[15] faraway_1.0.8  

loaded via a namespace (and not attached):
  [1] gridExtra_2.3        inline_0.3.19        rlang_1.1.1          magrittr_2.0.3       matrixStats_1.0.0   
  [6] compiler_4.3.1       loo_2.6.0            systemfonts_1.0.4    callr_3.7.3          vctrs_0.6.3         
 [11] reshape2_1.4.4       stringr_1.5.0        crayon_1.5.2         pkgconfig_2.0.3      fastmap_1.1.1       
 [16] backports_1.4.1      ellipsis_0.3.2       labeling_0.4.2       utf8_1.2.3           threejs_0.3.3       
 [21] promises_1.2.0.1     rmarkdown_2.22       markdown_1.7         ps_1.7.5             nloptr_2.0.3        
 [26] MatrixModels_0.5-1   purrr_1.0.1          xfun_0.39            jsonlite_1.8.5       later_1.3.1         
 [31] Deriv_4.1.3          prettyunits_1.1.1    broom_1.0.5          R6_2.5.1             dygraphs_1.1.1.6    
 [36] StanHeaders_2.26.27  stringi_1.7.12       boot_1.3-28.1        rstan_2.21.8         iterators_1.0.14    
 [41] zoo_1.8-12           base64enc_0.1-3      bayesplot_1.10.0     httpuv_1.6.11        splines_4.3.1       
 [46] igraph_1.5.0         tidyselect_1.2.0     rstudioapi_0.14      abind_1.4-5          yaml_2.3.7          
 [51] codetools_0.2-19     miniUI_0.1.1.1       processx_3.8.1       pkgbuild_1.4.1       lattice_0.21-8      
 [56] tibble_3.2.1         plyr_1.8.8           shiny_1.7.4          withr_2.5.0          bridgesampling_1.1-2
 [61] posterior_1.4.1      coda_0.19-4          evaluate_0.21        RcppParallel_5.1.7   xts_0.13.1          
 [66] pillar_1.9.0         tensorA_0.36.2       stats4_4.3.1         checkmate_2.2.0      DT_0.28             
 [71] shinyjs_2.1.0        distributional_0.3.2 generics_0.1.3       rstantools_2.3.1     munsell_0.5.0       
 [76] scales_1.2.1         minqa_1.2.5          gtools_3.9.4         xtable_1.8-4         glue_1.6.2          
 [81] tools_4.3.1          shinystan_2.6.0      data.table_1.14.8    colourpicker_1.2.0   mvtnorm_1.2-2       
 [86] grid_4.3.1           tidyr_1.3.0          crosstalk_1.2.0      colorspace_2.1-0     cli_3.6.1           
 [91] fansi_1.0.4          svglite_2.1.1        Brobdingnag_1.2-9    dplyr_1.1.2          gtable_0.3.3        
 [96] digest_0.6.31        htmlwidgets_1.6.2    farver_2.1.1         htmltools_0.5.5      lifecycle_1.0.3     
[101] mime_0.12            shinythemes_1.2.0    MASS_7.3-60