library(ggplot2)
library(dplyr)
library(tidyr)
library(brms)
library(posterior)
library(bayesplot)
library(RBesT)
here::i_am("src/02ad_meta_analysis_covariate.qmd")
library(gt)
theme_set(theme_bw(12))
# instruct brms to use cmdstanr as backend and cache all Stan binaries
options(brms.backend="cmdstanr", cmdstanr_write_stan_file_dir=here::here("_brms-cache"))
# create cache directory if not yet available
dir.create(here::here("_brms-cache"), FALSE)
set.seed(57339)
# allow for wider and prettier printing
options(width=140, digits=2)
# make brms less verbose
options(brms.short_summary = TRUE)

7 Use of historical control data with a covariate
The use of baseline covariates for trial analysis is frequently applied in practice whenever known prognostic factors are relevant in the context of the study population. This can complicate the use of historical control information in particular for a non-normal endpoint for which the respective generalized linear models are often not collapsible. This case study demonstrates
- setting up a bi-variate meta-analysis for a binary endpoint to borrow information on the intercept and a baseline covariate effect from historical data
- how we can make use of incomplete responder data as one often encounters in practice (responder data not given for each level of a categorical covariate)
- how two different likelihoods can be used with custom Stan code adapting the brms-generated Stan model
- how to summarize the MAP prior with covariates as a multi-variate normal mixture
- how to use the multi-variate normal mixture MAP prior as a prior for the main analysis using brms
To run the R code of this section please ensure to load these libraries and options first:
7.1 Background
Atopic dermatitis (AD) is an inflammation of the skin. Its prevalence in the population is substantial and various treatments have been studied to date. As a result, a considerable amount of data has been generated on experimental treatments for AD and, along with it, on placebo treated patients. Given that treatment procedures are well standardized, the historical data on placebo treatments is relevant for future trials in this indication.
One of the standard clinical endpoints is the investigator global assessment (IGA) score, which ranges from 0 (clear) to 4 (severe). The change in this score over a time-span of 16 weeks relative to baseline, when treatment is initiated, is used to evaluate the efficacy of treatments. Derived from the IGA score, one often considers the binary IGA response status, defined as an improvement of at least 2 points and reaching an IGA score of either 0 or 1 (clear or almost clear). By construction of the score, the disease status at baseline being moderate or severe is a prognostic factor for reaching IGA response by week 16, as it is harder to reach response with a baseline score of 4 (severe) than with a score of 3 (moderate).
As a consequence, patients are often stratified by baseline IGA score into moderate and severe in trials studying this population. While most trials report the baseline distribution of the number of moderate and severe patients, only a fraction of trials report the responder outcome stratified by baseline severity, which complicates leveraging this historical data.
In the following we illustrate how an informative MAP prior is derived in such a situation. Historical data on a placebo treatment is used for a future trial, as we aim to reduce the required sample size in the placebo arm while maintaining appropriate power to detect a treatment effect in a two-arm trial.
7.2 Data
We consider the planning stage of a potential new trial in the indication of AD. Let's assume that an initial search for relevant historical control data in the literature led to the following set of identified historical trials:
| study | compound | trial | start_time | n | n_severe | r_overall | r_mod | r_severe | oral | is_complete |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Duplmb | 2b | 2013 May | 61 | 29 | 1 | — | — | 0 | 1 | |
| 2 | Duplmb | SOLO 1 | 2014 Oct | 224 | 111 | 23 | 16 | 7 | 0 | 0 | |
| 3 | Duplmb | SOLO 2 | 2014 Dec | 236 | 115 | 20 | 17 | 3 | 0 | 0 | |
| 4 | Trlknm | ECZTRA 1 | 2017 May | 197 | 102 | 14 | 10 | 4 | 0 | 0 | |
| 5 | Trlknm | ECZTRA 2 | 2017 Jun | 193 | 100 | 18 | 13 | 5 | 0 | 0 | |
| 6 | Rctnlm | 2b | 2018 | 57 | 26 | 1 | — | — | 0 | 1 | |
| 7 | Lbrkzm | D | 2018 Jan | 52 | 20 | 8 | — | — | 0 | 1 | |
| 8 | Lbrkzm | ADvocate1 | 2019 | 141 | 58 | 18 | — | — | 0 | 1 | |
| 9 | Lbrkzm | ADvocate2 | 2019 | 146 | 51 | 16 | — | — | 0 | 1 | |
| 10 | Amltlm | 2a | 2018 Dec | 24 | 15 | 2 | — | — | 0 | 1 | |
| 11 | Updctn | 2 | 2016 | 41 | 23 | 1 | — | — | 1 | 1 | |
| 12 | Updctn | Measure Up 1 | 2018 | 281 | 125 | 24 | — | — | 1 | 1 | |
| 13 | Updctn | Measure Up 2 | 2018 | 278 | 153 | 13 | — | — | 1 | 1 | |
| 14 | Abrctn | 2b | 2016 | 52 | 20 | 3 | — | — | 1 | 1 | |
| 15 | Abrctn | JADE MONO 2 | 2017 | 78 | 26 | 7 | 6 | 1 | 1 | 0 | |
| 16 | Abrctn | JADE MONO 1 | 2018 | 77 | 31 | 6 | 5 | 1 | 1 | 0 | |
| 17 | Brctnb | BREEZE-AD1 | 2017 | 249 | 105 | 12 | — | — | 1 | 1 | |
| 18 | Brctnb | BREEZE-AD2 | 2018 | 244 | 121 | 11 | — | — | 1 | 1 | |
| min | — | — | — | 24.0 | 15.0 | 1.0 | 5.0 | 1.0 | 0.0% | 0.0% |
| max | — | — | — | 281.0 | 153.0 | 24.0 | 17.0 | 7.0 | 100.0% | 100.0% |
| avg | — | — | — | 146.2 | 68.4 | 11.0 | 11.2 | 3.5 | 44.4% | 66.7% |
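Summing the table columns, the pooled overall response rate is about 7.5% (198 responders among 2631 patients), which is the roughly 7% figure quoted later for the unit information prior. A quick check (a sketch, assuming the table above is available as a data frame named hist_pbo):
# pooled response rate across all historical trials
with(hist_pbo, sum(r_overall) / sum(n))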
A graphical overview of the data as a forest plot, showing the historical responder data separately for moderate and severe patients (when available) and as the overall number of responders:

7.3 Model description
7.3.1 Trial analysis model
The aim is to decrease the sample size of the placebo arm in a future placebo controlled trial. To start, we first define the analysis model used for the future trial. We assume that the trial enrolls patients with either an IGA score of 3 (moderate) or 4 (severe). The baseline severity is a clinically relevant prognostic factor for response. Therefore, we assume that the trial analysis accounts for a patient’s disease status at baseline via an indicator severe resulting in the analysis model of the binary IGA responder outcome as
r_i|\theta_i \sim \text{Bernoulli}(\theta_i)
\text{logit}(\theta_i) = \alpha_0 + I(\text{severe}_i) \, \beta_1 + I(\text{active}_i) \, \beta_2 .
Our goal is now to derive from the historical data above informative priors for the overall intercept \alpha_0 and the covariate effect due to baseline disease status \beta_1.
7.3.2 Historical control data synthesis
Aggregate historical data necessitates adapting the trial analysis model in several ways. First, the Bernoulli likelihood for each individual patient i is replaced by an equivalent Binomial likelihood. Additionally, since each patient is classified as either moderate (l = 0) or severe (l = 1), it becomes necessary to introduce a pair of Binomial likelihood terms for each historical study h, with one term corresponding to each baseline disease severity level.
Furthermore, the inclusion of the covariate effect for baseline severity requires deriving a MAP prior for both the intercept and the log odds ratio of the covariate effect. If only a MAP prior were derived for the intercept while assigning a non-informative prior to the covariate effect, the trial analysis would borrow information exclusively for the disease severity level linked to the reference category of the intercept, leaving the covariate effect uninformative.
To address this, the trial analysis model is extended by incorporating a hierarchical structure. This hierarchical model assigns trial-specific intercepts and slopes for the covariate effect, with these parameters being related across trials through an exchangeability assumption. This assumption enables partial pooling of the available data, allowing the model to share information across trials while also capturing trial-specific variability.
r_{h,l}|\theta_{h,l} \sim \text{Binomial}(n_{h,l}, \theta_{h,l})
\text{logit}(\theta_{h,l}) = \beta_{0,h} + I(\text{severe}_{h,l}) \, \beta_{1,h} + I(\text{oral}_{h}) \, \beta_3
with the usual between-trial heterogeneity on the intercept and the slope:
(\beta_{0,h}, \beta_{1,h})|\mu_0,\mu_1,\tau_0,\tau_1,\rho \sim \text{Normal}((\mu_0, \mu_1), \Sigma)
\Sigma = \begin{pmatrix} \tau_0^2 & \tau_0 \, \tau_1 \, \rho \\ \tau_0 \, \tau_1 \, \rho & \tau_1^2 \end{pmatrix}
However, this model requires that for each historical trial the number of responders is given separately for each baseline disease severity level (corresponding to trials marked as having “complete” information). Thus, this model can only be applied in this form to the subset of the data with complete information.
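In brms notation, this bi-variate exchangeability structure corresponds to correlated study-level intercepts and severity slopes, which will be written in the implementation below as:
# correlated random intercept and severe slope per study implement the
# bi-variate exchangeability assumption
r | trials(n) ~ 1 + severe + oral + (1 + severe | study)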
7.3.3 Modelling explicitly the sum of two binomials (advanced)
For a number of trials the complete information needed for the above model is not reported. The complete response information per trial includes the number of responders stratified by disease severity,
- r_{h,l=0} responders with moderate baseline status
- r_{h,l=1} responders with severe baseline status.
However, for these trials only the partial information of the overall number of responders is reported, which equals the sum of the two random variables above:
r_h = r_{h,l=0} + r_{h,l=1}
In the following we describe how we can use the information on the sum while we model the data as if we were given the stratified response data. That is, we formulate the likelihood on the basis of the given sum of responders r_h while using the latent response rates \theta_{h,l=0} and \theta_{h,l=1} for which the model for the complete data is
r_{h,l=0}|\theta_{h,l=0} \sim \text{Binomial}(n_{h,l=0}, \theta_{h,l=0})
r_{h,l=1}|\theta_{h,l=1} \sim \text{Binomial}(n_{h,l=1}, \theta_{h,l=1})
The sum of two Binomial variables is distributed as the discrete convolution of the individual Binomial distributions:
P(Y=r) = \sum_{y_1=0}^{r} P_1(y_1) \, P_2(r - y_1)
Accounting for the fact that each variable is bounded by n_1 and n_2 respectively, we can limit the domain of the sum accordingly:
P(Y=r) = \sum_{y_1=\max(0,r-n_2)}^{\min(n_1, r)} P_1(y_1) \, P_2(r - y_1)
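This convolution is easy to check numerically. A small sketch (with arbitrary example numbers) comparing the exact pmf against a Monte Carlo estimate:
# exact pmf of the sum of two Binomials via the bounded convolution
binomial_sum_pmf <- function(r, n1, p1, n2, p2) {
  y1 <- max(0, r - n2):min(n1, r)
  sum(dbinom(y1, n1, p1) * dbinom(r - y1, n2, p2))
}
set.seed(2345)
sims <- rbinom(1e5, 20, 0.10) + rbinom(1e5, 30, 0.05)
cbind(exact = sapply(0:5, binomial_sum_pmf, 20, 0.10, 30, 0.05),
      simulated = sapply(0:5, \(r) mean(sims == r)))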
7.4 Implementation
7.4.1 MAP prior for complete historical data only
Modelling jointly the complete and partial information from the historical trials requires two types of likelihood in a single model. Here we first model the historical trials with complete information only. This can be formulated as a hierarchical Binomial regression problem. Thus, we first subset to the 6 trials for which complete information is available and reformat these into a long format in which each trial contributes one row for the moderate and one row for the severe outcome:
| study | compound | trial | severe | oral | r | n |
|---|---|---|---|---|---|---|
| 2 | Dupilumab | SOLO 1 | 0 | 0 | 16 | 113 |
| 2 | Dupilumab | SOLO 1 | 1 | 0 | 7 | 111 |
| 3 | Dupilumab | SOLO 2 | 0 | 0 | 17 | 121 |
| 3 | Dupilumab | SOLO 2 | 1 | 0 | 3 | 115 |
| 4 | Tralokinumab | ECZTRA 1 | 0 | 0 | 10 | 95 |
| 4 | Tralokinumab | ECZTRA 1 | 1 | 0 | 4 | 102 |
| 5 | Tralokinumab | ECZTRA 2 | 0 | 0 | 13 | 93 |
| 5 | Tralokinumab | ECZTRA 2 | 1 | 0 | 5 | 100 |
| 15 | Abrocitinib | JADE MONO 2 | 0 | 1 | 6 | 52 |
| 15 | Abrocitinib | JADE MONO 2 | 1 | 1 | 1 | 26 |
| 16 | Abrocitinib | JADE MONO 1 | 0 | 1 | 5 | 46 |
| 16 | Abrocitinib | JADE MONO 1 | 1 | 1 | 1 | 31 |
In this form, the data can be directly modelled with brms using weakly-informative priors. The priors are set on the basis of the unit information standard deviation, which derives from the overall mean response rate of 7% by transforming the respective binomial variance to the logit scale.
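To sketch this derivation: by the delta method, the variance of the logit of a single Bernoulli observation is approximately 1/(p(1-p)), so the unit information standard deviation on the logit scale evaluates to about 3.9 for p = 7%, of the same order as the scale of 3.5 used below:
# unit information sd on the logit scale at the mean response rate
p_bar <- 0.07
1 / sqrt(p_bar * (1 - p_bar))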
# bi-variate hierarchical Binomial logistic regression model
complete_model <- bf(r | trials(n) ~ 1 + severe + oral + (1 + severe | study),
family = binomial, center = FALSE)
complete_prior <- prior(normal(0, 3.5), class = b) +
prior(normal(0, 3.5 / 4), class = sd, coef = Intercept, group=study) +
prior(normal(0, 2 * 3.5 / 4), class = sd, coef = severe, group=study)
complete_mc <- brm(complete_model, data = hist_pbo_long_complete,
prior = complete_prior,
control=list(adapt_delta=0.95),
seed = 4585678,
refresh = 0)
Start sampling
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.6 seconds.
Chain 2 finished in 0.6 seconds.
Chain 3 finished in 0.6 seconds.
Chain 4 finished in 0.7 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.6 seconds.
Total execution time: 3.0 seconds.
complete_mc
 Family: binomial
Links: mu = logit
Formula: r | trials(n) ~ 1 + severe + oral + (1 + severe | study)
Data: hist_pbo_long_complete (Number of observations: 12)
Multilevel Hyperparameters:
~study (Number of levels: 6)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.20 0.18 0.01 0.65 1.00 1303 1283
sd(severe) 0.39 0.34 0.02 1.27 1.00 1950 1859
cor(Intercept,severe) -0.05 0.59 -0.97 0.93 1.00 2882 2531
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -1.89 0.19 -2.26 -1.52 1.00 2205 1983
severe -1.25 0.34 -1.97 -0.61 1.00 2309 1773
oral -0.23 0.38 -1.01 0.49 1.00 2959 2755
The MAP prior is then derived as the prediction of the parameters (intercept and slope) for a new study. However, most brms prediction functions are meant to predict new data or response means, which involve the linear predictor on the logit scale. Instead, we wish to extract the parameters themselves, which one can obtain by evaluating the linear predictor at suitably chosen covariate levels:
study_new_levels <- data.frame(
r = 0,
severe = c(0, 1),
n = 1,
oral = 0,
study = "new_study"
)
# obtain posterior for severe and moderate; note that we must allow
# sampling new levels of the random effects and also specify to do so
# using the "gaussian" option (as brms would otherwise bootstrap from
# existing studies).
prior_complete_new_study <-
rvar(posterior_linpred(
complete_mc,
newdata = study_new_levels,
allow_new_levels = TRUE,
sample_new_levels = "gaussian"
))
# we get a posterior for each condition (moderate / severe) - the
# second column includes the intercept plus the covariate effect due
# to severity,
prior_complete_new_study
rvar<4000>[2] mean ± sd:
[1] -1.9 ± 0.34 -3.1 ± 0.67
# which we convert to the log odds ratio estimate
# simply by subtracting the overall intercept in the first column
prior_complete_new_study[2] <- prior_complete_new_study[2] -
prior_complete_new_study[1]
# As we wish to formulate a joint prior on all model parameters of the
# trial model - including the treatment effect - a weakly-informative
# prior for the treatment effect is added to the posterior in MC form.
prior_complete_new_study <- c(
prior_complete_new_study,
active = rvar_rng(rnorm, 1, 0, 3.5, ndraws = ndraws(prior_complete_new_study))
)
names(prior_complete_new_study) <- c("inter", "severe", "active")
# now turn the MC sample of the prior into a parametric representation
# using functions from RBesT
map_prior_complete <- mixfit(
as_draws_matrix(prior_complete_new_study),
type = "mvnorm",
Nc = 3
)
map_prior_complete
EM for Multivariate Normal Mixture Model
Log-Likelihood = -14806
Multivariate normal mixture
Outcome dimension: 3
Mixture Components:
comp1 comp2 comp3
w 0.587 0.258 0.155
m[x[inter]] -1.876 -1.892 -1.868
m[x[severe]] -1.217 -1.297 -1.307
m[x[active]] 0.256 -0.441 -0.078
s[x[inter]] 0.224 0.208 0.684
s[x[severe]] 0.329 0.729 1.063
s[x[active]] 3.584 3.352 3.599
rho[x[severe],x[inter]] -0.295 -0.085 -0.054
rho[x[active],x[inter]] -0.082 0.090 0.046
rho[x[active],x[severe]] 0.083 -0.057 -0.065
plot(map_prior_complete)$mixpairs
Diagnostic plots for mixture multivariate normal densities are experimental.
Please note that these are subject to changes in future releases.

# robustify joint prior
prior_non_inf <- mixmvnorm(c(1, 0, 0, 0, diag(c(3.5, 3.5, 3.5)^2)))
rmap_prior_complete <- mixcombine(
map_prior_complete,
prior_non_inf,
weight = c(0.8, 0.2)
)
print(rmap_prior_complete, digits = 2)
Multivariate normal mixture
Outcome dimension: 3
Mixture Components:
comp1 comp2 comp3 comp1
w 0.469 0.206 0.124 0.200
m[x[inter]] -1.876 -1.892 -1.868 0.000
m[x[severe]] -1.217 -1.297 -1.307 0.000
m[x[active]] 0.256 -0.441 -0.078 0.000
s[x[inter]] 0.224 0.208 0.684 3.500
s[x[severe]] 0.329 0.729 1.063 3.500
s[x[active]] 3.584 3.352 3.599 3.500
rho[x[severe],x[inter]] -0.295 -0.085 -0.054 0.000
rho[x[active],x[inter]] -0.082 0.090 0.046 0.000
rho[x[active],x[severe]] 0.083 -0.057 -0.065 0.000
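As a small consistency check, the robustified weights are just the original mixture weights rescaled by 0.8:
# rescaled informative weights (0.469, 0.206, 0.124 as printed above);
# the non-informative component receives the remaining weight of 0.2
0.8 * c(0.587, 0.258, 0.155)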
# For pre-specification of the MAP prior we have to report the MAP
# prior with a finite precision in the protocol. To this end, it is
# recommended to write the MAP prior to disk using the JSON read/write
# functions from RBesT, which store the MAP prior with a defined
# precision in a human readable format:
map_complete_new_trial_file <- file.path(here::here(
"data",
"map_complete_new_trial_json.txt"
))
write_mix_json(
map_prior_complete,
map_complete_new_trial_file,
pretty = TRUE,
digits = 4
)
Dropping EM information from mixture object before serialization.
rmap_complete_new_trial_file <- file.path(here::here(
"data",
"rmap_complete_new__trial_json.txt"
))
write_mix_json(
rmap_prior_complete,
rmap_complete_new_trial_file,
pretty = TRUE,
digits = 4
)
cat(readLines(rmap_complete_new_trial_file), sep="\n")
{
"meta": {
"dim": [10, 4],
"dimnames": [
["w", "m[x[inter]]", "m[x[severe]]", "m[x[active]]", "s[x[inter]]", "s[x[severe]]", "s[x[active]]", "rho[x[severe],x[inter]]", "rho[x[active],x[inter]]", "rho[x[active],x[severe]]"],
["comp1", "comp2", "comp3", "comp1"]
],
"class": ["mvnormMix", "mix"],
"link": ["identity"],
"likelihood": ["mvnormal"]
},
"comp": [
[0.4694, 0.2064, 0.1242, 0.2],
[-1.8764, -1.8915, -1.8678, 0],
[-1.2169, -1.2967, -1.3074, 0],
[0.2556, -0.4407, -0.0782, 0],
[0.2238, 0.2085, 0.6844, 3.5],
[0.3291, 0.7288, 1.0633, 3.5],
[3.584, 3.3516, 3.5986, 3.5],
[-0.2954, -0.0846, -0.0543, 0],
[-0.0818, 0.0901, 0.0463, 0],
[0.0827, -0.0574, -0.065, 0]
]
}
The MAP prior used in the future trial is then the mixture prior one obtains by loading the JSON file. In case the rounding to a limited number of digits causes the weights to no longer sum exactly to unity, they are automatically rescaled accordingly:
read_mix_json(rmap_complete_new_trial_file, rescale=TRUE)
Multivariate normal mixture
Outcome dimension: 3
Mixture Components:
comp1 comp2 comp3 comp1
w 0.469 0.206 0.124 0.200
m[x[inter]] -1.876 -1.891 -1.868 0.000
m[x[severe]] -1.217 -1.297 -1.307 0.000
m[x[active]] 0.256 -0.441 -0.078 0.000
s[x[inter]] 0.224 0.208 0.684 3.500
s[x[severe]] 0.329 0.729 1.063 3.500
s[x[active]] 3.584 3.352 3.599 3.500
rho[x[severe],x[inter]] -0.295 -0.085 -0.054 0.000
rho[x[active],x[inter]] -0.082 0.090 0.046 0.000
rho[x[active],x[severe]] 0.083 -0.057 -0.065 0.000
7.4.2 MAP prior for complete and partial historical data (advanced)
In the previous Section 7.4.1 the data of the partially reported trials was dropped; only the 1005 patients from the completely reported trials were used, while 1626 patients were left out, which is unsatisfactory. As explained in Section 7.3.3, the data from the trials with partial information can be modelled using a latent variable approach. The likelihood contribution of the partially observed historical trials is then based on the discrete convolution and deviates from the case of a completely reported historical trial.
As a strategy to model this with brms, we will extend the previous model from Section 7.4.1 in two ways:
- include the trials with partial data such that we model their response rate by stratum in a latent manner
- add the likelihood for the partially reported data
The first point can be achieved by casting all data into a long format including the partially observed trials. However, for the partially observed trials the number of responders per disease severity level is missing. For these trials we set the responders and patients in the data to zero:
\tilde{r}_{h,l} = \begin{cases} r_{h,l} & \text{complete data} \\ 0 & \text{partial data} \\ \end{cases}
\tilde{n}_{h,l} = \begin{cases} n_{h,l} & \text{complete data} \\ 0 & \text{partial data} \\ \end{cases}
We also include a column is_complete indicating whether a trial is reported completely or partially. Moreover, a column row is included, which is simply the row index in the data set and will be used below.
| | study | compound | trial | severe | oral | r | n | is_complete | r_total | n_total | r_tilde | n_tilde | row |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Dupilumab | 2b | 0 | 0 | — | 32 | 0 | 1 | 61 | 0 | 0 | 1 |
| 2 | 1 | Dupilumab | 2b | 1 | 0 | — | 29 | 0 | 1 | 61 | 0 | 0 | 2 |
| 3 | 2 | Dupilumab | SOLO 1 | 0 | 0 | 16 | 113 | 1 | 23 | 224 | 16 | 113 | 3 |
| 4 | 2 | Dupilumab | SOLO 1 | 1 | 0 | 7 | 111 | 1 | 23 | 224 | 7 | 111 | 4 |
| 5 | 3 | Dupilumab | SOLO 2 | 0 | 0 | 17 | 121 | 1 | 20 | 236 | 17 | 121 | 5 |
| 6 | 3 | Dupilumab | SOLO 2 | 1 | 0 | 3 | 115 | 1 | 20 | 236 | 3 | 115 | 6 |
| 7 | 4 | Tralokinumab | ECZTRA 1 | 0 | 0 | 10 | 95 | 1 | 14 | 197 | 10 | 95 | 7 |
| 8 | 4 | Tralokinumab | ECZTRA 1 | 1 | 0 | 4 | 102 | 1 | 14 | 197 | 4 | 102 | 8 |
| 9 | 5 | Tralokinumab | ECZTRA 2 | 0 | 0 | 13 | 93 | 1 | 18 | 193 | 13 | 93 | 9 |
| 10 | 5 | Tralokinumab | ECZTRA 2 | 1 | 0 | 5 | 100 | 1 | 18 | 193 | 5 | 100 | 10 |
| 11 | 6 | Rocatinlimab | 2b | 0 | 0 | — | 31 | 0 | 1 | 57 | 0 | 0 | 11 |
| 12 | 6 | Rocatinlimab | 2b | 1 | 0 | — | 26 | 0 | 1 | 57 | 0 | 0 | 12 |
| 13..35 | |||||||||||||
| 36 | 18 | Baricitinib | BREEZE-AD2 | 1 | 1 | — | 121 | 0 | 11 | 244 | 0 | 0 | 36 |
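One way to construct the padded columns (a sketch; it assumes the long data frame hist_pbo_long with the columns r, n and is_complete as shown above):
# zero-pad responders and patients for the partially reported trials so
# that these rows do not contribute to the Binomial likelihood
hist_pbo_long <- hist_pbo_long |>
  mutate(
    r_tilde = ifelse(is_complete == 1, r, 0),
    n_tilde = ifelse(is_complete == 1, n, 0),
    row = row_number()
  )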
Adding the data for the partially observed trials with responders and patients set to zero causes brms to include these trials in the model and instantiate corresponding random effects for the intercept and the slope. However, since effectively no data is provided (zero responders and zero patients), the resulting posterior is not altered, as can be seen by rerunning the previous model with the longer data including these zero-padded rows:
# Here we reuse the previous model fit and "update" it with the longer
# data. For this to work, we need to align the column names with those
# used in the complete model fit.
update(
complete_mc,
newdata = mutate(hist_pbo_long, r = r_tilde, n = n_tilde),
control = list(adapt_delta = 0.95),
seed = 645776
)
Start sampling
Running MCMC with 4 sequential chains...
Chain 1 finished in 1.1 seconds.
Chain 2 finished in 1.1 seconds.
Chain 3 finished in 1.1 seconds.
Chain 4 finished in 1.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.1 seconds.
Total execution time: 4.7 seconds.
Family: binomial
Links: mu = logit
Formula: r | trials(n) ~ 1 + severe + oral + (1 + severe | study)
Data: mutate(hist_pbo_long, r = r_tilde, n = n_tilde) (Number of observations: 36)
Multilevel Hyperparameters:
~study (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.19 0.17 0.01 0.66 1.00 1911 1767
sd(severe) 0.38 0.34 0.01 1.26 1.00 2405 2215
cor(Intercept,severe) -0.05 0.59 -0.95 0.93 1.00 4330 2656
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -1.89 0.19 -2.27 -1.52 1.00 3016 2349
severe -1.23 0.34 -1.94 -0.56 1.00 3477 2311
oral -0.23 0.38 -1.01 0.50 1.00 3894 2264
To now include the special likelihood term for the partially reported historical trials, we use the stanvar facility of brms. It allows injecting custom Stan code into the model such that we may add an additional likelihood term for the partially observed historical trial data. As each partially observed trial only contributes a single term to the likelihood, it is appropriate to recast the previously long format into a wide data format in which one data row corresponds to one historical trial. In order to be able to link the wide data format with the moderate and severe rows of the long format, the column row of the long format is carried over into the wide format as well:
hist_pbo_wide <- hist_pbo_long |>
select(
study,
compound,
trial,
n_total,
is_complete,
r_total,
severe,
n,
row
) |>
pivot_wider(
id_cols = c(
"study",
"compound",
"trial",
"r_total",
"n_total",
"is_complete"
),
values_from = c("n", "row"),
names_from = "severe"
)
hist_pbo_wide |>
gt_preview(6) |>
fmt_number(decimals = 2) |>
fmt_integer() |>
sub_missing()

| | study | compound | trial | r_total | n_total | is_complete | n_0 | n_1 | row_0 | row_1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Dupilumab | 2b | 1 | 61 | 0 | 32 | 29 | 1 | 2 |
| 2 | 2 | Dupilumab | SOLO 1 | 23 | 224 | 1 | 113 | 111 | 3 | 4 |
| 3 | 3 | Dupilumab | SOLO 2 | 20 | 236 | 1 | 121 | 115 | 5 | 6 |
| 4 | 4 | Tralokinumab | ECZTRA 1 | 14 | 197 | 1 | 95 | 102 | 7 | 8 |
| 5 | 5 | Tralokinumab | ECZTRA 2 | 18 | 193 | 1 | 93 | 100 | 9 | 10 |
| 6 | 6 | Rocatinlimab | 2b | 1 | 57 | 0 | 31 | 26 | 11 | 12 |
| 7..17 | ||||||||||
| 18 | 18 | Baricitinib | BREEZE-AD2 | 11 | 244 | 0 | 123 | 121 | 35 | 36 |
All columns with a _0 suffix correspond to the information available for moderate patients and those with _1 to the respective severe information.
## first we define the probability mass function for a sum of two binomials
exact_binomial_sum <-
stanvar(
name = "binomial_sum_lpmf",
scode = "
// lpmf of the sum of two binomially distributed random variables
real binomial_logit_sum_lpmf(int y, int n1, real alpha1, int n2, real alpha2) {
// the discrete convolution here sums over all possible
// configurations which lead to a sum of y1+y2==y
int y1_min = max(0, y-n2);
int y1_max = min(n1, y);
int n_terms = y1_max - y1_min + 1;
vector[n_terms] log_prob;
for(i in y1_min:y1_max)
log_prob[i - y1_min + 1] = binomial_logit_lpmf(i | n1, alpha1) + binomial_logit_lpmf(y-i | n2, alpha2);
return log_sum_exp(log_prob);
}
",
block = "functions"
) +
stanvar(
name = "marginal_likelihood_sum",
scode = "
// likelihood on sum of Binomials (exact solution uses a discrete convolution)
for(i in 1:n_studies) {
if(is_complete[i] == 1)
continue;
target += binomial_logit_sum_lpmf(r_total[i] | n_0[i], mu[row_0[i]], n_1[i], mu[row_1[i]]);
}
",
block = "likelihood",
position = "end"
)
## needed for brms <= 2.21 and cmdstan > 2.32 to have arrays declared
## with the newer Stan syntax for arrays
stanvar_array <- function(data) {
name <- deparse(substitute(data))
n <- length(data)
if (!is.integer(data)) {
return(stanvar(data, name = name))
}
stanvar(data, name = name, scode = glue::glue("array[{n}] int {name};"))
}
model_marginal_data <- with(hist_pbo_wide, {
stanvar_array(r_total) +
stanvar_array(is_complete) +
stanvar_array(n_0) +
stanvar_array(n_1) +
stanvar_array(row_0) +
stanvar_array(row_1) +
stanvar(length(row_1), "n_studies")
})
full_model <- bf(
r_tilde | trials(n_tilde) ~ 1 + severe + oral + (1 + severe | study),
family = binomial,
center = FALSE
)
# we use the same prior as used for modelling the subset of complete
# data only
full_prior <- complete_prior
full_mc <- brm(
full_model,
data = hist_pbo_long,
prior = full_prior,
control = list(adapt_delta = 0.99),
stanvars = model_marginal_data + exact_binomial_sum,
seed = 458678,
refresh = 0
)
Start sampling
Running MCMC with 4 sequential chains...
Chain 1 finished in 10.1 seconds.
Chain 2 finished in 7.7 seconds.
Chain 3 finished in 5.8 seconds.
Chain 4 finished in 6.4 seconds.
All 4 chains finished successfully.
Mean chain execution time: 7.5 seconds.
Total execution time: 30.4 seconds.
full_mc
 Family: binomial
Links: mu = logit
Formula: r_tilde | trials(n_tilde) ~ 1 + severe + oral + (1 + severe | study)
Data: hist_pbo_long (Number of observations: 36)
Multilevel Hyperparameters:
~study (Number of levels: 18)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.17 0.13 0.01 0.48 1.00 1436 2105
sd(severe) 0.36 0.29 0.01 1.07 1.00 2101 2120
cor(Intercept,severe) -0.04 0.60 -0.96 0.96 1.00 3592 2574
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -1.90 0.14 -2.18 -1.64 1.00 2828 2496
severe -1.31 0.33 -2.00 -0.71 1.00 3491 2537
oral -0.47 0.19 -0.84 -0.07 1.00 2888 2670
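Comparing the population-level estimates of the two fits directly shows the gain from including the partially reported trials; in particular, the oral effect is estimated noticeably more precisely (posterior standard deviation 0.19 versus 0.38 before):
# population-level estimates of the complete-only and the full-data fit
fixef(complete_mc)
fixef(full_mc)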
To see how the Stan model has been augmented via the stanvars argument above, the full Stan model code can be obtained with the stancode command:
stancode(full_mc)
// generated with brms 2.23.0
functions {
/* compute correlated group-level effects
* Args:
* z: matrix of unscaled group-level effects
* SD: vector of standard deviation parameters
* L: cholesky factor correlation matrix
* Returns:
* matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
// lpmf of the sum of two binomially distributed random variables
real binomial_logit_sum_lpmf(int y, int n1, real alpha1, int n2, real alpha2) {
// the discrete convolution here sums over all possible
// configurations which lead to a sum of y1+y2==y
int y1_min = max(0, y-n2);
int y1_max = min(n1, y);
int n_terms = y1_max - y1_min + 1;
vector[n_terms] log_prob;
for(i in y1_min:y1_max)
log_prob[i - y1_min + 1] = binomial_logit_lpmf(i | n1, alpha1) + binomial_logit_lpmf(y-i | n2, alpha2);
return log_sum_exp(log_prob);
}
}
data {
int<lower=1> N; // total number of observations
array[N] int Y; // response variable
array[N] int trials; // number of trials
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
int<lower=1> NC_1; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
array[18] int r_total;
array[18] int is_complete;
array[18] int n_0;
array[18] int n_1;
array[18] int row_0;
array[18] int row_1;
int n_studies;
}
transformed data {
}
parameters {
vector[K] b; // regression coefficients
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_1, M_1] r_1; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_1;
vector[N_1] r_1_2;
// prior contributions to the log posterior
real lprior = 0;
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[, 1];
r_1_2 = r_1[, 2];
lprior += normal_lpdf(b | 0, 3.5);
lprior += normal_lpdf(sd_1[1] | 0, 3.5/4)
- 1 * normal_lccdf(0 | 0, 3.5/4);
lprior += normal_lpdf(sd_1[2] | 0, 2 * 3.5/4)
- 1 * normal_lccdf(0 | 0, 2 * 3.5/4);
lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += X * b;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
}
target += binomial_logit_lpmf(Y | trials, mu);
// likelihood on sum of Binomials (exact solution uses a discrete convolution)
for(i in 1:n_studies) {
if(is_complete[i] == 1)
continue;
target += binomial_logit_sum_lpmf(r_total[i] | n_0[i], mu[row_0[i]], n_1[i], mu[row_1[i]]);
}
}
// priors including constants
target += lprior;
target += std_normal_lpdf(to_vector(z_1));
}
generated quantities {
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
}
Analogous to Section 7.4.1, the bi-variate MAP prior for a new study is now derived from the full model fit, approximated as a multi-variate normal mixture, robustified and written to disk as JSON (the code mirrors the steps shown before; the robust prior file is referenced as rmap_new_trial_file in the next section). The pairs plot of the mixture approximation and the resulting robust MAP prior are shown below:
Diagnostic plots for mixture multivariate normal densities are experimental.
Please note that these are subject to changes in future releases.

Multivariate normal mixture
Outcome dimension: 3
Mixture Components:
comp1 comp2 comp3 comp1
w 0.490 0.178 0.132 0.200
m[x[inter]] -1.885 -1.920 -1.882 0.000
m[x[severe]] -1.264 -1.329 -1.501 0.000
m[x[active]] 0.122 0.164 -0.076 0.000
s[x[inter]] 0.172 0.426 0.217 3.500
s[x[severe]] 0.324 0.625 0.991 3.500
s[x[active]] 3.534 3.349 3.724 3.500
rho[x[severe],x[inter]] -0.302 -0.129 -0.090 0.000
rho[x[active],x[inter]] 0.029 -0.071 0.189 0.000
rho[x[active],x[severe]] -0.019 0.033 0.031 0.000
Dropping EM information from mixture object before serialization.
7.5 Results
At this stage we have derived robust bi-variate MAP priors from the subset of complete historical control data and from the full historical control data. Now we proceed to illustrate how these priors compare, what operating characteristics they imply and how they can be used in the analysis of the future trial data.
We first load the derived robust MAP priors, which are stored as JSON files. It is important to emphasize that this constitutes the information used as a prior in the new trial. That is, the previous hierarchical model for the historical data is no longer needed to analyze the future trial, given that we have accurately approximated the (robust) MAP density in parametric form.
Nonetheless, the prior must also be made available to brms as a prior density. This requires the use of RBesT mixture priors within brms. Given that these mixture priors are not known to brms per se, an adapter function is provided by RBesT: the mixstanvar adapter. Here, we illustrate its use by instantiating a brms model, which is used to sample from the MAP priors in order to characterise their properties.
# load pre-specified prior
rmap_prior_protocol <- read_mix_json(rmap_new_trial_file, rescale = TRUE)
rmap_prior_complete_protocol <- read_mix_json(
rmap_complete_new_trial_file,
rescale = TRUE
)
# model for trial without hierarchical structure
trial_model <- bf(r ~ 1 + severe + active, family = bernoulli, center = FALSE)
# uses as prior the multi-variate normal mixture prior previously
# stored on file; the mixstanvar adapter below exposes its parameters
# to Stan under the names rmap_w, rmap_m and rmap_sigma_L
trial_model_prior <- prior(mixmvnorm(rmap_w, rmap_m, rmap_sigma_L), class = b)
# insert dummy data for the two strata on control only here
study_strata <- data.frame(
r = 0,
severe = c(0, 1),
n = 1,
active = 0,
study = "new_trial"
)
# sample prior of the model as we like to evaluate operating
# characteristics
trial_model_full_mc <- brm(
trial_model,
data = study_strata,
# this is the magic sauce from RBesT which plugs
# the multi-variate normal mixture prior into
# the brms model
prior = trial_model_prior,
stanvar = mixstanvar(rmap = rmap_prior_protocol),
sample_prior = "only",
control = list(adapt_delta = 0.95),
seed = 4585678,
refresh = 0
)
Start sampling
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.4 seconds.
Chain 2 finished in 0.5 seconds.
Chain 3 finished in 0.7 seconds.
Chain 4 finished in 0.5 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.5 seconds.
Total execution time: 2.8 seconds.
trial_model_complete_mc <- brm(
trial_model,
data = study_strata,
prior = trial_model_prior,
stanvar = mixstanvar(rmap = rmap_prior_complete_protocol),
sample_prior = "only",
control = list(adapt_delta = 0.95),
seed = 4585678,
refresh = 0
)
Start sampling
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.3 seconds.
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.5 seconds.
Chain 4 finished in 0.5 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.4 seconds.
Total execution time: 2.1 seconds.
# obtain MC sample of trial prior on response scale
trial_prior_full_rv <- rvar(posterior_linpred(
trial_model_full_mc,
transform = TRUE
))
trial_prior_complete_rv <- rvar(posterior_linpred(
trial_model_complete_mc,
transform = TRUE
))
# on the response scale we form the expected response rate which
# respects the expected percentage of severe patients.
w_severe <- 0.45
trial_prior_full_mean_rv <- (1 - w_severe) *
trial_prior_full_rv[1] +
w_severe * trial_prior_full_rv[2]
trial_prior_complete_mean_rv <- (1 - w_severe) *
trial_prior_complete_rv[1] +
w_severe * trial_prior_complete_rv[2]
trial_prior_full_mean <- mixfit(
as_draws_matrix(trial_prior_full_mean_rv),
type = "beta",
Nc = 3
)
trial_prior_complete_mean <- mixfit(
as_draws_matrix(trial_prior_complete_mean_rv),
type = "beta",
Nc = 3
)
summary(trial_prior_full_mean)
 mean    sd  2.5% 50.0% 97.5%
0.152 0.202 0.032 0.094 0.962
summary(trial_prior_complete_mean)
 mean    sd  2.5% 50.0% 97.5%
0.169 0.217 0.023 0.098 0.966
These MAP priors have been derived on the response scale and are approximated using Beta mixtures. They correspond to the marginal prior density when assuming the quoted 45% of severe patients at baseline in the future trial. In this marginal form we can use the commonly available tools to study the trial operating characteristics when using an informative MAP prior for the control group. Note that this essentially ignores the presence of the covariate, which is a common simplification for trial planning; the explicit use of the covariate in the analysis is expected to increase the trial power further.
We start by considering the effective sample size of the priors:
ess(trial_prior_full_mean)
[1] 202
ess(trial_prior_complete_mean)
[1] 133
We continue with evaluating the operating characteristics of the trial. These assume a true placebo response rate of 9% and use as reference case a sample size of 50 per arm (100 patients in total), which is compared with a reduced trial using 2:1 randomization and a total of only 75 patients:
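# the dual success criterion below requires, per the decision2S
# semantics of RBesT (a reading of the call that follows), that
# P(delta > 0) > 0.9 AND P(delta > 0.35) > 0.6 hold jointly, where
# delta is the response rate difference of active versus placebo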
success_crit <- decision2S(c(0.9, 0.6), c(0, 0.35), FALSE)
unif_prior <- mixbeta(c(1, 1, 1))
design_noninf_11 <- oc2S(unif_prior, unif_prior, 50, 50, success_crit)
design_noninf_21 <- oc2S(unif_prior, unif_prior, 50, 25, success_crit)
design_complete <- oc2S(
unif_prior,
trial_prior_complete_mean,
50,
25,
success_crit
)
design_full <- oc2S(unif_prior, trial_prior_full_mean, 50, 25, success_crit)
ggplot(data.frame(delta = c(0.1, 0.7)), aes(x = delta)) +
stat_function(
fun = \(x) design_full(0.09 + x, 0.09),
aes(linetype = "Full data")
) +
stat_function(
fun = \(x) design_complete(0.09 + x, 0.09),
aes(linetype = "Complete only data")
) +
stat_function(
fun = \(x) design_noninf_21(0.09 + x, 0.09),
aes(linetype = "No prior 50:25")
) +
stat_function(
fun = \(x) design_noninf_11(0.09 + x, 0.09),
aes(linetype = "No prior 50:50")
) +
scale_linetype_discrete("Prior") +
labs(title = "Power", subtitle = "Assumed placebo response rate of 9%") +
xlab("Response rate difference active to placebo") +
ylab("Power") +
scale_y_continuous(breaks = seq(0, 1, by = 0.1)) +
scale_x_continuous(breaks = seq(0, 1, by = 0.1)) +
theme(legend.position = "bottom")
The MAP priors do improve the power to detect a difference to control here. The difference between the full-data and the complete-only prior is not very pronounced. This can also be seen by considering the critical values tabulated for a given number of responders in the placebo group:
crit_complete <- decision2S_boundary(
unif_prior,
trial_prior_complete_mean,
50,
25,
success_crit
)
crit_full <- decision2S_boundary(
unif_prior,
trial_prior_full_mean,
50,
25,
success_crit
)
tibble(r_placebo = 0:25) |>
mutate(
min_r_active_complete = crit_complete(0:25) + 1,
min_r_active_full = crit_full(0:25) + 1,
min_obs_delta_complete = if_else(
min_r_active_complete == 0,
NA_real_,
min_r_active_complete / 50 - r_placebo / 25
),
min_obs_delta_full = if_else(
min_r_active_full == 0,
NA_real_,
min_r_active_full / 50 - r_placebo / 25
)
) |>
gt() |>
fmt_integer(c(starts_with("min_r"), starts_with("r_"))) |>
fmt_percent(starts_with("min_obs"), decimals = 0) |>
tab_spanner(
label = html("Minimal active<br>responders"),
columns = starts_with("min_r")
) |>
tab_spanner(
label = html("Minimal observed<br>difference"),
columns = starts_with("min_obs")
) |>
cols_label(
min_r_active_complete = html("Complete<br>only"),
min_r_active_full = "Full",
min_obs_delta_complete = html("Complete<br>only"),
min_obs_delta_full = "Full",
r_placebo = html("Responder<br>Placebo")
) |>
sub_missing()

| Responder Placebo | Minimal active responders (Complete only) | Minimal active responders (Full) | Minimal observed difference (Complete only) | Minimal observed difference (Full) |
|---|---|---|---|---|
| 0 | 22 | 22 | 44% | 44% |
| 1 | 23 | 23 | 42% | 42% |
| 2 | 24 | 23 | 40% | 38% |
| 3 | 24 | 24 | 36% | 36% |
| 4 | 24 | 24 | 32% | 32% |
| 5 | 25 | 25 | 30% | 30% |
| 6 | 27 | 26 | 30% | 28% |
| 7 | 31 | 30 | 34% | 32% |
| 8 | 34 | 33 | 36% | 34% |
| 9 | 36 | 36 | 36% | 36% |
| 10 | 38 | 37 | 36% | 34% |
| 11 | 39 | 39 | 34% | 34% |
| 12 | 41 | 41 | 34% | 34% |
| 13 | 43 | 43 | 34% | 34% |
| 14 | 45 | 45 | 34% | 34% |
| 15 | 47 | 46 | 34% | 32% |
| 16 | 49 | 49 | 34% | 34% |
| 17 | 0 | 0 | — | — |
| 18 | 0 | 0 | — | — |
| 19 | 0 | 0 | — | — |
| 20 | 0 | 0 | — | — |
| 21 | 0 | 0 | — | — |
| 22 | 0 | 0 | — | — |
| 23 | 0 | 0 | — | — |
| 24 | 0 | 0 | — | — |
| 25 | 0 | 0 | — | — |
7.6 Conclusion
This case study demonstrates how we may use historical control data which requires the use of a baseline covariate. This complicates the derivation of a MAP prior in two ways: (i) we are required to derive an informative MAP prior for the intercept and the slope parameters of the respective model and (ii) the information we need from the historical data may only be given in partial form. The latter complication can be resolved by a latent variable approach, which we marginalize for the trials with only partial information such that we can make use of the partial data as given. This illustrates how we can use two different likelihoods within the very same brms model.
We completed the case study by illustrating the operating characteristics of a potential future trial in a simplified manner by considering the marginalized response rate. These analyses showed that there is indeed a benefit to using a MAP prior in this context. However, the differences between a MAP prior based on the full data and one based on the complete data only are not marked. Still, the ability to use the full data set without further assumptions (the distribution of the Binomial sum is calculated exactly) is preferable over dismissing relevant historical control data.
7.7 Exercises
Use a normal approximation instead of the exact binomial convolution for incorporating the partially reported data. Study the differences to the case of using the exact Binomial convolution.
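As a starting point, one possible moment-matched normal approximation replacing the exact convolution function could look like this sketch (boundary handling and a continuity correction are left to the reader; all names are illustrative):
approx_binomial_sum <- stanvar(
  name = "binomial_sum_approx_lpdf",
  scode = "
  // sketch: moment-matched normal approximation to the sum of two Binomials
  real binomial_logit_sum_approx_lpdf(real y, int n1, real alpha1, int n2, real alpha2) {
    real p1 = inv_logit(alpha1);
    real p2 = inv_logit(alpha2);
    real m = n1 * p1 + n2 * p2;
    real v = n1 * p1 * (1 - p1) + n2 * p2 * (1 - p2);
    return normal_lpdf(y | m, sqrt(v));
  }
  ",
  block = "functions"
)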