The dpasurv package
Matthias Kormaksson
August 27, 2024
dpa.Rmd
1. Introduction
Dynamic path analysis introduced by (Fosen et al. 2006) is a method for performing mediation analysis with a longitudinal mediator and time-to-event outcome. A thorough review of the methodology can be found in the tutorial paper of (Kormaksson et al. 2024). In this vignette we will outline the modeling framework and define direct, indirect and total effects in the context of dynamic path analysis. We then demonstrate the main functionalities of the package, namely estimation, inference, and visualization.
2. Methods
Dynamic path analysis
Dynamic path analysis involves modeling so-called dynamic path diagrams such as the one depicted in Figure 1 (a). A dynamic path diagram is defined in (Fosen et al. 2006) as a time-indexed sequence of directed acyclic graphs that is characterized by a treatment variable, , one or more longitudinal mediator variables, , and a survival (time-to-event) outcome , where denotes the time of event with hazard . The survival outcome takes the role of a single terminal node in the dynamic path diagram. Additional time-fixed confounder variables, such as baseline characteristics, , may be incorporated as well.
Relationships between variables are represented by directed edges between nodes and collectively the nodes and edges imply an underlying model structure. More specifically, each child node represents an entity to be modeled while the corresponding parent nodes represent the underlying model covariates. In Figure 1 (a) there are two implied models, one corresponding to the survival outcome variable, , and the other corresponding to the mediator response .
A defining feature of dynamic path analysis as defined in (Fosen et al. 2006) is the specific choice of models for the dynamic path diagram, namely the so-called dynamic path models. At the core of these is Aalen’s additive regression model for the survival outcome at the terminal node while all other models involve least squares regressions at each time point . In the following equations we see the dynamic path models corresponding to Figure 1 (a) In the above is the at-risk function indicating whether subject is still under observation (i.e. alive and still under follow up) just before time , and is a time-dependent error term. Under the outcome model (1) the hazard is assumed to change linearly as a function of treatment, mediator(s) and confounders, while in the mediator model (2) the longitudinal mediator is regressed on treatment and confounders.
The regression functions of principal interest are those characterizing the relationships between treatment, mediator(s) and survival, namely , and . The goal of dynamic path analysis (and the dpasurv package) is to estimate these key functions that serve as building blocks for the direct, indirect and total effects that are defined in the next section.
Direct, indirect and total effects
The cumulative direct and indirect effects at time are defined as as follows The direct effect corresponds to the regression function along the direct arrow between treatment and outcome in Figure 1 (a), while the indirect effect is obtained by multiplying the regression functions along the mediation path .
The total effect is defined as the regression function from the mediator free Aalen’s additive model from Figure 1 (b) and the corresponding cumulative total effect is defined as: One of the most appealing features of dynamic path analysis is the following analytical decomposition
Estimation and inference
The analytical estimators for the cumulative direct, indirect, and total effects above are derived in (Kormaksson et al. 2024) and confidence bands are obtained by bootstrap inference. More specifically, we repeatedly sample subjects with replacement from the set of all subjects and estimate the direct, indirect and total effects each time. Once we have obtained bootstrap estimates of the effects of interest, i.e. ), for , then we calculate pointwise % bootstrap confidence intervals at each timepoint by calculating the corresponding empirical and percentiles.
3. Getting started with dpasurv
The dpasurv package implements the complete dynamic path analysis workflow.
Simulated data set
The dpasurv package comes with a simulated data set called
simdata
simdata
#> # A tibble: 1,229 × 7
#> subject x dose M start stop event
#> <fct> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 ctrl 6.74 0 4 0
#> 2 1 0 ctrl 6.91 4 8 0
#> 3 1 0 ctrl 6.9 8 12 0
#> 4 1 0 ctrl 6.71 12 26 0
#> 5 1 0 ctrl 6.45 26 46.8 1
#> 6 2 1 high 6.11 0 4 0
#> 7 2 1 high 6.28 4 8 0
#> 8 2 1 high 7.04 8 12 0
#> 9 2 1 high 6.93 12 26 0
#> 10 2 1 high 7.86 26 52 0
#> # ℹ 1,219 more rows
The data contains the following variables:
- subject: Subject ID
- x: Binary treatment covariate (0 for control, 1 for treated)
- dose: Underlying dose (ctrl, low, high)
- M: Longitudinal mediator measurement
- start: Visiting times at which the mediator value is measured
- stop: Time that either marks the beginning of the next visit, or censoring (if event=0). Alternatively, it marks the time of event (if event=1)
- event: Binary event indicator
At each visiting time, start, a new mediator measurement is taken. However, the mathematical assumption is that the intervals (start,stop] are open on the left and closed on the right. Under this assumption the risk of an event in the interval (start, stop] is influenced by the last observed mediator value.
Implementation in a nutshell
Below we see the minimal code required for performing dynamic path
analysis on simdata
. It explores the direct effect of
treatment x on survival as well as the indirect effect mediated through
M:
# Perform dynamic path analysis
s <- dpa(Surv(start, stop, event) ~ M + x, list(M ~ x), id = "subject", data = simdata, boot.n = 500)
# Extract cumulative direct and indirect effects with 95% pointwise bootstrap confidence bands:
direct <- effect(x ~ outcome, s, alpha=0.05)
indirect <- effect(x ~ M ~ outcome, s, alpha=0.05)
# Use sum() method to obtain total effect
total <- sum(direct, indirect)
# Plot the results
layout1x3 <- par(mfrow=c(1,3))
plot(direct); abline(h=0, lty=2, col=2)
plot(indirect); abline(h=0, lty=2, col=2)
plot(total); abline(h=0, lty=2, col=2)
# restore user's graphical parameters:
par(layout1x3)
# Plot the results with ggplot2 graphics:
ggplot.effect(list(direct, indirect, total))
4. Simulated Use Case
Let us now use the dpasurv package to perform a slightly more in
depth analysis of simdata
.
Look at the data
Firstly, we note that there is a dose response trend when looking at longitudinal trends in the subject mediator profiles:
ggplot(mapping=aes(x = start, y = M, group=dose), data=simdata) + geom_smooth(aes(colour=dose))
The higher the dose, the higher the average mediator value is.
We also note that there is a dose response trend when looking at association between dose levels and survival
fit <- survival::survfit(Surv(start, stop, event) ~ dose, data=simdata)
plot(fit, col=2:4, xlab="Time", ylab="Survival", main="KM-curves by dose level")
legend(150, 1, c("control", "low dose", "high dose"), col=2:4, lwd=2, bty='n')
A Cox model with dose and mediator as covariates reveals a significant hazard ratio for high dose, but only a trend for the low dose (both as compared to control), while there is no significant association between the mediator and survival:
summary(survival::coxph(Surv(start, stop, event) ~ dose + M, data=simdata))
#> Call:
#> survival::coxph(formula = Surv(start, stop, event) ~ dose + M,
#> data = simdata)
#>
#> n= 1229, number of events= 172
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> doselow -0.32281 0.72411 0.18826 -1.715 0.08640 .
#> dosehigh -0.66230 0.51566 0.20542 -3.224 0.00126 **
#> M 0.05274 1.05416 0.04863 1.085 0.27808
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> doselow 0.7241 1.3810 0.5007 1.0473
#> dosehigh 0.5157 1.9392 0.3448 0.7713
#> M 1.0542 0.9486 0.9583 1.1596
#>
#> Concordance= 0.555 (se = 0.025 )
#> Likelihood ratio test= 10.57 on 3 df, p=0.01
#> Wald test = 10.51 on 3 df, p=0.01
#> Score (logrank) test = 10.7 on 3 df, p=0.01
Dynamic path analysis
Let’s now perform dynamic path analysis with the dpa()
function
set.seed(1)
s <- dpa(Surv(start, stop, event) ~ M + dose, list(M ~ dose), id = "subject", data = simdata, boot.n = 500)
and calculate some effects of interest with the effect()
function.
We confirm a dose response on mediator, which appears significant for the high-dose vs control:
# The dose effect on mediator response:
dose.on.mediator <- effect(dose ~ M, s)
ggplot.effect(dose.on.mediator)
We also confirm a dose response on survival outcome, although only the high-dose appears significant:
# direct effect of dose on outcome
direct <- effect(dose ~ outcome, s)
ggplot.effect(direct)
In contrast to the conclusions from the Cox-model there appears to be a meaningful dynamic association between the mediator and survival:
# effect of mediator on outcome
mediator.on.outcome <- effect(M ~ outcome, s)
ggplot.effect(mediator.on.outcome)
In particular, the mediator effect is neutral for the first 100 days and then starts having a significant detrimental effect on survival after that (the higher the mediator value, the higher the risk).
Explore output of Aalen’s additive model
The output object from the call to the dpa()
function
above stores the output from Aalen’s additive model (1) and we can
explore its content for inferential purposes
# Output from Aalen's additive model from the timereg::aalen() implementation:
summary(s$aalen)
#> Additive Aalen Model
#>
#> Test for nonparametric terms
#>
#> Test for non-significant effects
#> Supremum-test of significance p-value H_0: B(t)=0
#> (Intercept) 2.56 0.163
#> M 3.30 0.025
#> doselow 2.99 0.061
#> dosehigh 5.12 0.000
#>
#> Test for time invariant effects
#> Kolmogorov-Smirnov test p-value H_0:constant effect
#> (Intercept) 1.400 0.023
#> M 0.176 0.066
#> doselow 0.595 0.473
#> dosehigh 0.587 0.534
#> Cramer von Mises test p-value H_0:constant effect
#> (Intercept) 148.00 0.008
#> M 2.05 0.072
#> doselow 14.90 0.530
#> dosehigh 13.30 0.623
#>
#>
#>
#> Call:
#> timereg::aalen(formula = out.formula, data = data, id = data[[id]])
According to the resampling tests of the time-varying effects we see that the mediator effect is indeed significantly different from zero as well as the high-dose effect, while the low-dose effect is only marginally significant.
Plot the direct, indirect, and total effects
Finally, let’s also calculate the indirect and total effects
# indirect effect of dose on outcome, mediated through M
indirect <- effect(dose ~ M ~ outcome, s)
# total effect
total <- sum(direct, indirect)
and then plot together the direct, indirect, and total effects:
ggplot.effect(list(direct, indirect, total))
We see that the indirect effect of dose on survival is neutral for the first 100 days but then becomes detrimental for survival after that (only significant for the high dose). The detrimental indirect effect of the high dose counteracts the corresponding significant and beneficial direct effect.