This document contains exploratory plots for binary response data as well as the R code that generates these graphs. The plots presented here are based on simulated data (see: PKPD Datasets). Data specifications can be accessed on Datasets and Rmarkdown template to generate this page can be found on Rmarkdown-Template. You may also download the Multiple Ascending Dose PK/PD dataset for your reference (download dataset).
Binary data is data that can take on one of two values. This often happens when there is a characteristic/event/response, etc. of interest that subjects either have/achieve or they don’t. Binary response data can also come out of dichotomizing continuous data. For example in the psoriasis indication the binary response variable PASI90 (yes/no) is defined as subjects achieving a PASI score of at least 90.
There are two broad categories of PD exploratory plots covered on this page
These plots are displayed below.
# remove reference to home directory in libPaths
.libPaths(grep("home", .libPaths(), value=TRUE, invert=TRUE))
.libPaths(grep("usr", .libPaths(), value=TRUE, invert=TRUE))
# add localLib to libPaths for locally installed packages
.libPaths(c("localLib", .libPaths()))
# will load from first filepath first, then look in .libPaths for more packages not in first path
# version matches package in first filepath, in the case of multiple instances of a package
# library(rmarkdown)
library(gridExtra)
library(grid)
library(ggplot2)
library(dplyr)
library(RxODE)
library(caTools)
#flag for labeling figures as draft
draft.flag = TRUE
## ggplot settings
theme_set(theme_bw(base_size=12))
# annotation of plots with status of code
AnnotateStatus <- function(draft.flag, log.y=FALSE, log.x=FALSE, fontsize=7, color="grey") {
x.pos <- -Inf
if (log.x)
x.pos <- 0
y.pos <- -Inf
if (log.y)
y.pos <- 0
if(draft.flag) {
annotateStatus <- annotate("text",
label="DRAFT",
x=x.pos, y=y.pos,
hjust=-0.1, vjust=-1.0,
cex=fontsize,
col=color, alpha=0.7, fontface="bold")
} else {
annotateStatus <- NULL
}
return(annotateStatus)
}
The plots presented here are based on simulated data (see: PKPD Datasets). You may also download the Multiple Ascending Dose PK/PD dataset for your reference (download dataset).
my.data <- read.csv("../Data/Multiple_Ascending_Dose_Dataset2.csv")
# Define order for factors
my.data$TRTACT <- factor(my.data$TRTACT, levels = unique(my.data$TRTACT[order(my.data$DOSE)]))
We start with Response vs Time, or longitduinal plots
Binary data is often summarized by displaying the percent of subjects in the “yes” category, e.g. the percent of “responders,” along with 95% confidence intervals. For generating confidence intervals, it is useful to bin data points by nominal time; confidence intervals should be calculated using the binom::binom.exact()
function.
Questions to ask when looking at these plots
Warning: Even if you don’t see a plateau in response, that doesn’t mean there isn’t one. Be very careful about using linear models for Dose-Response relationships. Extrapolation outside of the observed dose range could indicate a higher dose is always better (even if it isn’t).
When overlaying individual binomial data, it is often helpful to stagger the dots and use transparency, so that it is easier to see individual data points. If you do so, this should be clearly stated.
data_to_plot <- my.data[my.data$CMT==6,]
data_to_plot$TRTACT <- factor(data_to_plot$TRTACT, levels = rev(levels(data_to_plot$TRTACT)))
gg <- ggplot(data = data_to_plot,
aes(x=NOMTIME/24,y=LIDV, color = TRTACT, fill = TRTACT))+theme_bw()
gg <- gg + stat_summary(geom="errorbar",
fun.data = function(y){
data.frame(ymin = binom::binom.exact(sum(y), length(y),
conf.level = 0.95)$lower,
ymax = binom::binom.exact(sum(y), length(y),
conf.level = 0.95)$upper)
}, size = 1, width = 0.2)
gg <- gg + stat_summary(geom="point", fun.y=mean, size = 2)
gg <- gg + stat_summary(aes(group = TRTACT), geom="line",fun.y=mean, size = 1)
gg <- gg + guides(color=guide_legend(""),fill=guide_legend(""))
gg <- gg + xlab("Time (days)") + scale_x_continuous(breaks = seq(-1,9,1))
gg <- gg + ylab("Responder Rate (%)") + scale_y_continuous(labels=scales::percent)
gg
data_to_plot <- my.data[my.data$CMT==6,]
set.seed(12345)
data_to_plot$Response2 <- jitter(data_to_plot$LIDV, amount=0.1)
data_to_plot$DAY2 <- jitter(data_to_plot$PROFDAY, amount=0.1)
gg <- ggplot(data = data_to_plot,
aes(x=PROFDAY,y=LIDV))+theme_bw()
gg <- gg + geom_point(aes(x=DAY2,y=Response2), size=2, alpha = 0.3)
# gg <- gg + geom_line( aes(x=DAY2,y=Response2, group = ID), alpha = 0.3)
gg <- gg + stat_summary(aes(x=PROFDAY, y=LIDV), geom="line", fun.y=mean)
gg <- gg + stat_summary(aes(x=PROFDAY,y=LIDV),geom="ribbon",
fun.data = function(y){
data.frame(ymin = binom::binom.exact(sum(y), length(y),
conf.level = 0.95)$lower,
ymax = binom::binom.exact(sum(y), length(y),
conf.level = 0.95)$upper)
},alpha = 0.3, size = 1)
# gg <- gg + geom_smooth(aes(x=PROFDAY, y=LIDV))
gg <- gg + guides(color=guide_legend(""),fill=guide_legend(""))
gg <- gg + xlab("Time (days)") + scale_x_continuous(breaks = seq(-1,8,1))
gg <- gg + facet_grid(~TRTACT)
gg <- gg + ylab("Response") + scale_y_continuous(breaks=c(0,1)) +
coord_cartesian(ylim=c(-0.2,1.2))
gg
Use spaghetti plots to visualize the extent of variability between individuals. The wider the spread of the profiles, the higher the between subject variability. Distinguish different doses by color, or separate into different panels.
Binary spaghetti plots may not be as informative as spaghetti plots for continuous data, however you can pick out the transition states from 0 to 1 and back. By looking at the transition states, ask yourself these questions: For each dose group, at what time have most of the subjects “responded”? How wide is the spread of “response time” across subjects (do all of them respond at the exact same time, or over a range of time?) Do you detect any dampening, or transitioning of response back to no response?
data_to_plot <- my.data[my.data$CMT==6,]
set.seed(12345)
data_to_plot$Response2 <- jitter(data_to_plot$LIDV, amount=0.1)
data_to_plot$DAY2 <- jitter(data_to_plot$PROFDAY, amount=0.1)
gg <- ggplot(data = data_to_plot,
aes(x=DAY2,y=Response2))+theme_bw()
gg <- gg + geom_point( size=2, alpha = 0.3)
gg <- gg + geom_line( aes(group = ID), alpha = 0.3)
gg <- gg + guides(color=guide_legend(""),fill=guide_legend(""))
gg <- gg + xlab("Time (days)") + scale_x_continuous(breaks = seq(-1,8,1))
gg <- gg + facet_grid(~TRTACT)
gg <- gg + ylab("Response") + scale_y_continuous(breaks=c(0,1)) +
coord_cartesian(ylim=c(-0.2,1.2))
gg
Plot individual profiles in order to inspect them for any unusual profiles. Look at the pattern of response/non-response over time and whether individuals switch from responder back to non-responder.
data_to_plot <- my.data[my.data$CMT==6,]
gg <- ggplot(data = data_to_plot,
aes(x=PROFDAY,y=LIDV))+theme_bw()
gg <- gg + geom_point( size=2) + geom_line( aes(group = ID))
gg <- gg + guides(color=guide_legend(""),fill=guide_legend(""))
gg <- gg + xlab("Time (days)")+ scale_x_continuous(breaks = seq(0,max(data_to_plot$PROFDAY)+1,7))
gg <- gg + facet_wrap(~ID+TRTACT,ncol = length(unique(data_to_plot$ID))/length(unique(data_to_plot$DOSE)) )
gg <- gg + ylab("Response")
gg
Stratify by covariates of interest to explore whether any key covariates impact response. For examples of plots and code startifying by covariate, see Single Ascending Dose Covariate Section
Warning Be careful of interpreting covariate effects on PD. Covariate effects on PD could be the result of covariate effects on PK transfering to PD through the PK/PD relationship.
One of the key questions when looking at PD markers is to determine if there is a dose-response relationship, and if there is, what dose is necessary to achieve the desired effect? Simple dose-response plots can give insight into these questions.
Both linear scale and log scale may be used for dose. Linear scale helps in seeing a plateau at large doses. Log-scale often helps to see all dose groups. If the plot on log-scale looks linear, a log-linear dose-response model could be fit, i.e. Response=Elog*log(1+C).
Questions to consider
Warning: Even if you don’t see an Emax, that doesn’t mean there isn’t one. Be very careful about using linear models for Dose-Response relationships. Extrapolation outside of the observed dose range could indicate a higher dose is always better (even if it isn’t).
data_to_plot <- my.data[my.data$CMT==6,]
data_to_plot$DAY_label <- paste("Day", data_to_plot$PROFDAY)
data_to_plot$DAY_label[data_to_plot$DAY_label=="Day 0"] = "Baseline"
data_to_plot <- data_to_plot[data_to_plot$DAY_label%in%c("Day 5"),]
gg <- ggplot(data = data_to_plot, aes(x=DOSE,y=LIDV))+theme_bw()
gg <- gg + stat_summary(geom="errorbar",
fun.data = function(y){
data.frame(ymin = binom::binom.exact(sum(y), length(y),
conf.level = 0.95)$lower,
ymax = binom::binom.exact(sum(y), length(y),
conf.level = 0.95)$upper)
}, alpha = 0.5, size = 1, width= 0.2)
gg <- gg + stat_summary(geom="point", fun.y=mean, shape = 0)
gg <- gg + guides(color=guide_legend(""),fill=guide_legend(""))
gg <- gg + scale_y_continuous(labels=scales::percent) + ylab("Responder Rate (%)")
gg <- gg + xlab("Dose (mg)")
gg <- gg + geom_smooth( method = "glm",
method.args=list(family=binomial(link = logit)), color = "black")
pp1 <- gg + ggtitle("Dose on Linear Scale")
## Same plot but on a log scale
pp2 <- gg + scale_x_log10(breaks=unique(data_to_plot$DOSE)) + ggtitle("Dose on Log Scale")
grid.arrange(pp1,pp2,nrow=1)
The crossectional Dose-Response curve which looks only at one timepoint defined in the protocol can obscure certain characteristics of the dose-response relationship. For example, if the response variable is much delayed compared to PK the maximal PD effect could occur much later than steady state PK is achieved. Looking only at the defined clinical endpoint has the potential miss this, especially in early clinical trials before the time course of the effect has been characterized. In addtion to looking at longitudinal PD over time (as in previous sections above), plotting the cross-sectional Dose-Response curves for different time points throughout the study can also be helpful. In the figure below, we plot the exposure response relationship at a number of different visits to see how the relationsihp might change over time.
data_to_plot <- my.data[my.data$CMT==6,]
data_to_plot$DAY_label <- paste("Day", data_to_plot$PROFDAY)
data_to_plot$DAY_label[data_to_plot$DAY_label=="Day 0"] = "Baseline"
data_to_plot <- data_to_plot[data_to_plot$DAY_label%in%c("Day 1","Day 3","Day 5"),]
data_to_plot$TRTACT <- factor(data_to_plot$TRTACT, levels = rev(levels(data_to_plot$TRTACT)))
gg <- ggplot(data = data_to_plot,
aes(x=DOSE,y=LIDV))+theme_bw()
gg <- gg + stat_summary(geom="errorbar",
fun.data = function(y){
data.frame(ymin = binom::binom.exact(sum(y), length(y),
conf.level = 0.95)$lower,
ymax = binom::binom.exact(sum(y), length(y),
conf.level = 0.95)$upper)
}, size = 1, width = 0.2)
gg <- gg + stat_summary(geom="point", fun.y=mean, size = 2, shape = 0)
gg <- gg + stat_summary(aes(group = TRTACT), geom="line",fun.y=mean, size = 1)
gg <- gg + geom_smooth( method = "glm",
method.args=list(family=binomial(link = logit)), color = "black")
gg <- gg + guides(color=guide_legend(""),fill=guide_legend(""))
gg <- gg + xlab("Dose (mg)")
gg <- gg + ylab("Responder Rate (%)") + scale_y_continuous(labels=scales::percent)
gg + facet_grid(~DAY_label)
sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux Server 7.4 (Maipo)
##
## Matrix products: default
## BLAS/LAPACK: /CHBS/apps/intel/17.4.196/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DT_0.2 RxODE_0.6-1 bindrcpp_0.2 haven_1.1.0
## [5] readr_1.1.1 readxl_1.0.0 xtable_1.8-2 tidyr_0.7.2
## [9] caTools_1.17.1 zoo_1.8-0 dplyr_0.7.4 ggplot2_2.2.1
## [13] gridExtra_2.3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.14 cellranger_1.1.0 compiler_3.4.3 pillar_1.0.1
## [5] plyr_1.8.4 bindr_0.1 forcats_0.2.0 bitops_1.0-6
## [9] tools_3.4.3 digest_0.6.13 jsonlite_1.5 memoise_1.1.0
## [13] evaluate_0.10.1 tibble_1.4.1 gtable_0.2.0 lattice_0.20-35
## [17] pkgconfig_2.0.1 rlang_0.1.6 rex_1.1.2 yaml_2.1.16
## [21] stringr_1.2.0 knitr_1.18 hms_0.4.0 htmlwidgets_0.9
## [25] rprojroot_1.3-1 glue_1.2.0 R6_2.2.2 binom_1.1-1
## [29] rmarkdown_1.8 reshape2_1.4.3 purrr_0.2.4 magrittr_1.5
## [33] codetools_0.2-15 backports_1.1.2 scales_0.5.0 htmltools_0.3.6
## [37] rsconnect_0.8.5 assertthat_0.2.0 colorspace_1.3-2 labeling_0.3
## [41] stringi_1.1.3 lazyeval_0.2.1 munsell_0.4.3 markdown_0.8