torchsurv.metrics.auc

Classes

Auc([checks, tied_tol])

Area Under the Curve class for survival models.

class torchsurv.metrics.auc.Auc(checks: bool = True, tied_tol: float = 1e-08)

Area Under the Curve class for survival models.

__init__(checks: bool = True, tied_tol: float = 1e-08)

Initialize an Auc instance for survival model evaluation.

Parameters:

tied_tol (float) – Tolerance for tied risk scores. Defaults to 1e-8.
checks (bool) – Whether to perform input format checks. Enabling checks can help catch potential issues in the input data. Defaults to True.

Examples

>>> import torch
>>> from torchsurv.metrics.auc import Auc
>>> _ = torch.manual_seed(42)
>>> n = 10
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> auc = Auc()
>>> auc(estimate, event, time) # default: auc cumulative/dynamic
tensor([0.7500, 0.4286, 0.3333])
>>> auc.integral()
tensor(0.5040)
>>> auc.confidence_interval() # default: Blanche, two_sided
tensor([[0.4213, 0.0000, 0.0000],
        [1.0000, 0.9358, 0.7289]])
>>> auc.p_value()
tensor([0.1360, 0.7826, 0.4089])

__call__(estimate: Tensor, event: Tensor, time: Tensor, auc_type: str = 'cumulative', weight: Tensor | None = None, new_time: Tensor | None = None, weight_new_time: Tensor | None = None, instate: bool = True) → Tensor

Compute the time-dependent Area Under the Receiver Operating Characteristic Curve (AUC).

The AUC at time $t$ is the probability that a model correctly predicts which of two comparable samples will experience an event by time $t$ based on their estimated risk scores. The AUC is particularly useful for evaluating time-dependent predictions (e.g., 10-year mortality). It is recommended to use AUC instead of the concordance index for such time-dependent predictions, as AUC is proper in this context, while the concordance index is not [BKG18].

Parameters:

estimate (torch.Tensor) – Estimated risk of event occurrence (i.e., risk score). Can be of shape = (n_samples,) if subject-specific risk score is time-independent, of shape = (n_samples, n_samples) if subject-specific risk score is evaluated at time, or of shape = (n_samples, n_times) if subject-specific risk score is evaluated at new_time.
event (torch.Tensor, boolean) – Event indicator of size n_samples (= True if event occurred).
time (torch.Tensor, float) – Time-to-event or censoring of size n_samples.
auc_type (str, optional) – AUC type. Defaults to “cumulative”. Must be one of the following: “cumulative” for cumulative/dynamic, “incident” for incident/dynamic.
weight (torch.Tensor, optional) – Optional sample weight evaluated at time of size n_samples. Defaults to 1.
new_time (torch.Tensor, optional) – Time points at which to evaluate the AUC of size n_times. Values must be within the range of follow-up in time. Defaults to the event times excluding maximum (because number of controls for t > max(time) is 0).
weight_new_time (torch.Tensor, optional) – Optional sample weight evaluated at new_time of size n_times. Defaults to 1.

Returns:

AUC evaluated at new_time.

Return type:

torch.Tensor

Note

The function evaluates either the cumulative/dynamic (C/D) or the incident/dynamic (I/D) AUC (argument auc_type) at time $t \in \{t_1, \cdots, t_K\}$ (argument new_time).

For each subject $i \in \{1, \cdots, N\}$, denote $X_i$ as the survival time and $D_i$ as the censoring time. Survival data consist of the event indicator, $\delta_i = I(X_i \leq D_i)$ (argument event), and the time-to-event or censoring, $T_i = \min(X_i, D_i)$ (argument time).

The risk score measures the risk (or a proxy thereof) that a subject has an event. The function accepts either a time-dependent or a time-independent risk score. The time-dependent risk score of subject $i$ is specified through a function $q_i: [0, \infty) \rightarrow \mathbb{R}$. The time-independent risk score of subject $i$ is specified by a constant $q_i$. The argument estimate is the estimated risk score. For a time-dependent risk score: if new_time is specified, estimate should be of shape = (N,K) (the $(i,k)$-th element is $\hat{q}_i(t_k)$); if new_time is not specified, estimate should be of shape = (N,N) (the $(i,j)$-th element is $\hat{q}_i(T_j)$). For a time-independent risk score, estimate should be of length N (the $i$-th element is $\hat{q}_i$).

The AUC C/D and AUC I/D evaluated at time $t$ are defined by

\[\begin{split}\text{AUC}^{C/D}(t) = p(q_i(t) > q_j(t) \: | \: X_i \leq t, X_j > t) \\ \text{AUC}^{I/D}(t) = p(q_i(t) > q_j(t) \: | \: X_i = t, X_j > t).\end{split}\]

The default estimators of the AUC C/D and AUC I/D at time $t$ [BDJacqminGadda13] returned by the function are

\[\begin{split}\hat{\text{AUC}}^{C/D}(t) = \frac{\sum_i \sum_j \delta_i \: I(T_i \leq t, T_j > t) I(\hat{q}_i(t) > \hat{q}_j(t))}{\sum_i \delta_i \: I(T_i \leq t) \sum_j I(T_j > t)} \\ \hat{\text{AUC}}^{I/D}(t) = \frac{\sum_i \sum_j \delta_i \: I(T_i = t, T_j > t) I(\hat{q}_i(t) > \hat{q}_j(t))}{\sum_i \delta_i \: I(T_i = t) \sum_j I(T_j > t)}.\end{split}\]
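The two naive estimators above can be traced by hand on a small data set. The following is a minimal sketch in plain Python (not torch, and not the library's implementation); the function name naive_auc and the toy data are assumptions made for illustration, the risk score is taken to be time-independent, and tied scores simply count as non-concordant, as in the formulas.

```python
def naive_auc(estimate, event, time, t, auc_type="cumulative"):
    """Naive time-dependent AUC at a single time t (no censoring adjustment).

    estimate: time-independent risk scores, one per subject
    event:    booleans, True if the event was observed
    time:     time-to-event or censoring
    """
    n = len(time)
    if auc_type == "cumulative":
        # Cases: observed events at or before t.
        is_case = [event[i] and time[i] <= t for i in range(n)]
    elif auc_type == "incident":
        # Cases: observed events exactly at t.
        is_case = [event[i] and time[i] == t for i in range(n)]
    else:
        raise ValueError("auc_type must be 'cumulative' or 'incident'")
    # Controls: subjects still under observation after t.
    is_control = [time[j] > t for j in range(n)]

    n_pairs = sum(is_case) * sum(is_control)
    if n_pairs == 0:
        return float("nan")  # undefined without at least one case and one control

    # Count case/control pairs where the case has the strictly higher risk score.
    concordant = sum(
        1
        for i in range(n) if is_case[i]
        for j in range(n) if is_control[j] and estimate[i] > estimate[j]
    )
    return concordant / n_pairs


# Hypothetical toy data: the second subject is censored at time 4.
time = [2, 4, 5, 7, 9]
event = [True, False, True, True, False]
estimate = [0.9, 0.1, 0.3, 0.6, 0.2]

auc_cd = naive_auc(estimate, event, time, t=5)                       # 3 of 4 pairs concordant
auc_id = naive_auc(estimate, event, time, t=5, auc_type="incident")  # 1 of 2 pairs concordant
```

In this toy example the subject censored at time 4 is neither a case nor a control at t = 5, which is the loss of information that motivates the censoring adjustment.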

These estimators are considered naive because, when the event times are censored, all subjects censored before time point $t$ are ignored. Additionally, the naive estimators converge to an association measure that involves the censoring distribution. To address this shortcoming, Uno et al. [UCTW07] proposed using inverse probability of censoring weighting (IPCW). In this context, each subject included at time $t$ is weighted by the inverse probability of censoring $\omega(t) = 1 / \hat{D}(t)$, where $\hat{D}(t)$ is the Kaplan-Meier estimate of the censoring distribution, $P(D>t)$. The censoring-adjusted AUC C/D estimate at time $t$ is

\[\hat{\text{AUC}}^{C/D}(t) = \frac{\sum_i \sum_j \delta_i \: \omega(T_i) \: I(T_i \leq t, T_j > t) I(\hat{q}_i(t) > \hat{q}_j(t))}{\sum_i \delta_i \: \omega(T_i) \: I(T_i \leq t) \sum_j I(T_j > t)}.\]

Note that the censoring-adjusted AUC I/D estimate is the same as the “naive” estimate because the weights are all equal to $\omega(t)$.

The censoring-adjusted AUC C/D estimate is obtained by passing the argument weight, which holds the weights evaluated at each subject's time ($\omega(T_1), \cdots, \omega(T_N)$). If new_time is specified, the argument weight_new_time should also be specified; it holds the weights evaluated at each new_time ($\omega(t_1), \cdots, \omega(t_K)$) and is required to compute the standard error of the AUC. In a train/test split, the weights should be derived from the censoring distribution estimated on the training data: the censoring distribution is estimated using the training set and then evaluated at the subjects' times in the test set.
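To make the weighting concrete, here is a minimal plain-Python sketch of the censoring-adjusted C/D estimate with Kaplan-Meier weights. It is an illustration under stated assumptions, not the library's code: the helper names km_censoring_survival and ipcw_auc_cd and the toy data are hypothetical, the Kaplan-Meier estimate is evaluated at $t$ itself (implementations differ on tie handling and left limits, so get_ipcw may return different values), and a weight of 0 is used wherever the estimated censoring survival is 0.

```python
def km_censoring_survival(event, time, t):
    """Kaplan-Meier estimate of P(D > t), treating censoring as the 'event'."""
    surv = 1.0
    for u in sorted(set(time)):
        if u > t:
            break
        at_risk = sum(1 for x in time if x >= u)
        censored = sum(1 for x, e in zip(time, event) if x == u and not e)
        surv *= 1.0 - censored / at_risk
    return surv


def ipcw_auc_cd(estimate, event, time, weight, t):
    """Censoring-adjusted cumulative/dynamic AUC at time t, given omega(T_i)."""
    n = len(time)
    cases = [i for i in range(n) if event[i] and time[i] <= t]
    controls = [j for j in range(n) if time[j] > t]
    denom = sum(weight[i] for i in cases) * len(controls)
    if denom == 0:
        return float("nan")
    # Each concordant pair contributes the weight of its case.
    num = sum(
        weight[i] for i in cases for j in controls if estimate[i] > estimate[j]
    )
    return num / denom


# Hypothetical toy data: the second and last subjects are censored.
time = [2, 4, 5, 7, 9]
event = [True, False, True, True, False]
estimate = [0.9, 0.1, 0.3, 0.6, 0.2]

# omega(T_i) = 1 / D_hat(T_i); guarded against a zero survival estimate.
weight = []
for x in time:
    g = km_censoring_survival(event, time, x)
    weight.append(1.0 / g if g > 0 else 0.0)

auc_ipcw = ipcw_auc_cd(estimate, event, time, weight, t=5)  # 5/7, about 0.714
```

With all weights set to 1 the same ratio is 3/4, so here the adjustment shifts the estimate because the event at time 5 occurs after a censoring and is up-weighted.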

Examples

>>> import torch
>>> from torchsurv.metrics.auc import Auc
>>> from torchsurv.stats.ipcw import get_ipcw
>>> _ = torch.manual_seed(42)
>>> n = 20
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> auc = Auc()
>>> auc(estimate, event, time) # default: naive auc c/d
tensor([0.9474, 0.5556, 0.5294, 0.6429, 0.5846, 0.6389, 0.5844, 0.5139, 0.4028,
        0.5400, 0.4545, 0.7500])
>>> auc(estimate, event, time, auc_type = "incident") # naive auc i/d
tensor([0.9474, 0.1667, 0.4706, 0.9286, 0.3846, 0.8333, 0.3636, 0.2222, 0.0000,
        0.8000, 0.5000, 1.0000])
>>> ipcw = get_ipcw(event, time) # ipcw weight at time
>>> auc(estimate, event, time, weight = ipcw) # Uno's auc c/d
tensor([0.9474, 0.5556, 0.5294, 0.6521, 0.5881, 0.6441, 0.5865, 0.5099, 0.3929,
        0.5422, 0.4534, 0.7996])
>>> new_time = torch.unique(torch.randint(low=100, high=150, size=(n,)).float()) # new time at which to evaluate auc
>>> ipcw_new_time = get_ipcw(event, time, new_time) # ipcw at new_time
>>> auc(estimate, event, time, new_time = new_time, weight = ipcw, weight_new_time = ipcw_new_time) # Uno's auc c/d at new_time
tensor([0.5333, 0.5333, 0.5333, 0.5333, 0.6521, 0.6521, 0.5881, 0.5881, 0.5865,
        0.5865, 0.5865, 0.5865, 0.5865, 0.6018, 0.6018, 0.5099])

References

[BDJacqminGadda13]

Paul Blanche, Jean‐François Dartigues, and Hélène Jacqmin‐Gadda. Review and comparison of roc curve estimators for a time‐dependent outcome with marker‐dependent censoring. Biometrical Journal, 55(5):687–704, June 2013.

[BKG18]

Paul Blanche, Michael W Kattan, and Thomas A Gerds. The c-index is not proper for the evaluation of $t$-year predicted risks. Biostatistics, 20(2):347–357, February 2018.

[UCTW07]

Hajime Uno, Tianxi Cai, Lu Tian, and L. J. Wei. Evaluating prediction rules for $t$-year survivors with censored regression models. Journal of the American Statistical Association, 102(478):527–537, June 2007.

integral(tmax: Tensor | None = None)

Compute the integral of the time-dependent AUC.

Parameters:

tmax (torch.Tensor, optional) – A number specifying the upper limit of the time range over which to compute the AUC integral. Defaults to new_time[-1] for cumulative/dynamic AUC and new_time[-1]-1 for incident/dynamic AUC.

Returns:

Integral of AUC over [0, tmax].

Return type:

torch.Tensor

Examples

>>> _ = torch.manual_seed(42)
>>> n = 10
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> auc = Auc()
>>> auc(estimate, event, time, auc_type = "incident")
tensor([0.7500, 0.1429, 0.1667])
>>> auc.integral() # integral of the auc incident/dynamic
tensor(0.4667)

Notes

In case auc_type = “cumulative” (cumulative/dynamic AUC), the values of AUC are weighted by the estimated event density. In case auc_type = “incident” (incident/dynamic AUC), the values of AUC are weighted by 2 times the product of the estimated event density and the estimated survival function [HZ05]. The estimated survival function is the Kaplan-Meier estimate. The estimated event density is obtained from the discrete incremental changes of the estimated survival function.
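The weighting described in these notes can be sketched as a weighted average. The snippet below is a plain-Python illustration, not the library's implementation: the function name integrate_auc and the input values are hypothetical, the survival values are assumed to be Kaplan-Meier estimates at the AUC evaluation times (all at or before tmax), and dividing by the total weight is an assumed normalization that the library may handle differently.

```python
def integrate_auc(auc, surv, auc_type="cumulative"):
    """Weighted average of AUC values.

    auc:  AUC evaluated at increasing times t_1 < ... < t_K
    surv: Kaplan-Meier survival estimates S(t_1), ..., S(t_K)
    """
    # Event density as the discrete drop of the survival curve, with S(t_0) = 1.
    previous = [1.0] + list(surv[:-1])
    density = [p - s for p, s in zip(previous, surv)]

    if auc_type == "cumulative":
        weights = density
    elif auc_type == "incident":
        # 2 * f(t) * S(t), following the weighting attributed to [HZ05].
        weights = [2.0 * f * s for f, s in zip(density, surv)]
    else:
        raise ValueError("auc_type must be 'cumulative' or 'incident'")

    total = sum(weights)
    if total == 0:
        return float("nan")
    # Normalizing by the total weight is an assumption of this sketch.
    return sum(a * w for a, w in zip(auc, weights)) / total


# Hypothetical inputs for illustration.
auc_values = [0.75, 0.50, 0.25]
surv_values = [0.8, 0.6, 0.5]

integral_cd = integrate_auc(auc_values, surv_values)                       # about 0.55
integral_id = integrate_auc(auc_values, surv_values, auc_type="incident")  # about 0.583
```

The incident weights favor early times, where the survival function is still high, which is why the two summaries differ for the same AUC values.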

References

[HZ05]

Patrick J. Heagerty and Yingye Zheng. Survival model predictive accuracy and roc curves. Biometrics, 61(1):92–105, February 2005.

confidence_interval(method: str = 'blanche', alpha: float = 0.05, alternative: str = 'two_sided', n_bootstraps: int = 999) → Tensor

Compute the confidence interval of the AUC.

This function calculates either the pointwise confidence interval or the bootstrap confidence interval for the AUC. The pointwise confidence interval is computed assuming that the AUC is normally distributed and using the standard error estimated with the method of Blanche et al. [BDJacqminGadda13]. The bootstrap confidence interval is constructed from the distribution of the AUC over bootstrap samples.

Parameters:

method (str) – Method for computing confidence interval. Defaults to “blanche”. Must be one of the following: “blanche”, “bootstrap”.
alpha (float) – Significance level. Defaults to 0.05.
alternative (str) – Alternative hypothesis. Defaults to “two_sided”. Must be one of the following: “two_sided”, “greater”, “less”.
n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if method is not “bootstrap”.

Returns:

Lower and upper bounds of the confidence interval.

Return type:

torch.Tensor([lower,upper])
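The pointwise interval described above amounts to a normal-approximation interval around the AUC. The following standard-library sketch illustrates that idea only; it is not the library's implementation. The function name normal_ci, the clipping to [0, 1], and the one-sided conventions are assumptions, and the standard error 0.1677 is a hypothetical value chosen so that the two-sided lower bound lands near the 0.4213 shown in the example below (the standard error itself is not printed there).

```python
from statistics import NormalDist


def normal_ci(auc, se, alpha=0.05, alternative="two_sided"):
    """Normal-approximation confidence interval for an AUC, clipped to [0, 1]."""
    if alternative == "two_sided":
        z = NormalDist().inv_cdf(1 - alpha / 2)
        lower, upper = auc - z * se, auc + z * se
    elif alternative == "greater":
        # One-sided interval: only a lower bound is informative.
        z = NormalDist().inv_cdf(1 - alpha)
        lower, upper = auc - z * se, 1.0
    elif alternative == "less":
        # One-sided interval: only an upper bound is informative.
        z = NormalDist().inv_cdf(1 - alpha)
        lower, upper = 0.0, auc + z * se
    else:
        raise ValueError("alternative must be 'two_sided', 'greater' or 'less'")
    # An AUC is a probability, so the bounds are kept inside [0, 1].
    return max(0.0, lower), min(1.0, upper)


# Hypothetical standard error; see the note above.
lower, upper = normal_ci(0.75, 0.1677)
```

Clipping explains why a wide interval can report exactly 0 or 1 as a bound when the sample is small.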

Examples

>>> _ = torch.manual_seed(42)
>>> n = 10
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> auc = Auc()
>>> auc(estimate, event, time)
tensor([0.7500, 0.4286, 0.3333])
>>> auc.confidence_interval() # Default: Blanche, two_sided
tensor([[0.4213, 0.0000, 0.0000],
        [1.0000, 0.9358, 0.7289]])
>>> auc.confidence_interval(method = "bootstrap", alternative = "greater")
tensor([[0.3750, 0.1667, 0.0833],
        [1.0000, 1.0000, 1.0000]])

References

[BDJacqminGadda13]

Paul Blanche, Jean‐François Dartigues, and Hélène Jacqmin‐Gadda. Estimating and comparing time‐dependent areas under receiver operating characteristic curves for censored event times with competing risks. Statistics in Medicine, 32(30):5381–5397, September 2013.

p_value(method: str = 'blanche', alternative: str = 'two_sided', n_bootstraps: int = 999) → Tensor

Perform a one-sample hypothesis test on the AUC.

This function calculates either the pointwise p-value or the bootstrap p-value for testing the null hypothesis that the AUC is equal to 0.5. The pointwise p-value is computed assuming that the AUC is normally distributed and using the standard error estimated with the method of Blanche et al. [BDJacqminGadda13]. The bootstrap p-value is derived by permuting risk predictions to estimate the sampling distribution under the null hypothesis.

Parameters:

method (str) – Method for computing p-value. Defaults to “blanche”. Must be one of the following: “blanche”, “bootstrap”.
alternative (str) – Alternative hypothesis. Defaults to “two_sided”. Must be one of the following: “two_sided” (AUC is not equal to 0.5), “greater” (AUC is greater than 0.5), “less” (AUC is less than 0.5).
n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if method is not “bootstrap”.

Returns:

p-value of the statistical test.

Return type:

torch.Tensor
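The pointwise test described above can be read as a z-test of the AUC against 0.5. The standard-library sketch below illustrates that reading and is not the library's implementation: the function name normal_p_value is hypothetical, and the standard error 0.1677 is an assumed value (it is not printed by the documented example).

```python
from statistics import NormalDist


def normal_p_value(auc, se, alternative="two_sided", null=0.5):
    """p-value for H0: AUC = null, assuming the AUC is normally distributed."""
    z = (auc - null) / se
    cdf = NormalDist().cdf
    if alternative == "two_sided":
        return 2.0 * (1.0 - cdf(abs(z)))  # AUC differs from null
    if alternative == "greater":
        return 1.0 - cdf(z)               # AUC is greater than null
    if alternative == "less":
        return cdf(z)                     # AUC is less than null
    raise ValueError("alternative must be 'two_sided', 'greater' or 'less'")


# Hypothetical standard error; see the note above.
p_two_sided = normal_p_value(0.75, 0.1677)                       # about 0.136
p_greater = normal_p_value(0.75, 0.1677, alternative="greater")  # about 0.068
```

For a symmetric null distribution the two-sided p-value is twice the smaller one-sided p-value, which the two calls illustrate.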

Examples

>>> _ = torch.manual_seed(42)
>>> n = 10
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> auc = Auc()
>>> auc(estimate, event, time)
tensor([0.7500, 0.4286, 0.3333])
>>> auc.p_value() # Default: Blanche, two_sided
tensor([0.1360, 0.7826, 0.4089])
>>> auc.p_value(method = "bootstrap", alternative = "greater")
tensor([0.2400, 0.5800, 0.7380])

compare(other, method: str = 'blanche', n_bootstraps: int = 999) → Tensor

Compare two AUCs.

This function compares two AUCs computed on the same data with different risk predictions. The null hypothesis is auc1 = auc2 and the alternative hypothesis is auc1 > auc2. The statistical test is either a Student t-test for dependent samples or a two-sample bootstrap test. The Student t-test assumes that the AUC is normally distributed and uses the standard errors estimated with the method of Blanche et al. [BDJacqminGadda13].

Parameters:

other (Auc) – Another instance of the Auc class representing auc2.
method (str) – Statistical test used to perform the hypothesis test. Defaults to “blanche”. Must be one of the following: “blanche”, “bootstrap”.
n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if method is not “bootstrap”.

Returns:

p-value of the statistical test.

Return type:

torch.Tensor

Examples

>>> _ = torch.manual_seed(42)
>>> n = 10
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> auc1 = Auc()
>>> auc1(torch.randn((n,)), event, time)
tensor([0.7500, 0.4286, 0.3333])
>>> auc2 = Auc()
>>> auc2(torch.randn((n,)), event, time)
tensor([0.0000, 0.1429, 0.0556])
>>> auc1.compare(auc2) # default: Blanche
tensor([0.0008, 0.2007, 0.1358])
>>> auc1.compare(auc2, method = "bootstrap")
tensor([0.0220, 0.1970, 0.1650])
