torchsurv.metrics.cindex

Classes

ConcordanceIndex([tied_tol, checks])

Compute the Concordance Index (C-index) for survival models.

class torchsurv.metrics.cindex.ConcordanceIndex(tied_tol: float = 1e-08, checks: bool = True)

Compute the Concordance Index (C-index) for survival models.

__init__(tied_tol: float = 1e-08, checks: bool = True)

Initialize a ConcordanceIndex class for survival model evaluation.

Parameters:
  • tied_tol (float) – Tolerance used to decide whether two risk scores are tied. Defaults to 1e-8.

  • checks (bool) – Whether to perform input format checks. Enabling checks can help catch potential issues in the input data. Defaults to True.

Examples

>>> import torch
>>> from torchsurv.metrics.cindex import ConcordanceIndex
>>> _ = torch.manual_seed(42)
>>> n = 64
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> cindex = ConcordanceIndex()
>>> cindex(estimate, event, time)
tensor(0.5337)
>>> cindex.confidence_interval() # default: Noether, two_sided
tensor([0.3251, 0.7423])
>>> cindex.p_value(method='bootstrap', alternative='greater')
tensor(0.2620)

__call__(estimate: Tensor, event: Tensor, time: Tensor, weight: Tensor | None = None, tmax: Tensor | None = None, instate: bool = True) → Tensor

Compute the Concordance Index (C-index).

The concordance index is the probability that, based on their estimated risk scores, a model correctly predicts which of two comparable samples will experience an event first.

Parameters:
  • estimate (torch.Tensor) – Estimated risk of event occurrence (i.e., risk score). Can be of shape = (n_samples,) if the subject-specific risk score is time-independent, or of shape = (n_samples, n_samples) if the subject-specific risk score is time-dependent and evaluated at each subject's observed time.

  • event (torch.Tensor, boolean) – Event indicator of size n_samples (= True if the event occurred).

  • time (torch.Tensor, float) – Time-to-event or censoring time of size n_samples.

  • weight (torch.Tensor, optional) – Optional sample weights of size n_samples. Defaults to None, in which case every sample has weight 1.

  • tmax (torch.Tensor, optional) – Truncation time. Defaults to None, in which case no truncation is performed. tmax should be chosen such that the probability of being censored after time tmax is non-zero.

  • instate (bool) – Whether to create or overwrite the internal state attributes. Defaults to True.

Returns:

Concordance index.

Return type:

torch.Tensor

Note

The concordance index provides a global assessment of a fitted survival model over the entire observational period rather than focussing on the prediction for a fixed time (e.g., 10-year mortality). It is recommended to use AUC instead of the concordance index for such time-dependent predictions, as AUC is proper in this context, while the concordance index is not [BKG18].

For each subject \(i \in \{1, \cdots, N\}\), denote \(X_i\) as the survival time and \(D_i\) as the censoring time. Survival data consist of the event indicator, \(\delta_i=(X_i\leq D_i)\) (argument event) and the time-to-event or censoring, \(T_i = \min(\{ X_i,D_i \})\) (argument time).
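As a minimal plain-Python sketch (made-up values, not torchsurv code), the observed data \(\delta_i\) and \(T_i\) follow directly from the latent survival times \(X_i\) and censoring times \(D_i\):

```python
# Latent survival times X_i and censoring times D_i (made-up values).
X = [5.0, 12.0, 7.5, 20.0]
D = [9.0, 10.0, 30.0, 15.0]

# delta_i = (X_i <= D_i): the event is observed if it happens no later than censoring.
event = [x <= d for x, d in zip(X, D)]
# T_i = min(X_i, D_i): the observed time-to-event or censoring.
time = [min(x, d) for x, d in zip(X, D)]

print(event)  # [True, False, True, False]
print(time)   # [5.0, 10.0, 7.5, 15.0]
```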

The risk score measures the risk (or a proxy thereof) that a subject has an event. The function accepts both time-dependent and time-independent risk scores. The time-dependent risk score of subject \(i\) is specified through a function \(q_i: [0, \infty) \rightarrow \mathbb{R}\). The time-independent risk score of subject \(i\) is specified by a constant \(q_i\). The argument estimate is the estimated risk score. For a time-dependent risk score, estimate should be a tensor of shape = (N,N) whose \((i,j)\)-th element is \(\hat{q}_i(T_j)\). For a time-independent risk score, estimate should be a tensor of size N whose \(i\)-th element is \(\hat{q}_i\).
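The layout of a time-dependent estimate can be sketched in plain Python as follows. The helper name risk_matrix and the linear risk functions are hypothetical, chosen only to show which entry holds \(\hat{q}_i(T_j)\); in practice the nested list would be converted to a tensor, for example with torch.tensor.

```python
from typing import Callable, List


def risk_matrix(q: List[Callable[[float], float]], time: List[float]) -> List[List[float]]:
    """N x N nested list whose (i, j) entry is q_i(T_j)."""
    return [[q_i(t_j) for t_j in time] for q_i in q]


time = [2.0, 5.0, 3.0]
# Hypothetical risk functions: risk grows linearly in t at a subject-specific
# rate (illustration only, not a fitted model).
rates = [0.5, 0.1, 0.3]
q = [lambda t, r=r: r * t for r in rates]

estimate = risk_matrix(q, time)  # row i: subject i's risk at every observed time
# Diagonal entries q_i(T_i): each subject's risk at its own observed time.
diagonal = [estimate[i][i] for i in range(len(time))]
```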

For a pair \((i,j)\), we say that the pair is comparable if the event of \(i\) occurred before the event of \(j\), i.e., \(X_i < X_j\). Given that the pair is comparable, we say that it is concordant if \(q_i(X_i) > q_j(X_i)\), that is, if subject \(i\), whose event occurs first, also has the higher risk at time \(X_i\).
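For fully observed (uncensored) survival times, the definitions of comparable and concordant pairs translate into a simple double loop. This is an illustrative sketch with a hypothetical helper pair_counts and time-independent scores, for which \(q_i(X_i) > q_j(X_i)\) reduces to \(q_i > q_j\):

```python
from typing import List, Tuple


def pair_counts(X: List[float], q: List[float]) -> Tuple[int, int]:
    """Count comparable (X_i < X_j) and concordant (additionally q_i > q_j) pairs."""
    comparable = concordant = 0
    for i in range(len(X)):
        for j in range(len(X)):
            if X[i] < X[j]:
                comparable += 1
                if q[i] > q[j]:
                    concordant += 1
    return comparable, concordant


X = [3.0, 8.0, 5.0]  # fully observed survival times
q = [2.0, 1.5, 1.0]  # time-independent risk scores
comparable, concordant = pair_counts(X, q)
# Pair (2, 1) is comparable (5 < 8) but discordant (1.0 < 1.5).
print(concordant / comparable)
```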

The concordance index measures the probability that, for a pair of randomly selected comparable samples, the one that experiences an event first has a higher risk. The concordance index is defined as

\[C = p(q_i(X_i) > q_j(X_i) \: | \: X_i < X_j)\]

The default concordance index estimate is the popular nonparametric estimator proposed by Harrell et al. [HLM96]:

\[\hat{C} = \frac{\sum_i\sum_j \delta_i \: I(T_i < T_j)\left(I \left( \hat{q}_i(T_i) > \hat{q}_j(T_i) \right) + \frac{1}{2} I\left(\hat{q}_i(T_i) = \hat{q}_j(T_i)\right)\right)}{\sum_i\sum_j \delta_i\: I\left(T_i < T_j\right)}\]
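A direct, \(O(N^2)\) plain-Python transcription of Harrell's estimator might look like the sketch below. It is not the torchsurv implementation; in particular, treating two scores as tied when their absolute difference is at most tied_tol is an assumption about how that parameter is used. The hypothetical helper accepts either a list of N scores or an N x N nested list whose \((i,j)\) entry is \(\hat{q}_i(T_j)\).

```python
from typing import List, Sequence, Union

# Either N scores (time-independent) or an N x N nested list with entry
# [i][j] = q_i(T_j) (time-dependent).
Estimate = Union[Sequence[float], Sequence[Sequence[float]]]


def harrell_cindex(
    estimate: Estimate,
    event: List[bool],
    time: List[float],
    tied_tol: float = 1e-8,
) -> float:
    """Share of comparable pairs (delta_i = 1, T_i < T_j) that are concordant,
    counting ties (|difference| <= tied_tol, an assumption) as 1/2."""
    time_dependent = isinstance(estimate[0], (list, tuple))
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue  # delta_i = 0: a censored subject cannot be the earlier event
        for j in range(len(time)):
            if time[i] < time[j]:
                # Risk scores of i and j, both evaluated at T_i.
                q_i = estimate[i][i] if time_dependent else estimate[i]
                q_j = estimate[j][i] if time_dependent else estimate[j]
                den += 1.0
                if abs(q_i - q_j) <= tied_tol:
                    num += 0.5
                elif q_i > q_j:
                    num += 1.0
    if den == 0:
        raise ValueError("no comparable pairs")
    return num / den


time = [2.0, 4.0, 6.0, 8.0]
event = [True, False, True, True]
scores = [0.9, 0.3, 0.5, 0.5]
c = harrell_cindex(scores, event, time)
# A time-dependent estimate whose rows are constant gives the same result.
c_matrix = harrell_cindex([[s] * 4 for s in scores], event, time)
```

On this toy data, subject 0 is concordant with all three later subjects, subject 1 is censored and never acts as the earlier subject, and subjects 2 and 3 have tied scores, so the estimate is 3.5 / 4 = 0.875.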

When event times are censored, Harrell's concordance index converges to an association measure that depends on the censoring distribution. To address this shortcoming, Uno et al. [UCP+11] proposed to use inverse probability of censoring weighting. In this approach, each subject with event time \(t\) is weighted by the inverse probability of censoring \(\omega(t) = 1 / \hat{D}(t)\), where \(\hat{D}(t)\) is the Kaplan-Meier estimate of the censoring survival function \(P(D>t)\). Let \(\omega(T_i)\) be the weight associated with subject \(i\) (argument weight). The weighted concordance index estimate is

\[\hat{C} = \frac{\sum_i\sum_j \delta_i \: \omega(T_i)^2 \: I(T_i < T_j)\left(I \left( \hat{q}_i(T_i) > \hat{q}_j(T_i) \right) + \frac{1}{2} I\left(\hat{q}_i(T_i) = \hat{q}_j(T_i)\right)\right)}{\sum_i\sum_j \delta_i \: \omega(T_i)^2\: I\left(T_i < T_j\right)}\]
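In the weighted estimator, each comparable pair with earlier subject \(i\) counts with weight \(\omega(T_i)^2\). A minimal plain-Python sketch for time-independent scores, taking the weights \(\omega(T_i)\) as given, could look like this; the helper name weighted_cindex is hypothetical, and treating scores within tied_tol of each other as ties is an assumption.

```python
from typing import List


def weighted_cindex(
    estimate: List[float],
    event: List[bool],
    time: List[float],
    weight: List[float],
    tied_tol: float = 1e-8,
) -> float:
    """Weighted C: each comparable pair (delta_i = 1, T_i < T_j) counts with
    weight omega(T_i) ** 2."""
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue
        w = weight[i] ** 2  # omega(T_i)^2
        for j in range(len(time)):
            if time[i] < time[j]:
                den += w
                diff = estimate[i] - estimate[j]
                if abs(diff) <= tied_tol:  # tie rule is an assumption
                    num += 0.5 * w
                elif diff > 0:
                    num += w
    if den == 0:
        raise ValueError("no comparable pairs with positive weight")
    return num / den


time = [2.0, 4.0, 6.0, 8.0]
event = [True, False, True, True]
scores = [0.9, 0.3, 0.5, 0.5]
c_unit = weighted_cindex(scores, event, time, [1.0, 1.0, 1.0, 1.0])
c_weighted = weighted_cindex(scores, event, time, [1.0, 1.0, 2.0, 1.0])
```

With unit weights the sketch reduces to Harrell's estimate (0.875 on this toy data); doubling the weight of subject 2 quadruples the influence of its tied pair and lowers the estimate to 5/7.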

In a train/test split setting, the weights should be derived from the censoring distribution estimated on the training data. Specifically, the censoring distribution is estimated on the training set and then evaluated at the subject times in the test set.
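The train/test recipe can be sketched in plain Python: fit a Kaplan-Meier estimate of the censoring survival function \(P(D>t)\) on the training set, treating censorings as the "events", then evaluate it at the test subjects' times to obtain \(\omega(T_i) = 1/\hat{D}(T_i)\). The helper names censoring_km and ipcw_weights are hypothetical, and setting the weight to 0 for censored subjects (whose terms carry \(\delta_i = 0\) anyway) or when \(\hat{D}(T_i) = 0\) is a choice made for this sketch; it is not a description of torchsurv's get_ipcw.

```python
from bisect import bisect_right
from typing import Callable, List


def censoring_km(event: List[bool], time: List[float]) -> Callable[[float], float]:
    """Kaplan-Meier estimate of P(D > t), treating censorings (event == False)
    as the 'events'."""
    steps: List[float] = []
    surv: List[float] = []
    s = 1.0
    for t in sorted(set(time)):
        at_risk = sum(1 for u in time if u >= t)
        n_censored = sum(1 for u, e in zip(time, event) if u == t and not e)
        s *= 1.0 - n_censored / at_risk
        steps.append(t)
        surv.append(s)

    def survival(t: float) -> float:
        # Right-continuous step function: value at the last step time <= t.
        k = bisect_right(steps, t)
        return 1.0 if k == 0 else surv[k - 1]

    return survival


def ipcw_weights(survival: Callable[[float], float], event: List[bool], time: List[float]) -> List[float]:
    """omega(T_i) = 1 / D_hat(T_i); 0 for censored subjects or when D_hat(T_i) == 0."""
    weights = []
    for e, t in zip(event, time):
        d = survival(t)
        weights.append(1.0 / d if e and d > 0 else 0.0)
    return weights


# Fit the censoring distribution on the training set ...
train_event = [True, False, True, False, True]
train_time = [1.0, 2.0, 3.0, 4.0, 5.0]
d_hat = censoring_km(train_event, train_time)

# ... and evaluate it at the test subjects' times.
test_event = [True, True, False]
test_time = [1.5, 4.5, 3.0]
test_weights = ipcw_weights(d_hat, test_event, test_time)
```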

Examples

>>> import torch
>>> from torchsurv.metrics.cindex import ConcordanceIndex
>>> from torchsurv.stats.ipcw import get_ipcw
>>> _ = torch.manual_seed(42)
>>> n = 64
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> cindex = ConcordanceIndex()
>>> cindex(estimate, event, time) # Harrell's c-index
tensor(0.5337)
>>> ipcw = get_ipcw(event, time) # ipcw at subject time
>>> cindex(estimate, event, time, weight=ipcw) # Uno's c-index
tensor(0.5453)

References

[BKG18]

Paul Blanche, Michael W Kattan, and Thomas A Gerds. The c-index is not proper for the evaluation of t-year predicted risks. Biostatistics, 20(2):347–357, February 2018.

[HLM96]

Frank E. Harrell, Kerry L. Lee, and Daniel B. Mark. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine, 15(4):361–387, February 1996.

[UCP+11]

Hajime Uno, Tianxi Cai, Michael J. Pencina, Ralph B. D’Agostino, and L. J. Wei. On the c‐statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine, 30(10):1105–1117, January 2011.

confidence_interval(method: str = 'noether', alpha: float = 0.05, alternative: str = 'two_sided', n_bootstraps: int = 999) → Tensor

Compute the confidence interval of the Concordance index.

This function calculates either the pointwise confidence interval or the bootstrap confidence interval for the concordance index. The pointwise confidence interval is computed assuming that the concordance index is normally distributed, using the standard error estimated with either Noether's method or the conservative method [PDAgostino04]. The bootstrap confidence interval is constructed from the distribution of the concordance index over bootstrap samples.
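One common way to build a bootstrap interval is the percentile method: resample subjects with replacement, recompute the statistic, and read off empirical quantiles. The sketch below applies it to Harrell's estimator in plain Python. Whether torchsurv uses the percentile method or another bootstrap interval is not stated here, so treat it as an assumption, as are the helper names and the simple index-based quantile rule.

```python
import random
from typing import List, Tuple


def harrell_cindex(estimate: List[float], event: List[bool], time: List[float]) -> float:
    """Harrell's C for time-independent scores (exact ties count 1/2)."""
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                den += 1.0
                if estimate[i] > estimate[j]:
                    num += 1.0
                elif estimate[i] == estimate[j]:
                    num += 0.5
    return num / den  # raises ZeroDivisionError if there is no comparable pair


def percentile_bootstrap_ci(
    estimate: List[float],
    event: List[bool],
    time: List[float],
    alpha: float = 0.05,
    n_bootstraps: int = 999,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided percentile bootstrap interval for Harrell's C."""
    rng = random.Random(seed)
    n = len(time)
    stats: List[float] = []
    for _ in range(n_bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        try:
            stats.append(
                harrell_cindex([estimate[k] for k in idx], [event[k] for k in idx], [time[k] for k in idx])
            )
        except ZeroDivisionError:
            continue  # skip resamples without any comparable pair
    stats.sort()
    last = len(stats) - 1
    return stats[int(alpha / 2 * last)], stats[int((1 - alpha / 2) * last)]


time = [float(t) for t in range(1, 11)]
event = [True] * 10
scores = [10.0 - t for t in time]  # earlier events get higher risk
lower, upper = percentile_bootstrap_ci(scores, event, time)
```

Because every comparable pair in this toy data is concordant, every resample yields 1.0 and the interval collapses to (1.0, 1.0).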

Parameters:
  • method (str) – Method for computing confidence interval. Defaults to “noether”. Must be one of the following: “noether”, “conservative”, “bootstrap”.

  • alpha (float) – Significance level. Defaults to 0.05.

  • alternative (str) – Alternative hypothesis. Defaults to “two_sided”. Must be one of the following: “two_sided”, “greater”, “less”.

  • n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if method is not “bootstrap”.

Returns:

Lower and upper bounds of the confidence interval.

Return type:

torch.Tensor([lower,upper])
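The normal-approximation interval can be sketched as below, given a point estimate and a standard error. How the Noether or conservative standard error is computed is not reproduced here, so se is an input. Clipping the bounds to [0, 1] and fixing the open end of one-sided intervals at 0 or 1 are assumptions, chosen to be consistent with the example outputs; the function name normal_ci is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def normal_ci(cindex: float, se: float, alpha: float = 0.05, alternative: str = "two_sided") -> Tuple[float, float]:
    """Normal-approximation confidence interval for a concordance index."""
    z = NormalDist()  # standard normal
    if alternative == "two_sided":
        q = z.inv_cdf(1 - alpha / 2)
        lower, upper = cindex - q * se, cindex + q * se
    elif alternative == "greater":
        lower, upper = cindex - z.inv_cdf(1 - alpha) * se, 1.0
    elif alternative == "less":
        lower, upper = 0.0, cindex + z.inv_cdf(1 - alpha) * se
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    # Keep the bounds inside [0, 1], the range of a concordance index.
    return max(lower, 0.0), min(upper, 1.0)


two_sided = normal_ci(0.5337, 0.1064)
greater = normal_ci(0.5337, 0.1064, alternative="greater")
less = normal_ci(0.5337, 0.1064, alternative="less")
```

Plugging in the example's estimate 0.5337 with se = 0.1064 (a value back-solved from the example output, not reported by the library) gives bounds within about 1e-3 of the default two-sided interval [0.3251, 0.7423].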

Examples

>>> import torch
>>> from torchsurv.metrics.cindex import ConcordanceIndex
>>> _ = torch.manual_seed(42)
>>> n = 64
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> cindex = ConcordanceIndex()
>>> cindex(estimate, event, time)
tensor(0.5337)
>>> cindex.confidence_interval() # default: Noether, two_sided
tensor([0.3251, 0.7423])
>>> cindex.confidence_interval(method="bootstrap", alternative="greater")
tensor([0.4459, 1.0000])
>>> cindex.confidence_interval(method="conservative", alternative="less")
tensor([0.0000, 0.7558])

References

[PDAgostino04]

Michael J. Pencina and Ralph B. D’Agostino. Overall c as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Statistics in Medicine, 23(13):2109–2123, June 2004.

p_value(method: str = 'noether', alternative: str = 'two_sided', n_bootstraps: int = 999) → Tensor

Perform a one-sample hypothesis test on the concordance index.

This function calculates either the pointwise p-value or the bootstrap p-value for testing the null hypothesis that the concordance index is equal to 0.5. The pointwise p-value is computed assuming that the concordance index is normally distributed, using the standard error estimated with Noether's method [PDAgostino04]. The bootstrap p-value is derived by permuting the risk predictions to approximate the sampling distribution of the concordance index under the null hypothesis.
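The permutation idea can be sketched in plain Python: shuffling the risk scores breaks any association with the outcome while keeping the set of comparable pairs fixed, so the shuffled concordance indices approximate the null distribution. The helper names, the plain-Python Harrell estimator, the add-one correction in the p-value, and the two-sided rule based on distance from 0.5 are all assumptions of this sketch rather than details confirmed for torchsurv.

```python
import random
from typing import List


def harrell_cindex(estimate: List[float], event: List[bool], time: List[float]) -> float:
    """Harrell's C for time-independent scores (exact ties count 1/2)."""
    num = den = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                den += 1.0
                if estimate[i] > estimate[j]:
                    num += 1.0
                elif estimate[i] == estimate[j]:
                    num += 0.5
    return num / den


def permutation_p_value(
    estimate: List[float],
    event: List[bool],
    time: List[float],
    alternative: str = "two_sided",
    n_permutations: int = 999,
    seed: int = 0,
) -> float:
    """p-value for H0: C = 0.5 from the distribution of C under shuffled scores."""
    rng = random.Random(seed)
    observed = harrell_cindex(estimate, event, time)
    shuffled = list(estimate)
    null: List[float] = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between scores and outcomes
        null.append(harrell_cindex(shuffled, event, time))
    if alternative == "greater":
        extreme = sum(c >= observed for c in null)
    elif alternative == "less":
        extreme = sum(c <= observed for c in null)
    elif alternative == "two_sided":
        extreme = sum(abs(c - 0.5) >= abs(observed - 0.5) for c in null)
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    # Add-one correction so the p-value is never exactly 0.
    return (1 + extreme) / (1 + n_permutations)


time = [float(t) for t in range(1, 11)]
event = [True] * 10
scores = [10.0 - t for t in time]  # perfectly concordant toy scores
p_greater = permutation_p_value(scores, event, time, alternative="greater")
p_less = permutation_p_value(scores, event, time, alternative="less")
```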

Parameters:
  • method (str) – Method for computing p-value. Defaults to “noether”. Must be one of the following: “noether”, “bootstrap”.

  • alternative (str) – Alternative hypothesis. Defaults to “two_sided”. Must be one of the following: “two_sided” (concordance index is not equal to 0.5), “greater” (concordance index is greater than 0.5), “less” (concordance index is less than 0.5).

  • n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if method is not “bootstrap”.

Returns:

p-value of statistical test.

Return type:

torch.Tensor
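For the pointwise p-value, the normal-approximation test of \(H_0: C = 0.5\) can be sketched as below. The standard error is supplied as an input because the Noether estimate itself is not reproduced here, and the function name normal_p_value is hypothetical.

```python
from statistics import NormalDist


def normal_p_value(cindex: float, se: float, alternative: str = "two_sided") -> float:
    """Normal-approximation p-value for H0: C = 0.5."""
    z = (cindex - 0.5) / se  # standardized distance from 0.5
    phi = NormalDist().cdf
    if alternative == "two_sided":
        return 2 * (1 - phi(abs(z)))
    if alternative == "greater":
        return 1 - phi(z)
    if alternative == "less":
        return phi(z)
    raise ValueError(f"unknown alternative: {alternative}")


p_two = normal_p_value(0.5337, 0.1064)
p_greater = normal_p_value(0.5337, 0.1064, alternative="greater")
p_less = normal_p_value(0.5337, 0.1064, alternative="less")
```

With the example's estimate 0.5337 and se = 0.1064 (back-solved from the example output, an assumption), the two-sided p-value is about 0.75, in line with the default output in the example.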

Examples

>>> import torch
>>> from torchsurv.metrics.cindex import ConcordanceIndex
>>> _ = torch.manual_seed(42)
>>> n = 64
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> estimate = torch.randn((n,))
>>> cindex = ConcordanceIndex()
>>> cindex(estimate, event, time)
tensor(0.5337)
>>> cindex.p_value() # default: Noether, two_sided
tensor(0.7516)
>>> cindex.p_value(method="bootstrap", alternative="greater")
tensor(0.2620)

compare(other, method: str = 'noether', n_bootstraps: int = 999) → Tensor

Compare two Concordance indices.

This function compares two concordance indices computed on the same data with different risk scores. The null hypothesis is cindex1 = cindex2 and the alternative hypothesis is cindex1 > cindex2. The test is either a Student t-test for dependent samples or a two-sample bootstrap test. The Student t-test assumes that the concordance index is normally distributed and uses standard errors estimated with Noether's method [PDAgostino04].
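A normal-approximation sketch of this paired comparison is below. The method is described as a Student t-test; this sketch uses the standard normal distribution instead (which the t distribution approaches for large samples), and it takes the two standard errors and their covariance as inputs because their estimation is not described here. The function name compare_p_value is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_p_value(c1: float, c2: float, se1: float, se2: float, cov12: float = 0.0) -> float:
    """One-sided p-value for H0: C1 = C2 against H1: C1 > C2.

    Both indices are computed on the same subjects, so they are correlated;
    cov12 is their covariance (assumed to be estimated elsewhere).
    """
    var = se1 ** 2 + se2 ** 2 - 2 * cov12  # variance of C1 - C2
    if var <= 0:
        raise ValueError("variance of the difference must be positive")
    z = (c1 - c2) / sqrt(var)
    return 1 - NormalDist().cdf(z)


p_equal = compare_p_value(0.60, 0.60, 0.05, 0.05, cov12=0.001)
p_better = compare_p_value(0.70, 0.60, 0.05, 0.05, cov12=0.001)
```

A positive covariance shrinks the variance of the difference, which is why accounting for the pairing matters when both indices are computed on the same subjects.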

Parameters:
  • other (ConcordanceIndex) – Another instance of the ConcordanceIndex class representing cindex2.

  • method (str) – Statistical test used to perform the hypothesis test. Defaults to “noether”. Must be one of the following: “noether”, “bootstrap”.

  • n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if method is not “bootstrap”.

Returns:

p-value of the statistical test.

Return type:

torch.Tensor

Examples

>>> import torch
>>> from torchsurv.metrics.cindex import ConcordanceIndex
>>> _ = torch.manual_seed(42)
>>> n = 64
>>> time = torch.randint(low=5, high=250, size=(n,)).float()
>>> event = torch.randint(low=0, high=2, size=(n,)).bool()
>>> cindex1 = ConcordanceIndex()
>>> cindex1(torch.randn((n,)), event, time)
tensor(0.5337)
>>> cindex2 = ConcordanceIndex()
>>> cindex2(torch.randn((n,)), event, time)
tensor(0.5047)
>>> cindex1.compare(cindex2) # default: Noether
tensor(0.4267)
>>> cindex1.compare(cindex2, method="bootstrap")
tensor(0.3620)