torchsurv.metrics.brier_score#
Classes
|
Compute the Brier Score for survival models. |
- class torchsurv.metrics.brier_score.BrierScore(checks: bool = True)[source]#
Compute the Brier Score for survival models.
- __init__(checks: bool = True)[source]#
Initialize a BrierScore for survival class model evaluation.
- Parameters:
checks (bool) – Whether to perform input format checks. Enabling checks can help catch potential issues in the input data. Defaults to True.
Examples
>>> _ = torch.manual_seed(52) >>> n = 10 >>> time = torch.randint(low=5, high=250, size=(n,)).float() >>> event = torch.randint(low=0, high=2, size=(n,)).bool() >>> estimate = torch.rand((n,len(time))) >>> brier_score = BrierScore() >>> brier_score(estimate, event, time) tensor([0.2463, 0.2740, 0.3899, 0.1964, 0.3608, 0.2821, 0.1932, 0.2978, 0.1950, 0.1668]) >>> brier_score.integral() # integrated brier score tensor(0.2862) >>> brier_score.confidence_interval() # default: parametric, two-sided tensor([[0.1061, 0.0604, 0.2360, 0.0533, 0.1252, 0.0795, 0.0000, 0.1512, 0.0381, 0.0051], [0.3866, 0.4876, 0.5437, 0.3394, 0.5965, 0.4847, 0.4137, 0.4443, 0.3520, 0.3285]]) >>> brier_score.p_value() # default: bootstrap permutation test, two-sided tensor([1.0000, 0.7860, 1.0000, 0.3840, 1.0000, 1.0000, 0.3840, 1.0000, 0.7000, 0.2380])
- __call__(estimate: Tensor, event: Tensor, time: Tensor, new_time: Tensor | None = None, weight: Tensor | None = None, weight_new_time: Tensor | None = None, instate: bool = True) Tensor [source]#
Compute the Brier Score.
- Parameters:
estimate (torch.Tensor) – Estimated probability of remaining event-free (i.e., survival function). Can be of shape = (n_samples, n_samples) if subject-specific survival is evaluated at
time
, or of shape = (n_samples, n_times) if subject-specific survival is evaluated atnew_time
.event (torch.Tensor, boolean) – Event indicator of size n_samples (= True if event occured)
time (torch.Tensor, float) – Time-to-event or censoring of size n_samples.
new_time (torch.Tensor, float, optional) – Time points at which to evaluate the Brier score of size n_times. Defaults to unique
time
.weight (torch.Tensor, optional) – Optional sample weight evaluated at
time
of size n_samples. Defaults to 1.weight_new_time (torch.tensor, optional) – Optional sample weight evaluated at
new_time
of size n_times. Defaults to 1.
- Returns:
Brier score evaluated at
new_time
.- Return type:
torch.Tensor
Note
The function evaluates the time-dependent Brier score at time \(t \in \{t_1, \cdots, t_T\}\) (argument
new_time
).For each subject \(i \in \{1, \cdots, N\}\), denote \(X_i\) as the survival time and \(D_i\) as the censoring time. Survival data consist of the event indicator, \(\delta_i=(X_i\leq D_i)\) (argument
event
) and the time-to-event or censoring, \(T_i = \min(\{ X_i,D_i \})\) (argumenttime
).The survival function, of subject \(i\) is specified through \(S_i: [0, \infty) \rightarrow [0,1]\). The argument
estimate
is the estimated survival function. Ifnew_time
is specified, it should be of shape = (N,T) (\((i,k)\) th element is \(\hat{S}_i(t_k)\)); ifnew_time
is not specified, it should be of shape = (N,N) (\((i,j)\) th element is \(\hat{S}_i(T_j)\)).The time-dependent Brier score [GSSS99] at time \(t\) is the mean squared error of the event status
\[BS(t) = \mathbb{E}\left[\left(1\left(X > t\right) - \hat{S}(t)\right)^2\right]\]The default Brier score estimate is
\[\hat{BS}(t) = \frac{1}{n}\sum_i 1(T_i \leq t, \delta_i = 1) (0 - \hat{S}_i(t))^2 + 1(T_1 > t) (1- \hat{S}_i(t))^2\]To account for the fact that the event time are censored, the inverse probability weighting technique can be used. In this context, each subject associated with time \(t\) is weighted by the inverse probability of censoring \(\omega(t) = 1 / \hat{D}(t)\), where \(\hat{D}(t)\) is the Kaplan-Meier estimate of the censoring distribution, \(P(D>t)\). The censoring-adjusted Brier score is
\[\hat{BS}(t) = \frac{1}{n}\sum_i \omega(T_i) 1(T_i \leq t, \delta_i = 1) (0 - \hat{S}_i(t))^2 + \omega(t) 1(T_1 > t) (1- \hat{S}_i(t))^2\]The censoring-adjusted Brier score can be obtained by specifying the argument
weight
, the weights evaluated at eachtime
(\(\omega(T_1), \cdots, \omega(T_N)\)). Ifnew_time
is specified, the argumentweight_new_time
should also be specified accordingly, the weights evaluated at eachnew_time
(\(\omega(t_1), \cdots, \omega(t_K)\)). In the context of train/test split, the weights should be derived from the censoring distribution estimated in the training data.Examples
>>> from torchsurv.stats.ipcw import get_ipcw >>> _ = torch.manual_seed(52) >>> n = 10 >>> time = torch.randint(low=5, high=250, size=(n,)).float() >>> event = torch.randint(low=0, high=2, size=(n,)).bool() >>> estimate = torch.rand((n,len(time))) >>> brier_score = BrierScore() >>> brier_score(estimate, event, time) tensor([0.2463, 0.2740, 0.3899, 0.1964, 0.3608, 0.2821, 0.1932, 0.2978, 0.1950, 0.1668]) >>> ipcw = get_ipcw(event, time) # ipcw at time >>> brier_score(estimate, event, time, weight=ipcw) # censoring-adjusted brier-score tensor([0.2463, 0.2740, 0.4282, 0.2163, 0.4465, 0.3826, 0.2630, 0.3888, 0.2219, 0.1882]) >>> new_time = torch.unique(torch.randint(low=5, high=time.max().int(), size=(n*2,)).float()) >>> ipcw_new_time = get_ipcw(event, time, new_time) # ipcw at new_time >>> estimate = torch.rand((n,len(new_time))) >>> brier_score(estimate, event, time, new_time, ipcw, ipcw_new_time) # censoring-adjusted brier-score at new time tensor([0.4036, 0.3014, 0.2517, 0.3947, 0.4200, 0.3908, 0.3766, 0.3737, 0.3596, 0.2088, 0.4922, 0.3237, 0.2255, 0.1841, 0.3029, 0.6919, 0.2357, 0.3507, 0.4364, 0.3312])
References
- integral()[source]#
Compute the integrated Brier Score.
- Returns:
Integrated Brier Score.
- Return type:
torch.Tensor
Examples
>>> _ = torch.manual_seed(52) >>> n = 10 >>> time = torch.randint(low=5, high=250, size=(n,)).float() >>> event = torch.randint(low=0, high=2, size=(n,)).bool() >>> estimate = torch.rand((n,len(time))) >>> brier_score = BrierScore() >>> brier_score(estimate, event, time) tensor([0.2463, 0.2740, 0.3899, 0.1964, 0.3608, 0.2821, 0.1932, 0.2978, 0.1950, 0.1668]) >>> brier_score.integral() # integrated brier score tensor(0.2862)
Note
The integrated Brier score is the integral of the time-dependent Brier score over the interval \([t_1, t_2]\), where \(t_1 = \min\left(\{T_i\}_{i = 1, \cdots, N}\right)\) and \(t_2 = \max\left(\{T_i\}_{i = 1, \cdots, N}\right)\). It is defined by [GSSS99]
\[\hat{IBS} = \int_{t_1}^{t_2} \hat{BS}(t) dW(t)\]where \(W(t) = t / t_2\).
The integral is estimated with the trapzoidal rule.
- confidence_interval(method: str = 'parametric', alpha: float = 0.05, alternative: str = 'two_sided', n_bootstraps: int = 999) Tensor [source]#
Compute the confidence interval of the Brier Score.
This function calculates either the pointwise confidence interval or the bootstrap confidence interval for the Brier Score. The pointwise confidence interval is computed assuming that the Brier score is normally distributed and using empirical standard errors. The bootstrap confidence interval is constructed based on the distribution of bootstrap samples.
- Parameters:
method (str) – Method for computing confidence interval. Defaults to “parametric”. Must be one of the following: “parametric”, “bootstrap”.
alpha (float) – Significance level. Defaults to 0.05.
alternative (str) – Alternative hypothesis. Defaults to “two_sided”. Must be one of the following: “two_sided”, “greater”, “less”.
n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if
method
is not “bootstrap”.
- Returns:
Lower and upper bounds of the confidence interval.
- Return type:
torch.Tensor([lower,upper])
Examples
>>> _ = torch.manual_seed(52) >>> n = 10 >>> time = torch.randint(low=5, high=250, size=(n,)).float() >>> event = torch.randint(low=0, high=2, size=(n,)).bool() >>> estimate = torch.rand((n,len(time))) >>> brier_score = BrierScore() >>> brier_score(estimate, event, time) tensor([0.2463, 0.2740, 0.3899, 0.1964, 0.3608, 0.2821, 0.1932, 0.2978, 0.1950, 0.1668]) >>> brier_score.confidence_interval() # default: parametric, two-sided tensor([[0.1061, 0.0604, 0.2360, 0.0533, 0.1252, 0.0795, 0.0000, 0.1512, 0.0381, 0.0051], [0.3866, 0.4876, 0.5437, 0.3394, 0.5965, 0.4847, 0.4137, 0.4443, 0.3520, 0.3285]]) >>> brier_score.confidence_interval(method = "bootstrap", alternative = "greater") tensor([[0.1455, 0.1155, 0.2741, 0.0903, 0.1985, 0.1323, 0.0245, 0.1938, 0.0788, 0.0440], [1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000]])
- p_value(method: str = 'bootstrap', alternative: str = 'two_sided', n_bootstraps: int = 999, null_value: float = None) Tensor [source]#
Perform a one-sample hypothesis test on the Brier score.
This function calculates either the pointwise p-value or the bootstrap p-value for testing the null hypothesis that the estimated brier score is equal to bs0, where bs0 is the brier score that would be expected if the survival model was not providing accurate predictions beyond random chance. The pointwise p-value is computed assuming that the Brier score is normally distributed and using the empirical standard errors. To obtain the pointwise p-value, the Brier score under the null, bs0, must be provided. The bootstrap p-value is derived by permuting survival function’s predictions to estimate the the sampling distribution under the null hypothesis.
- Parameters:
method (str) – Method for computing p-value. Defaults to “bootstrap”. Must be one of the following: “parametric”, “bootstrap”.
alternative (str) – Alternative hypothesis. Defaults to “two_sided”. Must be one of the following: “two_sided” (Brier score is not equal to bs0), “greater” (Brier score is greater than bs0), “less” (Brier score is less than bs0).
n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if
`method`
is not “bootstrap”.null_value (float) – The Brier score expected if the survival model was not providing accurate predictions beyond what would be beyond by random chance alone, i.e., bs0.
- Returns:
p-value of the statistical test.
- Return type:
torch.Tensor
Examples
>>> _ = torch.manual_seed(52) >>> n = 10 >>> time = torch.randint(low=5, high=250, size=(n,)).float() >>> event = torch.randint(low=0, high=2, size=(n,)).bool() >>> new_time = torch.unique(time) >>> estimate = torch.rand((n,len(new_time))) >>> brier_score = BrierScore() >>> brier_score(estimate, event, time, new_time) tensor([0.3465, 0.5310, 0.4222, 0.4582, 0.3601, 0.3395, 0.2285, 0.1975, 0.3120, 0.3883]) >>> brier_score.p_value() # Default: bootstrap, two_sided tensor([1.0000, 0.0560, 1.0000, 1.0000, 1.0000, 1.0000, 0.8320, 0.8620, 1.0000, 1.0000]) >>> brier_score.p_value(method = "parametric", alternative = "less", null_value = 0.3) # H0: bs = 0.3, Ha: bs < 0.3 tensor([0.7130, 0.9964, 0.8658, 0.8935, 0.6900, 0.6630, 0.1277, 0.1128, 0.5383, 0.8041])
- compare(other, method: str = 'parametric', n_bootstraps: int = 999) tensor [source]#
Compare two Brier scores.
This function compares two Brier scores computed on the same data with different risk scores. The statistical hypotheses are formulated as follows, null hypothesis: brierscore1 = brierscore2 and alternative hypothesis: brierscore1 < brierscore2. The statistical test is either a Student t-test for paired samples or a two-sample bootstrap test. The Student t-test for paired samples assumes that the Brier Scores are normally distributed and uses the Brier scores’ empirical standard errors.
- Parameters:
other (BrierScore) – Another instance of the BrierScore class representing brierscore2.
method (str) – Statistical test used to perform the hypothesis test. Defaults to “parametric”. Must be one of the following: “parametric”, “bootstrap”.
n_bootstraps (int) – Number of bootstrap samples. Defaults to 999. Ignored if
method
is not “bootstrap”.
- Returns:
p-value of the statistical test.
- Return type:
torch.tensor
Examples
>>> _ = torch.manual_seed(52) >>> n = 10 >>> time = torch.randint(low=5, high=250, size=(n,)).float() >>> event = torch.randint(low=0, high=2, size=(n,)).bool() >>> brier_score = BrierScore() >>> brier_score(torch.rand((n,len(time))), event, time) tensor([0.2463, 0.2740, 0.3899, 0.1964, 0.3608, 0.2821, 0.1932, 0.2978, 0.1950, 0.1668]) >>> brier_score2 = BrierScore() >>> brier_score2(torch.rand((n,len(time))), event, time) tensor([0.4136, 0.2750, 0.3002, 0.2826, 0.2030, 0.2643, 0.2525, 0.2964, 0.1804, 0.3109]) >>> brier_score.compare(brier_score2) # default: parametric tensor([0.1793, 0.4972, 0.7105, 0.1985, 0.9254, 0.5591, 0.3455, 0.5060, 0.5437, 0.0674]) >>> brier_score.compare(brier_score2, method = "bootstrap") tensor([0.1360, 0.5030, 0.7310, 0.2090, 0.8630, 0.5490, 0.3120, 0.5110, 0.5460, 0.1030])