Package 'SurvMetrics'

Title: Predictive Evaluation Metrics in Survival Analysis
Description: An implementation of popular evaluation metrics that are commonly used in survival prediction including Concordance Index, Brier Score, Integrated Brier Score, Integrated Square Error, Integrated Absolute Error and Mean Absolute Error. For a detailed information, see (Ishwaran H, Kogalur UB, Blackstone EH and Lauer MS (2008) <doi:10.1214/08-AOAS169>) and (Moradian H, Larocque D and Bellavance F (2017) <doi:10.1007/s10985-016-9372-1>) for different evaluation metrics.
Authors: Hanpu Zhou [aut, cre], Xuewei Cheng [aut], Sizheng Wang [aut], Yi Zou [aut], Hong Wang [aut]
Maintainer: Hanpu Zhou <[email protected]>
License: GPL (>= 2)
Version: 0.5.0
Built: 2025-01-09 23:25:38 UTC
Source: https://github.com/cranhaven/cranhaven.r-universe.dev

Help Index


The Brier Score

Description

The Brier Score was proposed by Glenn W. Brier in 1950 which is a proper score function that measures the accuracy of probabilistic predictions, usually used to measure the accuracy of a model fit for survival data. Brier can calculate the value of Brier Score at any timepoint, regardless of whether it is the event time.

Usage

Brier(object, pre_sp, t_star = -1)

Arguments

object

object of class Surv created by Surv function or a fitted survival model, including the survival model fitted by coxph, survreg, and rfsrc.

pre_sp

If the input of the parameter object is a fitted survival model, this parameter should be a survival dataset on which you want to caculate the Brier Score of the fitted model. Or with an object of class Surv as the parameter object, it should be a vector of predicted values of survival probabilities of each observation in testing set at time t_star.

t_star

the timepoint at which the Brier Score you want to calculate. If the input of the parameter object is a fitted survival model, the timepoint is necessary to be specified at which the survival probability is predicted, and this function will calculate the Brier Score at that moment. If the input object is a survival object, this parameter can be ignored and the value of this parameter will not have any effect on the result of this fuction.

Details

The Brier Score is the mean square difference between the true classes and the predicted probabilities. So the Brier Score can be thought of as a cost function. Therefore, the lower the Brier Score is for a set of predictions, the better the predictions are calibrated. The Brier Score takes on a value between zero and one, since this is the square of the largest possible difference between a predicted probability and the actual outcome. As we all know, for the cencoring samples, we do not know the real time of death, so the residual cannot be directly calculated when making the prediction. So the Brier Score is widely used in survival analysis.

The Brier Score is a strictly proper score (Gneiting and Raftery, 2007), which means that it takes its minimal value only when the predicted probabilities match the empirical probabilities.

Judging from the sparse empirical evidence, predictions of duration of survival tend to be rather inaccurate. More precision is achieved by using patient-specific survival probabilities and the Brier score as predictions to discriminate future survivors from failures.

Value

the Brier Score at time t_star between the true classes and the predicted probabilities.

Author(s)

Hanpu Zhou [email protected]

References

Graf, Erika, Schmoor, Claudia, Sauerbrei, & Willi, et al. (1999). Assessment and comparison of prognostic classification schemes for survival data. Statist. Med., 18(1718), 2529-2545.

Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78.

Gneiting, T. , & Raftery, A. E. . (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.

Examples

library(survival)
time <- rexp(50)
status <- sample(c(0, 1), 50, replace = TRUE)
pre_sp <- runif(50)
t_star <- runif(1)
Brier(Surv(time, status), pre_sp, t_star)

The Concordance Index for Right-Censored Survival Time Data

Description

Concordance index is a rank correlation measures between a variable X and a possibly censored variable Y, with event/censoring indicator. In survival analysis, a pair of patients is called concordant if the risk of the event predicted by a model is lower for the patient who experiences the event at a later timepoint. The concordance probability (C-index) is the frequency of concordant pairs among all pairs of subjects. It can be used to measure and compare the discriminative power of a risk prediction models.

Usage

Cindex(object, predicted, t_star = -1)

Arguments

object

object of class Surv created by Surv function or a fitted survival model, including the survival model fitted by coxph, survreg, and rfsrc.

predicted

If the input of the parameter object is a fitted survival model, this parameter should be a survival dataset on which you want to caculate the C-index of the fitted model. Or with an object of class Surv as the parameter object, it should be a vector containing the predicted survival time or probability of each observation.

t_star

the timepoint at which the C-index you want to calculate. If the input of the parameter object is a fitted survival model, the timepoint is necessary to be specified at which the survival probability is predicted, and this function will calculate the C-index between the survival time and the predicted result at that moment. If the input object is a survival object, this parameter can be ignored and the value of this parameter will not have any effect on the result of this fuction.

Details

Pairs with identical observed times, where one is uncensored and one is censored, are always considered usuable (independent of the value of tiedOutcomeIn), as it can be assumed that the event occurs at a later timepoint for the censored observation.

For uncensored response the result equals the one obtained with the functions rcorr.cens and rcorrcens from the Hmisc package (see examples).

Value

Estimates of the C-index between the survival time and the predicted result.

Author(s)

Hanpu Zhou [email protected]

References

Ishwaran, H. , Kogalur, U. B. , Blackstone, E. H. , & Lauer, M. S. . (2008). Random survival forests. Journal of Thoracic Oncology Official Publication of the International Association for the Study of Lung Cancer, 2(12), 841-860.

Kang, L. , Chen, W. , Petrick, N. A. , & Gallas, B. D. . (2015). Comparing two correlated c indices with right-censored survival outcome: a one-shot nonparametric approach. Statistics in Medicine, 34(4).

TA Gerds, MW Kattan, M Schumacher, and C Yu. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine, Ahead of print:to appear, 2013. DOI = 10.1002/sim.5681

Wolbers, M and Koller, MT and Witteman, JCM and Gerds, TA (2013) Concordance for prognostic models with competing risks Research report 13/3. Department of Biostatistics, University of Copenhagen

Andersen, PK (2012) A note on the decomposition of number of life years lost according to causes of death Research report 12/2. Department of Biostatistics, University of Copenhagen

Paul Blanche, Michael W Kattan, and Thomas A Gerds. The c-index is not proper for the evaluation of-year predicted risks. Biostatistics, 20(2): 347–357, 2018.

Examples

library(survival)
time <- c(1, 1, 2, 2, 2, 2, 2, 2)
status <- c(0, 1, 1, 0, 1, 1, 0, 1)
predicted <- c(2, 3, 3, 3, 4, 2, 4, 3)
Cindex(Surv(time, status), predicted)

Concordance index in the Presence of Competing Risks

Description

The C-index (Concordance index) of the prognostic model in the presence of competing risks according to Marcel, W et al.(2014).

Usage

CindexCR(time, status, predicted, Cause_int = 1)

Arguments

time

minimum value of deletion time and survival time.

status

the status indicator, for models with competing risks, the status indicator is 0=censored, 1=event at time, 2= competing risks at time.

predicted

a vector of predicted values or the survival time of survival probabilities of each observation.

Cause_int

event type of interest, the default value is 1.

Value

Estimates of the C-index in the presence of competing risks.

Author(s)

HanPu Zhou [email protected]

References

Marcel, W. , Paul, B. , Koller, M. T. , Witteman, J. , & Gerds, T. A. . (2014).Concordance for prognostic models with competing risks. Biostatistics(3), 526.

Ishwaran, H. , Kogalur, U. B. , Blackstone, E. H. , & Lauer, M. S. . (2008). Random survival forests. Journal of Thoracic Oncology Official Publication of the International Association for the Study of Lung Cancer, 2(12), 841-860.

Examples

time <- c(4, 7, 5, 8)
status <- rep(1, 4)
predicted <- c(3, 5, 7, 10)
Cause_int <- 1
CindexCR(time, status, predicted, Cause_int)

The Kaplan-Meier Estimate of the Censoring Distribution

Description

G(t)=P(C>t) denote the Kaplan-Meier estimate of the censoring distribution which is used to adjust for censoring. Gt is used to calculate G(t) at any timepoint you want.

Usage

Gt(object, timepoint)

Arguments

object

object of class Surv created by Surv function.

timepoint

any point in time you want to get the Kaplan–Meier estimate of the censoring.

Value

The Kaplan–Meier estimate of the censoring distribution and the value of G(t) is between 0 and 1.

Author(s)

Hanpu Zhou [email protected]

References

Graf, Erika, Schmoor, Claudia, Sauerbrei, & Willi, et al. (1999). Assessment and comparison of prognostic classification schemes for survival data. Statist. Med., 18(1718), 2529-2545.

Kaplan, E. L. , & Meier, P. . (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53, 457-481.

Examples

library(survival)
time <- rexp(50)
status <- sample(c(0, 1), 50, replace = TRUE)
pre_sp <- runif(50)
timepoint <- runif(1)
Gt(Surv(time, status), timepoint)

The Integrate Absolute Error and The Integrate Square Error

Description

Two ways of the continuous-time approach to continuous-time identification based on least-squares and least-absolute errors are proposed. Integrate Absolute Error and Integrate Square Error.To evaluate the performance of survival models methods Lower values of IAE or ISE indicate better performances.

Usage

IAEISE(object, sp_matrix, IRange = c(-2, -1))

Arguments

object

object of class Surv created by Surv function or a fitted survival model, including the survival model fitted by coxph, survreg, and rfsrc.

sp_matrix

a matrix or data.frame of predicted values of survival probabilities for the testing set. rows denote different samples, columns denote different time points, and the values in row i and column j of the matrix denote the predicted survival probability of the ith sample at the time point corresponding to the jth column.

IRange

a vector contains all discrete time points corresponding to the predicted probability in sp_matrix. Or the scale you want to get the IAE and ISE; .

Value

Estimates of the Integrate Absolute Error and the Integrate Square Error of the predicted values of survival probabilities.

Author(s)

Hanpu Zhou [email protected]

References

Marron, J. S. , & Wand, M. P. . (1992). Exact mean integrated squared error. Annals of Statistics, 20(2), 712-736.

HooraMoradian, DenisLarocque, & FranoisBellavance. (2017). L1 splitting rules in survival forests. Lifetime Data Analysis, 23(4), 671–691.

Kowalczuk, & Z. (1998). Integrated squared error and integrated absolute error in recursive identification of continuous-time plants. Control 98 Ukacc International Conference on (Vol.1998, pp.693-698). IET.

Examples

library(survival)
library(SurvMetrics)
set.seed(123)
N <- 100
mydata <- SDGM4(N, p = 20, c_step = -0.5)
index.train <- sample(1:N, 2 / 3 * N)
data.train <- mydata[index.train, ]
data.test <- mydata[-index.train, ]

time_interest <- sort(data.train$time[data.train$status == 1])
sp_matrix <- matrix(sort(runif(nrow(data.test) * length(time_interest)),
  decreasing = TRUE
), nrow = nrow(data.test))
object <- Surv(data.test$time, data.test$status)

# a vector for all the distinct time
IAEISE(object, sp_matrix, time_interest)
# a range
IAEISE(object, sp_matrix, c(12, 350))

The Integration of the Brier Score

Description

IBS is an integrated version of the Brier which is used to calculate the integration of the Brier Score. The Brier Score is the mean square difference between the true classes and the predicted probabilities. Basically, the IBS is an integrated weighted squared distance between the estimated survival function and the empirical survival function. The inverse probability censoring weighting(IPCW) is used to adjust for censoring.

Usage

IBS(object, sp_matrix, IBSrange = c(-2, -1))

Arguments

object

object of class Surv created by Surv function or a fitted survival model, including the survival model fitted by coxph, survreg, and rfsrc.

sp_matrix

a matrix or data.frame of predicted values of survival probabilities for the testing set. Rows denote different samples, columns denote different time points, and the values in entry (i,j) of the matrix denote the predicted survival probability of the ith sample at the time point corresponding to the jth column.

IBSrange

a vector contains all discrete time points corresponding to the predicted probability in sp_matrix. Or the scale you want to get the IBS; and if it is a single point the return value will be the Brier Score at the timepoint.

Details

The percentage of censored observations increases in time, and this will surely affect the dispersion of the empirical Brier Score. The question of how censoring in finite samples acts on the distribution of our measures of inaccuracy is an interesting subject. Our recommendation is to choose t* in a way that censoring is not too heavy (for example, the median follow-up time). We also prefer measures with integrated loss functions since they will reflect inaccuracy over an interval rather than just at one point in time. In addition, the corresponding empirical measures are likely to have lower dispersion, because censored observations contribute their estimated event-free probabilities to the integrand until the censoring occurs.

Value

The integration of the Brier score of the predicted values of survival probabilities on the discrete time points or the time scale of interest to users.

Author(s)

Hanpu Zhou [email protected]

References

HooraMoradian, DenisLarocque, & FranoisBellavance. (2017). \(l_1\) splitting rules in survival forests. Lifetime Data Analysis, 23(4), 671–691.

Graf, Erika, Schmoor, Claudia, Sauerbrei, & Willi, et al. (1999). Assessment and comparison of prognostic classification schemes for survival data. Statist. Med., 18(1718), 2529-2545.

Brier, G. W. . (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78.

Gneiting, T. , & Raftery, A. E. . (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.

Examples

library(survival)
library(SurvMetrics)
set.seed(123)
N <- 100
mydata <- SDGM4(N, p = 20, c_step = -0.5)
index.train <- sample(1:N, 2 / 3 * N)
data.train <- mydata[index.train, ]
data.test <- mydata[-index.train, ]

time_interest <- sort(data.train$time[data.train$status == 1])
sp_matrix <- matrix(sort(runif(nrow(data.test) * length(time_interest)),
  decreasing = TRUE
), nrow = nrow(data.test))
object <- Surv(data.test$time, data.test$status)

# the default time points
IBS(object, sp_matrix, time_interest)
# a time range
IBS(object, sp_matrix, c(18:100))

The Mean Absolute Error

Description

A somewhat naive criterion that is sometimes used consists of simply omitting all censored cases from the data set. For survival analysis problems, the mean absolute error (MAE) can be defined as an average of the differences between the predicted time values and the actual observation time values. Only the samples for which the event occurs are being considered in this metric.

Usage

MAE(object, pre_time)

Arguments

object

object of class Surv created by Surv function.

pre_time

a vector of predicted values of survival time of each observation.

Details

Condition: MAE can only be used for the evaluation of survival models which can provide the event time as the predicted target value.

Value

the value of the Mean Absolute Error between the survival time and the predicted result.

Author(s)

Hanpu Zhou [email protected]

References

Matsuo, K. , Purushotham, S. , Jiang, B. , Mandelbaum, R. S. , Takiuchi, T. , & Liu, Y. , et al. (2018). Survival outcome prediction in cervical cancer: cox models vs deep-learning model. American Journal of Obstetrics & Gynecology. Coyle, E. J. , & Lin, J. H. . (1988). Stack filters and the mean absolute error criterion. IEEE Trans Acoustics Speech Signal Processing, 36(8), 1244-1254.

Examples

library(survival)
time <- rexp(50)
status <- sample(c(0, 1), 50, replace = TRUE)
pre_time <- rexp(50)
MAE(Surv(time, status), pre_time)

Predicting Survival Probabilities for a 'survreg' Object

Description

Function to extract survival probability predictions from survreg modeling approach.

Usage

predictSurvProb2survreg(object, newdata, time_days)

Arguments

object

A model fitted by survreg from which to extract predicted survival probabilities

newdata

A data frame containing predictor variable combinations for which to compute predicted survival probabilities.

time_days

A vector of times in the range of the response variable, We.g. times when the response is a survival object, at which to return the survival probabilities.

Value

A matrix with as many rows as NROW(newdata) and as many columns as length(time_days). Each entry should be a probability and in rows the values should be decreasing.

Author(s)

Hanpu Zhou [email protected]

Examples

library(survival)
set.seed(1234)
mydata <- kidney[, -1]
train_index <- sample(1:nrow(mydata), 0.7 * nrow(mydata))
train_data <- mydata[train_index, ]
test_data <- mydata[-train_index, ]
survregfit <- survreg(Surv(time, status) ~ ., dist = 'weibull', data = train_data)
pre_sb <- predictSurvProb2survreg(survregfit, test_data, c(10, 20))

Survival Data Generation Method 1

Description

Survival data generation method. An example of the proportional hazards model where in the Cox model is expected to perform best.

Usage

SDGM1(N = 200, p = 15, c_mean = 0.4)

Arguments

N

The sample size of the simulated dataset.

p

The covariate dimension of the simulated dataset.

c_mean

The parameter which is used to control the censoring rate.

Value

the simulated dataset

Author(s)

Hanpu Zhou [email protected]

References

Steingrimsson, J. A. , Diao, L. , & Strawderman, R. L. . (2019). Censoring unbiased regression trees and ensembles. Journal of the American Statistical Association, 114.

Zhu, R. , & Kosorok, M. R. . (2012). Recursively imputed survival trees. Journal of the American Statistical Association, 107(497), 331-340.

Ishwaran, H. , Kogalur, U. B. , Gorodeski, E. Z. , Minn, A. J. , & Lauer, M. S. . (2010). High-dimensional variable selection for survival data. Journal of the American Statistical Association, 105(489), 205-217.

Examples

SDGM1(N = 200, p = 15, c_mean = 0.4)

Survival Data Generation Method 2

Description

Survival data generation method. The dataset represents mild violations of the proportional hazards assumption.

Usage

SDGM2(N = 200, p = 15, u_max = 4)

Arguments

N

The sample size of the simulated dataset.

p

The covariate dimension of the simulated dataset.

u_max

The parameter which is used to control the censoring rate.

Value

the simulated dataset

Author(s)

Hanpu Zhou [email protected]

References

Steingrimsson, J. A. , Diao, L. , & Strawderman, R. L. . (2019). Censoring unbiased regression trees and ensembles. Journal of the American Statistical Association, 114.

Zhu, R. , & Kosorok, M. R. . (2012). Recursively imputed survival trees. Journal of the American Statistical Association, 107(497), 331-340.

Ishwaran, H. , Kogalur, U. B. , Gorodeski, E. Z. , Minn, A. J. , & Lauer, M. S. . (2010). High-dimensional variable selection for survival data. Journal of the American Statistical Association, 105(489), 205-217.

Examples

SDGM2(N = 200, p = 15, u_max = 4)

Survival Data Generation Method 3

Description

Survival data generation method. The proportional hazards assumption is strongly violated in this dataset.

Usage

SDGM3(N = 200, p = 15, u_max = 7)

Arguments

N

The sample size of the simulated dataset.

p

The covariate dimension of the simulated dataset.

u_max

The parameter which is used to control the censoring rate.

Value

the simulated dataset

Author(s)

Hanpu Zhou [email protected]

References

Steingrimsson, J. A. , Diao, L. , & Strawderman, R. L. . (2019). Censoring unbiased regression trees and ensembles. Journal of the American Statistical Association, 114.

Zhu, R. , & Kosorok, M. R. . (2012). Recursively imputed survival trees. Journal of the American Statistical Association, 107(497), 331-340.

Ishwaran, H. , Kogalur, U. B. , Gorodeski, E. Z. , Minn, A. J. , & Lauer, M. S. . (2010). High-dimensional variable selection for survival data. Journal of the American Statistical Association, 105(489), 205-217.

Examples

SDGM3(N = 200, p = 15, u_max = 7)

Survival Data Generation Method 4

Description

Survival data generation method. An example of the proportional hazards model where in the Cox model is expected to perform best.

Usage

SDGM4(N = 200, p = 15, c_step = 0.4)

Arguments

N

The sample size of the simulated dataset.

p

The covariate dimension of the simulated dataset.

c_step

The parameter which is used to control the censoring rate.

Value

the simulated dataset

Author(s)

Hanpu Zhou [email protected]

References

Steingrimsson, J. A. , Diao, L. , & Strawderman, R. L. . (2019). Censoring unbiased regression trees and ensembles. Journal of the American Statistical Association, 114.

Zhu, R. , & Kosorok, M. R. . (2012). Recursively imputed survival trees. Journal of the American Statistical Association, 107(497), 331-340.

Ishwaran, H., Kogalur, U. B., Gorodeski, E.Z., Minn, A.J., & Lauer, M. S. . (2010). High-dimensional variable selection for survival data. Journal of the American Statistical Association, 105(489), 205-217.

Examples

SDGM4(N = 200, p = 15, c_step = 0.4)