Knockoff variable selection: Select the variables by controlling a user-specified error rate

This is the main function for knockoff-based variable selection; it takes the knockoff statistics W as input. With multiple knockoffs (ncol(W) > 1), the function performs variable selection for each knockoff and additionally stabilizes the selections by combining their outcomes.

Usage

variable.selections(W, level = 0.2, error.type = "fdr", k = NULL, thres = 0.5)

Arguments

W: a data.frame of knockoff W-statistics (feature statistics); columns correspond to different knockoffs and rows correspond to the underlying variables. row.names(W) records the variable names.
level: the nominal level at which the chosen error rate is controlled
error.type: the error rate to control; currently one of "fdr", "pfer" or "kfwer"
k: a positive integer for k-FWER control (the probability of making at least k false discoveries); used only with error.type = "kfwer"
thres: threshold parameter for stabilizing the selections (eta parameter for derandomized knockoffs, trims parameter for multi_select). A natural choice is thres = 0.5.

Value

an object of class "variable.selections" that is essentially a list with two elements: 1) $selections, a (p x M) binary data.frame whose rows correspond to variables and whose columns correspond to different knockoffs; a value of 1 means the given variable was selected for that particular knockoff simulation, 0 otherwise; 2) $stable.selection, a character vector of the variables selected by stability selection (as described in Details). The second element is only meaningful if the user specifies multiple knockoffs (say M > 5). If M = 1, stable.selection simply returns the indices of $selections that are equal to 1.
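The relationship between the two elements can be illustrated with a minimal base R sketch. It assumes a simple frequency rule in the spirit of derandomized knockoffs: keep a variable if it was selected in at least a fraction thres of the knockoff runs. The data, variable names and the rule itself are illustrative assumptions; the function's actual stabilization, in particular the multi_select heuristic used for "fdr", may differ.

```r
# Hypothetical (p x M) binary selection matrix: 4 variables, 5 knockoff runs.
# The variable names and values are made up for illustration only.
selections <- data.frame(
  run1 = c(1, 0, 1, 0),
  run2 = c(1, 0, 0, 0),
  run3 = c(1, 1, 1, 0),
  run4 = c(1, 0, 1, 0),
  run5 = c(1, 0, 0, 0),
  row.names = c("age", "bmi", "crp", "sex")
)

thres <- 0.5

# Fraction of knockoff runs in which each variable was selected.
selection.freq <- rowMeans(selections)

# Assumed rule: keep variables selected in at least a fraction `thres` of runs.
stable.selection <- names(selection.freq)[selection.freq >= thres]
stable.selection
```

With thres = 0.5, a variable selected in only one of five runs is dropped, while one selected in three of five runs is kept.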

Details

Knockoffs is a randomized procedure that relies on the construction of synthetic (knockoff) variables. This function performs variable selection for multiple knockoffs and then stabilizes the selections by combining their outcomes. When the pfer or kfwer error is controlled, the derandomized knockoffs procedure is used, which was introduced by Ren et al. (2021) and provably controls these errors. When the fdr is controlled, the heuristic multiple selection algorithm is used, which was introduced by Kormaksson et al. (2021).

Z. Ren, Y. Wei, & E. Candès (2021). Derandomizing knockoffs. Journal of the American Statistical Association, 1-11.

M. Kormaksson, L. J. Kelly, X. Zhu, S. Haemmerle, L. Pricop, & D. Ohlssen (2021). Sequential knockoffs for continuous and categorical predictors: With application to a large psoriatic arthritis clinical trial pool. Statistics in Medicine, 40(14), 3313-3328.
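To make the per-knockoff selection step for "fdr" concrete, the following base R sketch applies the standard knockoff+ threshold to a single column of W-statistics: the smallest t > 0 for which (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) is at most the nominal level. Whether variable.selections uses exactly this rule (for example the offset of 1) is an assumption; the helper name knockoff.threshold.sketch and the W values are hypothetical.

```r
# Sketch of the knockoff+ threshold for one vector of W-statistics.
# `offset = 1` gives knockoff+; this is an assumed choice, not the documented one.
knockoff.threshold.sketch <- function(W, level = 0.2, offset = 1) {
  # Candidate thresholds are the distinct nonzero magnitudes of W, ascending.
  candidates <- sort(unique(abs(W[W != 0])))
  for (thr in candidates) {
    # Estimated false discovery proportion at this threshold.
    fdp.hat <- (offset + sum(W <= -thr)) / max(1, sum(W >= thr))
    if (fdp.hat <= level) return(thr)
  }
  Inf  # no threshold attains the level: nothing is selected
}

# Made-up W-statistics for ten variables.
W <- c(x1 = 5.1, x2 = 4.3, x3 = 3.8, x4 = 3.2, x5 = 2.9,
       x6 = 2.5, x7 = -0.4, x8 = 0.3, x9 = -0.2, x10 = 0.1)

tau <- knockoff.threshold.sketch(W, level = 0.2)
selected <- names(W)[W >= tau]
selected
```

Large positive W-statistics are evidence that a variable matters, while negative ones act as a proxy count of false discoveries, which is why the estimate compares the two tails of W.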

Examples