This is the main function for knockoff-based variable selection. It takes the knockoff statistics W as input. With multiple knockoffs (ncol(W) > 1), the function performs variable selection for each knockoff and additionally stabilizes the selections by combining their outcomes.

Usage

variable.selections(W, level = 0.2, error.type = "fdr", k = NULL, thres = 0.5)

Arguments

W

a data.frame of knockoff W-statistics (feature statistics); columns correspond to different knockoffs and rows correspond to the underlying variables. row.names(W) records the variable names.

level

the nominal level at which to control the chosen error rate (see error.type)

error.type

the error rate to control; currently one of "fdr", "pfer", or "kfwer"

k

a positive integer for k-FWER control (multiple testing where one seeks to control the probability of making at least k false discoveries); used only with error.type = 'kfwer'

thres

threshold parameter for stabilizing the selections (eta parameter for derandomized knockoffs, trims parameter for multi_select). A natural choice is thres = 0.5.
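To make the roles of W and level concrete, here is a minimal base-R sketch of the standard knockoff+ selection rule applied to a single column of W-statistics under FDR control. It is an illustration only: `knockoff_plus_select` is a hypothetical helper written for this example, the statistics are made up, and the package may compute its per-knockoff selections differently.

```r
# Knockoff+ selection for one vector of W-statistics (illustrative sketch).
# `knockoff_plus_select` is a hypothetical helper, not part of knockofftools.
knockoff_plus_select <- function(w, level = 0.2) {
  # Candidate thresholds are the distinct non-zero magnitudes of w.
  candidates <- sort(unique(abs(w[w != 0])))
  for (thr in candidates) {
    # Estimated false discovery proportion at this threshold.
    fdp_hat <- (1 + sum(w <= -thr)) / max(1, sum(w >= thr))
    if (fdp_hat <= level) {
      # Smallest threshold meeting the target: select large positive statistics.
      return(which(w >= thr))
    }
  }
  integer(0)  # no threshold meets the target level
}

# Hypothetical statistics for 8 variables (assumed values, for illustration).
w <- c(x1 = 4.1, x2 = 3.5, x3 = 2.9, x4 = 2.2,
       x5 = -0.4, x6 = 0.3, x7 = -0.2, x8 = 1.8)

selected <- names(knockoff_plus_select(w, level = 0.4))
print(selected)
```

Large positive statistics are evidence for a variable, while negative statistics are used to estimate how many of the selections are false; a smaller level therefore pushes the threshold up and shrinks the selected set.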

Value

an object of class "variable.selections" that is essentially a list with two elements: 1) $selections = (p x M) binary data.frame where rows correspond to variables and columns correspond to different knockoffs; a value of 1 means the given variable was selected for that particular knockoff simulation, 0 otherwise; 2) $stable.selection = a character vector with the selected variables from stability selection (as described in Details). The second field is only meaningful if the user specifies multiple knockoffs (say M > 5). If M = 1 then stable.selection simply returns the indices of $selections that are equal to 1.

Details

Knockoffs is a randomized procedure that relies on the construction of synthetic (knockoff) variables. This function performs variable selection for multiple knockoffs and then stabilizes the selections by combining their outcomes. When the pfer or kfwer error is controlled, the derandomized knockoffs procedure is used, which was introduced by Ren et al. (2021) and provably controls these errors. When the fdr is controlled, the heuristic multiple selection algorithm is used, which was introduced by Kormaksson et al. (2021).

Z. Ren, Y. Wei, & E. Candès (2021). Derandomizing knockoffs. Journal of the American Statistical Association, 1-11.

M. Kormaksson, L. J. Kelly, X. Zhu, S. Haemmerle, L. Pricop, & D. Ohlssen (2021). Sequential knockoffs for continuous and categorical predictors: With application to a large psoriatic arthritis clinical trial pool. Statistics in Medicine, 40(14), 3313-3328.
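The stabilization step for derandomized knockoffs can be sketched in base R as a selection-frequency threshold: a variable is kept if it was selected in at least a fraction thres of the M knockoff runs. This is a rough illustration under that assumption, using a made-up selection table; it does not reproduce the package's internal code, and it does not cover the multiple selection heuristic used for fdr.

```r
# Hypothetical p x M binary selection table (4 variables, 5 knockoff runs).
# Values are made up for illustration.
selections <- data.frame(
  W1 = c(1, 1, 0, 0),
  W2 = c(1, 0, 0, 1),
  W3 = c(1, 1, 0, 0),
  W4 = c(1, 1, 1, 0),
  W5 = c(1, 0, 0, 0),
  row.names = c("x1", "x2", "x3", "x4")
)

thres <- 0.5

# Selection frequency of each variable across the M runs.
freq <- rowMeans(selections)

# Keep the variables selected in at least a fraction `thres` of the runs.
stable <- names(freq)[freq >= thres]

print(freq)
print(stable)
```

With thres = 0.5 a variable must be selected in at least half of the runs, so a variable picked up by a single run only does not survive the stabilization.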

See also

plot.variable.selections for plotting an organized heatmap of the selections.

Examples

library(knockofftools)

set.seed(1)

# Simulate 10 Gaussian covariate predictors:
X <- generate_X(n=100, p=10, p_b=0, cov_type="cov_equi", rho=0.2)

# create linear predictor with first 5 beta-coefficients = 1 (all other zero)
lp <- generate_lp(X, p_nn = 5, a=1)

# Gaussian

# Simulate response from a linear model y = lp + epsilon, where epsilon ~ N(0,1):
y <- lp + rnorm(100)

# Calculate M independent knockoff feature statistics:
W <- knockoff.statistics(y=y, X=X, type="regression", M=5)
#> Running sequentially ('LOCAL') ...

S <- variable.selections(W, error.type = "pfer", level = 1)

# selections under alternative error control:
S <- variable.selections(W, error.type = "kfwer", k = 1, level = 0.50)
S <- variable.selections(W, error.type = "fdr", level = 0.5)