ﻻ يوجد ملخص باللغة العربية
Model-X knockoffs is a general procedure that can leverage any feature importance measure to produce a variable selection algorithm, which discovers true effects while rigorously controlling the number or fraction of false positives. Model-X knockoffs is a randomized procedure which relies on the one-time construction of synthetic (random) variables. This paper introduces a derandomization method by aggregating the selection results across multiple runs of the knockoffs algorithm. The derandomization step is designed to be flexible and can be adapted to any variable selection base procedure to yield stable decisions without compromising statistical power. When applied to the base procedure of Janson et al. (2016), we prove that derandomized knockoffs controls both the per family error rate (PFER) and the k family-wise error rate (k-FWER). Further, we carry out extensive numerical studies demonstrating tight type-I error control and markedly enhanced power when compared with alternative variable selection algorithms. Finally, we apply our approach to multi-stage genome-wide association studies of prostate cancer and report locations on the genome that are significantly associated with the disease. When cross-referenced with other studies, we find that the reported associations have been replicated.
Knockoffs provide a general framework for controlling the false discovery rate when performing variable selection. Much of the Knockoffs literature focuses on theoretical challenges and we recognize a need for bringing some of the current ideas into
We apply the knockoff procedure to factor selection in finance. By building fake but realistic factors, this procedure makes it possible to control the fraction of false discovery in a given set of factors. To show its versatility, we apply it to fun
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems. Identifying the knockoff distribution requires solving a large scale semidefinite
We present a (full) derandomization of HSSW algorithm for 3-SAT, proposed by Hofmeister, Schoning, Schuler, and Watanabe in [STACS02]. Thereby, we obtain an O(1.3303^n)-time deterministic algorithm for 3-SAT, which is currently fastest.
It is possible to approach regression analysis with random covariates from a semiparametric perspective where information is combined from multiple multivariate sources. The approach assumes a semiparametric density ratio model where multivariate dis