بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Derandomizing Knockoffs

170 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhimei Ren

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Zhimei Ren - Yuting Wei - Emmanuel Cand`es

المنهجية تطبيقات الإحصاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Model-X knockoffs is a general procedure that can leverage any feature importance measure to produce a variable selection algorithm, which discovers true effects while rigorously controlling the number or fraction of false positives. Model-X knockoffs is a randomized procedure which relies on the one-time construction of synthetic (random) variables. This paper introduces a derandomization method by aggregating the selection results across multiple runs of the knockoffs algorithm. The derandomization step is designed to be flexible and can be adapted to any variable selection base procedure to yield stable decisions without compromising statistical power. When applied to the base procedure of Janson et al. (2016), we prove that derandomized knockoffs controls both the per family error rate (PFER) and the k family-wise error rate (k-FWER). Further, we carry out extensive numerical studies demonstrating tight type-I error control and markedly enhanced power when compared with alternative variable selection algorithms. Finally, we apply our approach to multi-stage genome-wide association studies of prostate cancer and report locations on the genome that are significantly associated with the disease. When cross-referenced with other studies, we find that the reported associations have been replicated.

قيم البحث

77 - Matthias Kormaksson 2020

Knockoffs provide a general framework for controlling the false discovery rate when performing variable selection. Much of the Knockoffs literature focuses on theoretical challenges and we recognize a need for bringing some of the current ideas into practice. In this paper we propose a sequential algorithm for generating knockoffs when underlying data consists of both continuous and categorical (factor) variables. Further, we present a heuristic multiple knockoffs approach that offers a practical assessment of how robust the knockoff selection process is for a given data set. We conduct extensive simulations to validate performance of the proposed methodology. Finally, we demonstrate the utility of the methods on a large clinical data pool of more than $2,000$ patients with psoriatic arthritis evaluated in 4 clinical trials with an IL-17A inhibitor, secukinumab (Cosentyx), where we determine prognostic factors of a well established clinical outcome. The analyses presented in this paper could provide a wide range of applications to commonly encountered data sets in medical practice and other fields where variable selection is of particular interest.

المنهجية تطبيقات الإحصاء

Financial factors selection with knockoffs: fund replication, explanatory and prediction networks

70 - Damien Challet , Christian Bongiorno , Guillaume Pelletier 2021

We apply the knockoff procedure to factor selection in finance. By building fake but realistic factors, this procedure makes it possible to control the fraction of false discovery in a given set of factors. To show its versatility, we apply it to fun d replication and to the inference of explanatory and prediction networks.

التمويل الإحصائي تطبيقات الإحصاء

FANOK: Knockoffs in Linear Time

59 - Armin Askari , Quentin Rebjock , Alexandre dAspremont 2020

We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems. Identifying the knockoff distribution requires solving a large scale semidefinite program for which we derive several efficient methods. One handles generic covariance matrices, has a complexity scaling as $O(p^3)$ where $p$ is the ambient dimension, while another assumes a rank $k$ factor model on the covariance matrix to reduce this complexity bound to $O(pk^2)$. We also derive efficient procedures to both estimate factor models and sample knockoff covariates with complexity linear in the dimension. We test our methods on problems with $p$ as large as $500,000$.

التعلم الآلي المنهجية التعلم الالي

Derandomizing HSSW Algorithm for 3-SAT

863 - Kazuhisa Makino , Suguru Tamaki , Masaki Yamamoto 2011

We present a (full) derandomization of HSSW algorithm for 3-SAT, proposed by Hofmeister, Schoning, Schuler, and Watanabe in [STACS02]. Thereby, we obtain an O(1.3303^n)-time deterministic algorithm for 3-SAT, which is currently fastest.

التعقيد الحسابي بنى وهياكل البيانات والخوارزميات

Semiparametric regression in testicular germ cell data

356 - Anastasia Voulgaraki , Benjamin Kedem , Barry I. Graubard 2010

It is possible to approach regression analysis with random covariates from a semiparametric perspective where information is combined from multiple multivariate sources. The approach assumes a semiparametric density ratio model where multivariate dis tributions are regressed on a reference distribution. A kernel density estimator can be constructed from many data sources in conjunction with the semiparametric model. The estimator is shown to be more efficient than the traditional single-sample kernel density estimator, and its optimal bandwidth is discussed in some detail. Each multivariate distribution and the corresponding conditional expectation (regression) of interest are estimated from the combined data using all sources. Graphical and quantitative diagnostic tools are suggested to assess model validity. The method is applied in quantifying the effect of height and age on weight of germ cell testicular cancer patients. Comparisons are made with multiple regression, generalized additive models (GAM) and nonparametric kernel regression.

المنهجية تطبيقات الإحصاء

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد العالي لإدارة الأعمال

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Derandomizing Knockoffs

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً