Dimension-agnostic inference

Published by: Ilmun Kim

Publication date: 2020

Research field: Mathematical Statistics

Language: English

Authors: Ilmun Kim - Aaditya Ramdas

Statistics Theory, Methodology, Machine Learning

Classical asymptotic theory for statistical inference usually involves calibrating a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d_n$ and $n$ both increase to infinity together at some prescribed relative rate. This often leads to different inference procedures, depending on the assumptions about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d_n/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference -- developing methods whose validity does not depend on any assumption on $d_n$. We introduce a new, generic approach that uses variational representations of existing test statistics along with sample splitting and self-normalization to produce a new test statistic with a Gaussian limiting distribution. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonals. We exemplify our technique for a handful of classical problems including one-sample mean and covariance testing. Our tests are shown to have minimax rate-optimal power against appropriate local alternatives, and without explicitly targeting the high-dimensional setting their power is optimal up to a $\sqrt{2}$ factor. A hidden advantage is that our proofs are simple and transparent. We end by describing several fruitful open directions.
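To make the sample-splitting and self-normalization idea concrete, here is a minimal sketch for the one-sample mean problem (testing whether the mean vector is zero). It is an illustration under stated assumptions, not the authors' reference implementation: the function name `split_mean_test`, the random even split, and the one-sided rejection rule are choices made here for illustration and are not given in the abstract. The second half of the data supplies a direction, the first half is projected onto it, and the projected values are studentized so that the statistic can be compared with a standard normal quantile.

```python
import math
import numpy as np

def split_mean_test(X, alpha=0.05, seed=0):
    """Hypothetical sketch of a split, self-normalized test of H0: mean = 0."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    half = n // 2
    X1, X2 = X[idx[:half]], X[idx[half:]]

    # Direction estimated from the second half only.
    direction = X2.mean(axis=0)
    # One-dimensional projections of the first half onto that direction.
    f = X1 @ direction
    # Self-normalization: studentize the mean of the projections.
    stat = math.sqrt(len(f)) * f.mean() / f.std(ddof=1)
    # One-sided Gaussian p-value, 1 - Phi(stat).
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))
    return stat, p_value, p_value < alpha

# Usage with the abstract's example sizes: 100 samples in 20 dimensions.
rng = np.random.default_rng(1)
stat, p_value, reject = split_mean_test(rng.standard_normal((100, 20)) + 1.0)
```

Because the two halves are independent, the products of observations across halves play the role of the off-diagonal terms of a U-statistic, and the calibration uses the same normal quantile whatever the relation between $d$ and $n$.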

59 - Ian Waudby-Smith, David Arbour, Ritwik Sinha 2021

This paper derives time-uniform confidence sequences (CS) for causal effects in experimental and observational settings. A confidence sequence for a target parameter $\psi$ is a sequence of confidence intervals $(C_t)_{t=1}^\infty$ such that every one of these intervals simultaneously captures $\psi$ with high probability. Such CSs provide valid statistical inference for $\psi$ at arbitrary stopping times, unlike classical fixed-time confidence intervals which require the sample size to be fixed in advance. Existing methods for constructing CSs focus on the nonasymptotic regime where certain assumptions (such as known bounds on the random variables) are imposed, while doubly robust estimators of causal effects rely on (asymptotic) semiparametric theory. We use sequenti

Statistics Theory, Methodology, Machine Learning

Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction

212 - Jue Hou, Zijian Guo, Tianxi Cai 2021

Risk modeling with EHR data is challenging due to a lack of direct observations on the disease outcome, and the high dimensionality of the candidate predictors. In this paper, we develop a surrogate assisted semi-supervised-learning (SAS) approach to risk modeling with high dimensional predictors, leveraging a large unlabeled dataset on candidate predictors and surrogates of outcome, as well as a small labeled dataset with annotated outcomes. The SAS procedure borrows information from surrogates along with candidate predictors to impute the unobserved outcomes via a sparse working imputation model with moment conditions to achieve robustness against mis-specification in the imputation model and a one-step bias correction to enable interval estimation for the predicted risk. We demonstrate that the SAS procedure provides valid inference for the predicted risk derived from a high dimensional working model, even when the underlying risk prediction model is dense and the risk model is mis-specified. We present an extensive simulation study to demonstrate the superiority of our SSL approach compared to existing supervised methods. We apply the method to derive genetic risk prediction of type-2 diabetes mellitus using an EHR biobank cohort.

Statistics Theory, Methodology, Machine Learning

Large-scale simultaneous inference under dependence

110 - Jinjin Tian, Xu Chen, Eugene Katsevich 2021

Simultaneous, post-hoc inference is desirable in large-scale hypotheses testing as it allows for exploration of data while deciding on criteria for proclaiming discoveries. It was recently proved that all admissible post-hoc inference methods for the number of true discoveries must be based on closed testing. In this paper we investigate tractable and efficient closed testing with local tests of different properties, such as monotonicity, symmetry and separability, meaning that the test thresholds a monotonic or symmetric function or a function of sums of test scores for the individual hypotheses. This class includes well-known global null tests by Fisher, Stouffer and Rüschendorf, as well as newly proposed ones based on harmonic means and Cauchy combinations. Under monotonicity, we propose a new linear time statistic (coma) that quantifies the cost of multiplicity adjustments. If the tests are also symmetric and separable, we develop several fast (mostly linear-time) algorithms for post-hoc inference, making closed testing tractable. Paired with recent advances in global null tests based on generalized means, our work immediately instantiates a series of simultaneous inference methods that can handle many complex dependence structures and signal compositions. We provide guidance on choosing from these methods via theoretical investigation of the conservativeness and sensitivity for different local tests, as well as simulations that find analogous behavior for local tests and full closed testing. One result of independent interest is the following: if $P_1,\dots,P_d$ are $p$-values from a multivariate Gaussian with arbitrary covariance, then their arithmetic average $P$ satisfies $\Pr(P \leq t) \leq t$ for $t \leq \frac{1}{2d}$.
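The closing inequality on the arithmetic average of p-values can be checked numerically for one particular covariance. The sketch below is only a Monte Carlo illustration under assumptions made here, not a proof and not the authors' code: it assumes one-sided p-values $P_i = 1 - \Phi(Z_i)$, an equicorrelated covariance with a hypothetical correlation `rho`, and the helper name `avg_pvalue_tail`. The threshold is set to $t = \frac{1}{2d}$.

```python
import math
import numpy as np

def avg_pvalue_tail(d=5, rho=0.9, t=0.1, n_sim=20000, seed=0):
    """Monte Carlo estimate of Pr(mean of p-values <= t) under equicorrelation."""
    rng = np.random.default_rng(seed)
    # Equicorrelated covariance: ones on the diagonal, rho elsewhere (assumed).
    cov = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)
    Z = rng.multivariate_normal(np.zeros(d), cov, size=n_sim)
    # One-sided p-values 1 - Phi(z), computed via the complementary error function.
    erfc = np.vectorize(math.erfc)
    pvals = 0.5 * erfc(Z / math.sqrt(2))
    # Fraction of simulations whose average p-value falls at or below t.
    return float(np.mean(pvals.mean(axis=1) <= t))

d = 5
t = 1.0 / (2 * d)  # largest threshold covered by the stated inequality
estimate = avg_pvalue_tail(d=d, t=t)
```

Comparing `estimate` with `t` for a few values of `rho` gives a quick sanity check of the bound; a single covariance structure says nothing about the arbitrary-covariance claim itself.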

Statistics Theory, Methodology

Bootstrap inference for quantile-based modal regression

215 - Tao Zhang, Kengo Kato, David Ruppert 2020

In this paper, we develop uniform inference methods for the conditional mode based on quantile regression. Specifically, we propose to estimate the conditional mode by minimizing the derivative of the estimated conditional quantile function defined by smoothing the linear quantile regression estimator, and develop two bootstrap methods, a novel pivotal bootstrap and the nonparametric bootstrap, for our conditional mode estimator. Building on high-dimensional Gaussian approximation techniques, we establish the validity of simultaneous confidence rectangles constructed from the two bootstrap methods for the conditional mode. We also extend the preceding analysis to the case where the dimension of the covariate vector is increasing with the sample size. Finally, we conduct simulation experiments and a real data analysis using U.S. wage data to demonstrate the finite sample performance of our inference method.

Statistics Theory, Methodology

Asymptotic Distributions of High-Dimensional Distance Correlation Inference

72 - Lan Gao, Yingying Fan, Jinchi Lv 2019

Distance correlation has become an increasingly popular tool for detecting the nonlinear dependence between a pair of potentially high-dimensional random vectors. Most existing works have explored its asymptotic distributions under the null hypothesis of independence between the two random vectors when only the sample size or the dimensionality diverges. Yet its asymptotic null distribution for the more realistic setting when both sample size and dimensionality diverge in the full range remains largely underdeveloped. In this paper, we fill such a gap and develop central limit theorems and associated rates of convergence for a rescaled test statistic based on the bias-corrected distance correlation in high dimensions under some mild regularity conditions and the null hypothesis. Our new theoretical results reveal an interesting phenomenon of blessing of dimensionality for high-dimensional distance correlation inference in the sense that the accuracy of normal approximation can increase with dimensionality. Moreover, we provide a general theory on the power analysis under the alternative hypothesis of dependence, and further justify the capability of the rescaled distance correlation in capturing the pure nonlinear dependency under moderately high dimensionality for a certain type of alternative hypothesis. The theoretical results and finite-sample performance of the rescaled statistic are illustrated with several simulation examples and a blockchain application.

Statistics Theory, Methodology