
Selective inference with a randomized response

Posted by Xiaoying Tian Harris
Publication date: 2015
Research field: Mathematical Statistics
Paper language: English





Inspired by sample splitting and the reusable holdout introduced in the field of differential privacy, we consider selective inference with a randomized response. We discuss two major advantages of using a randomized response for model selection. First, the selectively valid tests are more powerful after randomized selection. Second, it allows consistent estimation and weak convergence of selective inference procedures. Under independent sampling, we prove a selective (or privatized) central limit theorem that transfers procedures valid under asymptotic normality without selection to their corresponding selective counterparts. This allows selective inference in nonparametric settings. Finally, we propose a framework of inference after combining multiple randomized selection procedures. We focus on the classical asymptotic setting, leaving the interesting high-dimensional asymptotic questions for future work.
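As a toy illustration of why randomized selection helps, here is a minimal sketch, not the paper's construction: the univariate setup, the threshold t, and the noise scale gamma are assumptions made for illustration. We observe X ~ N(mu, 1), select when X + omega > t with independent omega ~ N(0, gamma^2), and condition on selection; under the true mean, the selective pivot should be uniform.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# selective inference for a normal mean after randomized threshold selection.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def selective_pivot(x_obs, mu, t=2.0, gamma=1.0):
    """P(X <= x_obs | X + omega > t) under N(mu, 1); uniform at the true mu."""
    # density of X given selection: phi(x - mu) * P(omega > t - x), normalized
    dens = lambda x: norm.pdf(x - mu) * norm.sf((t - x) / gamma)
    num, _ = quad(dens, -np.inf, x_obs)
    den, _ = quad(dens, -np.inf, np.inf)
    return num / den

# Sanity check: under the true mu the pivots should look ~ Uniform(0, 1).
rng = np.random.default_rng(0)
mu_true, t, gamma = 0.5, 2.0, 1.0
pivots = []
while len(pivots) < 500:
    x = rng.normal(mu_true, 1.0)
    if x + rng.normal(0.0, gamma) > t:          # randomized selection event
        pivots.append(selective_pivot(x, mu_true, t, gamma))
print("mean of pivots (expect ~0.5):", np.mean(pivots))
```

Because the selection depends on X only through the noisy X + omega, the conditional law of X is less severely truncated than under deterministic thresholding, which is the source of the power gain the abstract describes.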




Read also

Selective inference is a recent research topic that tries to perform valid inference after using the data to select a reasonable statistical model. We propose MAGIC, a new method for selective inference that is general, powerful and tractable. MAGIC is a method for selective inference after solving a convex optimization problem with smooth loss and $\ell_1$ penalty. Randomization is incorporated into the optimization problem to boost statistical power. Through reparametrization, MAGIC reduces the problem into a sampling problem with simple constraints. MAGIC applies to many $\ell_1$ penalized optimization problems, including the Lasso, logistic Lasso and neighborhood selection in graphical models, all of which we consider in this paper.
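The randomized convex program that MAGIC builds on can be sketched as below. This is a hedged sketch of a randomized lasso solved by proximal gradient descent, not the MAGIC sampler itself; the penalty level, noise scale, and step size are illustrative assumptions.

```python
# Sketch: lasso objective perturbed by a random linear term omega^T beta,
# solved with ISTA (proximal gradient descent). Illustrative, not MAGIC.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def randomized_lasso(X, y, lam=1.0, noise_scale=0.5, n_iter=2000, seed=0):
    """Minimize 0.5*||y - X beta||^2 + lam*||beta||_1 - omega^T beta."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    omega = rng.normal(0.0, noise_scale, size=p)    # randomization term
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) - omega         # smooth part of objective
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta, omega

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
beta_true = np.r_[3.0, -2.0, np.zeros(8)]
y = X @ beta_true + rng.normal(size=100)
beta_hat, _ = randomized_lasso(X, y, lam=5.0)
print("selected variables:", np.nonzero(beta_hat)[0])
```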
We consider the problem of selective inference after solving a (randomized) convex statistical learning program in the form of a penalized or constrained loss function. Our first main result is a change-of-measure formula that describes many conditional sampling problems of interest in selective inference. Our approach is model-agnostic in the sense that users may provide their own statistical model for inference; we simply provide the modification of each distribution in the model after the selection. Our second main result describes the geometric structure in the Jacobian appearing in the change of measure, drawing connections to curvature measures appearing in Weyl-Steiner volume-of-tubes formulae. This Jacobian is necessary for problems in which the convex penalty is not polyhedral, with the prototypical example being the group LASSO or the nuclear norm. We derive explicit formulae for the Jacobian of the group LASSO. To illustrate the generality of our method, we consider many examples throughout, varying both the penalty or constraint in the statistical learning problem and the loss function, also considering selective inference after solving multiple statistical learning programs. Penalties considered include LASSO, forward stepwise, stagewise algorithms, marginal screening and generalized LASSO. Loss functions considered include squared-error, logistic, and log-det for covariance matrix estimation. Having described the appropriate distribution we wish to sample from through our first two results, we outline a framework for sampling using a projected Langevin sampler in the (commonly occurring) case that the distribution is log-concave.
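The projected Langevin sampler mentioned at the end admits a compact sketch. Illustrative only: the potential U and the constraint set K below (a standard Gaussian restricted to the nonnegative orthant) are stand-ins for the selective distributions the paper actually samples.

```python
# Sketch of a projected Langevin step for a log-concave density on a convex
# set K: x <- Pi_K(x - step*grad_U(x) + sqrt(2*step)*xi), xi ~ N(0, I).
import numpy as np

def projected_langevin(grad_U, project, x0, step=0.01, n_steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = project(x - step * grad_U(x)
                    + np.sqrt(2.0 * step) * rng.normal(size=x.shape))
        samples.append(x.copy())
    return np.array(samples)

# Example target: U(x) = ||x||^2 / 2 (standard Gaussian), K = {x : x >= 0}.
draws = projected_langevin(grad_U=lambda x: x,
                           project=lambda x: np.maximum(x, 0.0),
                           x0=np.ones(2))
# Half-normal mean is sqrt(2/pi) ~ 0.798 per coordinate.
print("sample mean after burn-in:", draws[5000:].mean(axis=0))
```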
Naive Bayes classifiers have proven to be useful in many prediction problems with complete training data. Here we consider the situation where a naive Bayes classifier is trained with data where the response is right censored. Such prediction problems are for instance encountered in profiling systems used at National Employment Agencies. In this paper we propose the maximum collective conditional likelihood estimator for the prediction and show that it is strongly consistent under the usual identifiability condition.
We introduce a multiscale test statistic based on local order statistics and spacings that provides simultaneous confidence statements for the existence and location of local increases and decreases of a density or a failure rate. The procedure provides guaranteed finite-sample significance levels, is easy to implement and possesses certain asymptotic optimality and adaptivity properties.
This paper presents and analyzes an approach to cluster-based inference for dependent data. The primary setting considered here is with spatially indexed data in which the dependence structure of observed random variables is characterized by a known, observed dissimilarity measure over spatial indices. Observations are partitioned into clusters with the use of an unsupervised clustering algorithm applied to the dissimilarity measure. Once the partition into clusters is learned, a cluster-based inference procedure is applied to a statistical hypothesis testing procedure. The procedure proposed in the paper allows the number of clusters to depend on the data, which gives researchers a principled method for choosing an appropriate clustering level. The paper gives conditions under which the proposed procedure asymptotically attains correct size. A simulation study shows that the proposed procedure attains near nominal size in finite samples in a variety of statistical testing problems with dependent data.
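A stylized version of this pipeline is sketched below, with a fixed number of clusters in place of the paper's data-driven choice; the cluster count, linkage method, and the t-test across cluster means are assumptions made for illustration, not the paper's exact procedure.

```python
# Sketch: cluster spatial units on a dissimilarity matrix, compute the
# statistic within each cluster, then test across the (approximately
# independent) cluster-level values.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))          # spatial indices
y = rng.normal(size=200)                            # observations; H0: mean 0

# Dissimilarity = spatial distance; agglomerative clustering into 8 clusters.
dissim = pdist(coords)
labels = fcluster(linkage(dissim, method="average"), t=8, criterion="maxclust")

# Cluster-level means, then a one-sample t-test across clusters.
cluster_means = np.array([y[labels == k].mean() for k in np.unique(labels)])
stat, pval = ttest_1samp(cluster_means, popmean=0.0)
print(f"t = {stat:.2f}, p = {pval:.3f}")
```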