
Selective Correlations - the conditional estimators

Published by: Amit Meir
Publication date: 2014
Research field: Mathematical Statistics
Paper language: English





The problem of voodoo correlations is recognized in neuroimaging as the problem of estimating quantities of interest from the same data that was used to select them as interesting. In statistical terminology, inference following selection from the same data is known as selective inference. The recommended remedy, splitting the data, has unwelcome side-effects; motivated by these, a method for constructing confidence intervals based on the correct post-selection distribution of the observations has been suggested recently. We utilize a similar approach in order to provide point estimates that account for a large part of the selection bias. We show via extensive simulations that the proposed estimator has favorable properties: it is likely to reduce both estimation bias and mean squared error compared to the direct estimator, without sacrificing power to detect non-zero correlation as the data-splitting approach does. We show that both point estimates and confidence intervals are needed in order to get a full assessment of the uncertainty in the point estimates, as both are integrated into the recently proposed Confidence Calibration Plots. The computation of the estimators is implemented in an accompanying software package.
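The selection effect described above is easy to reproduce in simulation. The sketch below (all settings illustrative, not taken from the paper) draws many independent bivariate samples sharing the same true correlation, keeps only those whose sample correlation clears a threshold, and compares the bias of the direct estimator before and after selection:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, rho, thresh = 30, 2000, 0.2, 0.3  # illustrative values, not from the paper

# Simulate m independent bivariate-normal datasets with true correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
r = np.empty(m)
for j in range(m):
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    r[j] = np.corrcoef(x[:, 0], x[:, 1])[0, 1]

selected = r[r > thresh]            # "interesting" correlations kept after selection
bias_all = r.mean() - rho           # near zero: the direct estimator is ~unbiased
bias_sel = selected.mean() - rho    # positive: selection inflates the estimate

print(f"bias without selection: {bias_all:+.3f}")
print(f"bias after selection:   {bias_sel:+.3f}")
```

The direct estimator is nearly unbiased over all samples, but conditioning on selection inflates it substantially; the conditional estimators described above correct for precisely this post-selection distribution.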




Read also

Estimation of autocorrelations and spectral densities is of fundamental importance in many fields of science, from identifying pulsar signals in astronomy to measuring heart beats in medicine. In circumstances where one is interested in specific autocorrelation functions that do not fit into any simple family of models, such as the auto-regressive moving average (ARMA) family, estimating model parameters is generally approached in one of two ways: by fitting the model autocorrelation function to a non-parametric autocorrelation estimate via regression analysis, or by fitting the model autocorrelation function directly to the data via maximum likelihood. Prior literature suggests that variogram regression yields parameter estimates of comparable quality to maximum likelihood. In this letter we demonstrate that, as sample size increases, the accuracy of the maximum-likelihood estimates (MLE) ultimately improves by orders of magnitude beyond that of variogram regression. For relatively continuous and Gaussian processes, this improvement can occur for sample sizes of less than 100. Moreover, even where the accuracy of the two methods is comparable, the MLE remains almost universally better and, more critically, variogram regression does not provide reliable confidence intervals: inaccurate regression parameter estimates are typically accompanied by underestimated standard errors, whereas likelihood provides reliable confidence intervals.
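A minimal version of this comparison can be sketched for an AR(1) process, whose autocorrelation function is phi^k (a stand-in for the letter's variogram setting; all settings are illustrative). One estimator regresses the log sample ACF on lag; the other is the conditional MLE, which for AR(1) reduces to lag-one least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
phi_true, n, K, reps = 0.6, 200, 5, 500  # illustrative settings

def simulate_ar1(phi, n):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def sample_acf(x, K):
    # Non-parametric autocorrelation estimate at lags 1..K.
    x = x - x.mean()
    c0 = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / c0 for k in range(1, K + 1)])

err_reg, err_mle = [], []
for _ in range(reps):
    x = simulate_ar1(phi_true, n)
    acf = sample_acf(x, K)
    # (a) regression: fit log acf(k) ~ k * log(phi) over positive lags
    pos = acf > 0
    ks = np.arange(1, K + 1)[pos]
    slope = np.dot(ks, np.log(acf[pos])) / np.dot(ks, ks)
    phi_reg = np.exp(slope)
    # (b) conditional MLE for AR(1): lag-one least squares
    phi_mle = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
    err_reg.append((phi_reg - phi_true) ** 2)
    err_mle.append((phi_mle - phi_true) ** 2)

print(f"MSE, ACF regression:  {np.mean(err_reg):.4f}")
print(f"MSE, conditional MLE: {np.mean(err_mle):.4f}")
```

Printing both mean squared errors lets the two approaches be compared directly in this toy setting.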
Modern statistics provides an ever-expanding toolkit for estimating unknown parameters. Consequently, applied statisticians frequently face a difficult decision: retain a parameter estimate from a familiar method or replace it with an estimate from a newer or more complex one. While it is traditional to compare estimators using risk, such comparisons are rarely conclusive in realistic settings. In response, we propose the c-value as a measure of confidence that a new estimate achieves smaller loss than an old estimate on a given dataset. We show that it is unlikely that a computed c-value is large and that the new estimate has larger loss than the old. Therefore, just as a small p-value provides evidence to reject a null hypothesis, a large c-value provides evidence to use a new estimate in place of the old. For a wide class of problems and estimators, we show how to compute a c-value by first constructing a data-dependent high-probability lower bound on the difference in loss. The c-value is frequentist in nature, but we show that it can provide a validation of Bayesian estimates in real data applications involving hierarchical models and Gaussian processes.
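The kind of loss comparison the c-value formalizes can be illustrated with the classic normal-means problem (this is not the paper's c-value construction, only the underlying question): how often does a shrinkage estimate beat the raw estimate in realized loss?

```python
import numpy as np

rng = np.random.default_rng(2)
d, reps = 20, 1000
theta = rng.standard_normal(d)  # arbitrary true mean vector, unit-variance noise

wins = 0
for _ in range(reps):
    y = theta + rng.standard_normal(d)          # "old" estimate: the raw data
    js = (1 - (d - 2) / np.dot(y, y)) * y       # "new" estimate: James-Stein shrinkage
    loss_old = np.sum((y - theta) ** 2)
    loss_new = np.sum((js - theta) ** 2)
    wins += loss_new < loss_old

print(f"new (James-Stein) beat old (raw) in {wins / reps:.0%} of replications")
```

In practice theta is unknown, so the realized losses cannot be computed from data; the c-value's data-dependent high-probability lower bound on the loss difference is what makes the comparison usable on a single observed dataset.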
Regression trees and their ensemble methods are popular methods for nonparametric regression: they combine strong predictive performance with interpretable estimators. To improve their utility for locally smooth response surfaces, we study regression trees and random forests with linear aggregation functions. We introduce a new algorithm that finds the best axis-aligned split to fit linear aggregation functions on the corresponding nodes, and we offer a quasilinear time implementation. We demonstrate the algorithm's favorable performance on real-world benchmarks and in an extensive simulation study, and we demonstrate its improved interpretability using a large get-out-the-vote experiment. We provide an open-source software package that implements several tree-based estimators with linear aggregation functions.
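The split criterion can be sketched as follows for a single feature. Unlike the quasilinear implementation described above, this naive version refits both linear models from scratch at every candidate threshold; function names and settings are illustrative:

```python
import numpy as np

def sse_linear(x, y):
    # Residual sum of squares of a simple linear fit y ~ a + b*x.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def best_split(x, y, min_leaf=10):
    # Scan candidate thresholds; pick the split minimizing the combined
    # SSE of the linear fits in the two resulting children.
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_thresh, best_sse = None, np.inf
    for i in range(min_leaf, len(xs) - min_leaf):
        if xs[i] == xs[i - 1]:
            continue
        total = sse_linear(xs[:i], ys[:i]) + sse_linear(xs[i:], ys[i:])
        if total < best_sse:
            best_thresh, best_sse = 0.5 * (xs[i - 1] + xs[i]), total
    return best_thresh, best_sse

# Piecewise-linear response with a break at x = 0: the split should land near 0.
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 400)
y = np.where(x < 0, 1 + 2 * x, 3 - x) + 0.1 * rng.standard_normal(400)
threshold, sse = best_split(x, y)
print(f"chosen threshold ~ {threshold:.2f}")
```

Refitting from scratch costs a least-squares solve per candidate split; the quasilinear algorithm instead updates the fits incrementally as points move across the threshold.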
A robust estimator is proposed for the parameters that characterize the linear regression problem. It is based on the notion of shrinkage, often used in finance and previously studied for outlier detection in multivariate data. A thorough simulation study is conducted to investigate the efficiency with normal and heavy-tailed errors, the robustness under contamination, the computational times, and the affine equivariance and breakdown value of the regression estimator. Two classical data sets often used in the literature and a real socio-economic data set about the Living Environment Deprivation of areas in Liverpool (UK) are studied. The results from the simulations and the real data examples show the advantages of the proposed robust estimator in regression.
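The shrinkage-based estimator itself is not reproduced here, but the contamination scenario from such simulation studies is easy to set up. The sketch below uses a generic Huber M-estimator fitted by iteratively reweighted least squares as the robust alternative (it is not the paper's estimator; all settings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(-3, 3, n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)   # true intercept 1, slope 2
out = rng.choice(n, 20, replace=False)
y[out] += 15.0                               # 10% contamination by vertical outliers

X = np.column_stack([np.ones(n), x])

def huber_irls(X, y, k=1.345, iters=50):
    # Huber M-estimator via iteratively reweighted least squares.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(iters):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12          # robust scale (MAD)
        w = np.clip(k * s / np.maximum(np.abs(r), 1e-12), None, 1.0)
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

ols, *_ = np.linalg.lstsq(X, y, rcond=None)
rob = huber_irls(X, y)
print("OLS   intercept/slope:", ols)
print("Huber intercept/slope:", rob)
```

Under this contamination the least-squares intercept is pulled well away from the truth, while the downweighted fit stays close; simulation studies of robust regression estimators measure exactly this kind of behavior.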
As observed by Auderset et al. (2005) and Wiesel (2012), viewing covariance matrices as elements of a Riemannian manifold and using the concept of geodesic convexity provide useful tools for studying M-estimators of multivariate scatter. In this paper, we begin with a mathematically rigorous, self-contained overview of Riemannian geometry on the space of symmetric positive definite matrices and of the notion of geodesic convexity. The overview contains both a review as well as new results. In particular, we introduce and utilize first and second order Taylor expansions with respect to geodesic parametrizations. This enables us to give sufficient conditions for a function to be geodesically convex. In addition, we introduce the concept of geodesic coercivity, which is important in establishing the existence of a minimum of a geodesically convex function. We also develop a general partial Newton algorithm for minimizing smooth and strictly geodesically convex functions. We then use these results to generate a fairly complete picture of the existence, uniqueness and computation of regularized M-estimators of scatter defined using additive geodesically convex penalty terms. Various such penalties are demonstrated which shrink an estimator towards the identity matrix or multiples of the identity matrix. Finally, we propose a cross-validation method for choosing the scaling parameter for the penalty function, and illustrate our results using a numerical example.
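The underlying geometry can be made concrete with a short numpy sketch using the standard affine-invariant metric on the SPD cone (illustrative, not code from the paper): geodesics and distances reduce to eigendecompositions, and the geodesic midpoint is equidistant from both endpoints.

```python
import numpy as np

def spd_pow(S, p):
    # Matrix power of a symmetric positive definite matrix via eigendecomposition.
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T

def spd_log(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def geodesic(A, B, t):
    # Affine-invariant geodesic from A (t=0) to B (t=1):
    # gamma(t) = A^{1/2} exp(t * log(A^{-1/2} B A^{-1/2})) A^{1/2}
    Ah, Aih = spd_pow(A, 0.5), spd_pow(A, -0.5)
    return Ah @ spd_exp(t * spd_log(Aih @ B @ Aih)) @ Ah

def geo_dist(A, B):
    # Affine-invariant Riemannian distance on the SPD cone.
    Aih = spd_pow(A, -0.5)
    w = np.linalg.eigvalsh(Aih @ B @ Aih)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
mid = geodesic(A, B, 0.5)
print(f"d(A, mid) = {geo_dist(A, mid):.4f}, d(mid, B) = {geo_dist(mid, B):.4f}")
```

A function is geodesically convex when it is convex along every such path gamma(t), which is the property the penalized M-estimation theory above builds on.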