ترغب بنشر مسار تعليمي؟ اضغط هنا

Quantile Based Variable Mining : Detection, FDR based Extraction and Interpretation

205   0   0.0 ( 0 )
 نشر من قبل Subhadeep Mukhopadhyay
 تاريخ النشر 2011
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

This paper outlines a unified framework for high dimensional variable selection for classification problems. Traditional approaches to finding interesting variables mostly utilize only partial information through moments (like mean difference). On the contrary, in this paper we address the question of variable selection in full generality from a distributional point of view. If a variable is not important for classification, then it will have similar distributional aspect under different classes. This simple and straightforward observation motivates us to quantify `How and Why the distribution of a variable changes over classes through CR-statistic. The second contribution of our paper is to develop and investigate the FDR based thresholding technology from a completely new point of view for adaptive thresholding, which leads to a elegant algorithm called CDfdr. This paper attempts to show how all of these problems of detection, extraction and interpretation for interesting variables can be treated in a unified way under one broad general theme - comparison analysis. It is proposed that a key to accomplishing this unification is to think in terms of the quantile function and the comparison density. We illustrate and demonstrate the power of our methodology using three real data sets.



قيم البحث

اقرأ أيضاً

Quantile regression, that is the prediction of conditional quantiles, has steadily gained importance in statistical modeling and financial applications. The authors introduce a new semiparametric quantile regression method based on sequentially fitti ng a likelihood optimal D-vine copula to given data resulting in highly flexible models with easily extractable conditional quantiles. As a subclass of regular vine copulas, D-vines enable the modeling of multivariate copulas in terms of bivariate building blocks, a so-called pair-copula construction (PCC). The proposed algorithm works fast and accurate even in high dimensions and incorporates an automatic variable selection by maximizing the conditional log-likelihood. Further, typical issues of quantile regression such as quantile crossing or transformations, interactions and collinearity of variables are automatically taken care of. In a simulation study the improved accuracy and saved computational time of the approach in comparison with established quantile regression methods is highlighted. An extensive financial application to international credit default swap (CDS) data including stress testing and Value-at-Risk (VaR) prediction demonstrates the usefulness of the proposed method.
104 - Takuya Ura 2016
This paper considers the instrumental variable quantile regression model (Chernozhukov and Hansen, 2005, 2013) with a binary endogenous treatment. It offers two identification results when the treatment status is not directly observed. The first resu lt is that, remarkably, the reduced-form quantile regression of the outcome variable on the instrumental variable provides a lower bound on the structural quantile treatment effect under the stochastic monotonicity condition (Small and Tan, 2007; DiNardo and Lee, 2011). This result is relevant, not only when the treatment variable is subject to misclassification, but also when any measurement of the treatment variable is not available. The second result is for the structural quantile function when the treatment status is measured with error; I obtain the sharp identified set by deriving moment conditions under widely-used assumptions on the measurement error. Furthermore, I propose an inference method in the presence of other covariates.
This article proposes a Bayesian approach to estimating the spectral density of a stationary time series using a prior based on a mixture of P-spline distributions. Our proposal is motivated by the B-spline Dirichlet process prior of Edwards et al. ( 2019) in combination with Whittles likelihood and aims at reducing the high computational complexity of its posterior computations. The strength of the B-spline Dirichlet process prior over the Bernstein-Dirichlet process prior of Choudhuri et al. (2004) lies in its ability to estimate spectral densities with sharp peaks and abrupt changes due to the flexibility of B-splines with variable number and location of knots. Here, we suggest to use P-splines of Eilers and Marx (1996) that combine a B-spline basis with a discrete penalty on the basis coefficients. In addition to equidistant knots, a novel strategy for a more expedient placement of knots is proposed that makes use of the information provided by the periodogram about the steepness of the spectral power distribution. We demonstrate in a simulation study and two real case studies that this approach retains the flexibility of the B-splines, achieves similar ability to accurately estimate peaks due to the new data-driven knot allocation scheme but significantly reduces the computational costs.
In this paper, we consider Bayesian variable selection problem of linear regression model with global-local shrinkage priors on the regression coefficients. We propose a variable selection procedure that select a variable if the ratio of the posterio r mean to the ordinary least square estimate of the corresponding coefficient is greater than $1/2$. Under the assumption of orthogonal designs, we show that if the local parameters have polynomial-tailed priors, our proposed method enjoys the oracle property in the sense that it can achieve variable selection consistency and optimal estimation rate at the same time. However, if, instead, an exponential-tailed prior is used for the local parameters, the proposed method does not have the oracle property.
254 - Lizhen Nie , Dan L. Nicolae 2021
We consider the detection and localization of change points in the distribution of an offline sequence of observations. Based on a nonparametric framework that uses a similarity graph among observations, we propose new test statistics when at most on e change point occurs and generalize them to multiple change points settings. The proposed statistics leverage edge weight information in the graphs, exhibiting substantial improvements in testing power and localization accuracy in simulations. We derive the null limiting distribution, provide accurate analytic approximations to control type I error, and establish theoretical guarantees on the power consistency under contiguous alternatives for the one change point setting, as well as the minimax localization rate. In the multiple change points setting, the asymptotic correctness of the number and location of change points are also guaranteed. The methods are illustrated on the MIT proximity network data.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا