The l1-norm quantile regression is a common choice when high-dimensional data sets contain outliers or heavy-tailed errors. However, it is computationally expensive to solve this problem when the number of features is ultra-high. As far as we know, existing screening rules cannot speed up the computation of the l1-norm quantile regression, owing to the non-differentiability of the quantile function/pinball loss. In this paper, we introduce the dual circumscribed sphere technique and propose a novel l1-norm quantile regression screening rule. Our rule is expressed as a closed-form function of the given data and eliminates inactive features at a low computational cost. Numerical experiments on simulated and real data sets show that this screening rule can eliminate almost all inactive features. Moreover, the rule can reduce computation time by a factor of up to 23 compared with computation without screening.
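To make the non-differentiability concrete, the following is a minimal sketch of the pinball loss and an l1-penalized quantile regression objective. It is an illustration only, not the screening rule from the abstract; the function names, the averaging of the loss over samples, and the parameter name `lam` are assumptions made here.

```python
import numpy as np

def pinball_loss(residual, tau):
    """Pinball (check) loss: tau * r if r >= 0, else (tau - 1) * r.

    The kink at r = 0 is what makes the loss non-differentiable.
    """
    residual = np.asarray(residual, dtype=float)
    return np.where(residual >= 0, tau * residual, (tau - 1.0) * residual)

def l1_quantile_objective(beta, X, y, tau, lam):
    """Mean pinball loss of the residuals plus an l1 penalty on beta.

    Assumption: the loss is averaged over samples; other scalings
    (for example a plain sum) only rescale lam.
    """
    residual = y - X @ beta
    return pinball_loss(residual, tau).mean() + lam * np.abs(beta).sum()
```

Both the loss and the penalty are piecewise linear, so the objective is convex but has kinks wherever a residual or a coefficient is exactly zero.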
The curse of dimensionality is a recognized challenge in nonparametric estimation. This paper develops a new L0-norm regularization approach to the convex quantile and expectile regressions for subset variable selection. We show how to use mixed integer programming to solve the proposed L0-norm regularization approach in practice and build a link to the commonly used L1-norm regularization approach. A Monte Carlo study is performed to compare the finite sample performances of the proposed L0-penalized convex quantile and expectile regression approaches with the L1-norm regularization approaches. The proposed approach is further applied to benchmark the sustainable development performance of the OECD countries and empirically analyze the accuracy in the dimensionality reduction of variables. The results from the simulation and application illustrate that the proposed L0-norm regularization approach can more effectively address the curse of dimensionality than the L1-norm regularization approach in multidimensional spaces.
In this paper, we develop a quantile functional regression modeling framework that models the distribution of a set of common repeated observations from a subject through the quantile function, which is regressed on a set of covariates to determine how these factors affect various aspects of the underlying subject-specific distribution. To account for smoothness in the quantile functions, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set and containing a Gaussian subspace so non-Gaussianness can be assessed. While these quantlets could be used within various functional regression frameworks, we build a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and allows fully Bayesian inference after fitting by Markov chain Monte Carlo. Specifically, we apply global tests to assess which covariates have any effect on the distribution at all, followed by local tests to identify at which specific quantiles the differences lie while adjusting for multiple testing, and to assess whether the covariate affects certain major aspects of the distribution, including location, scale, skewness, Gaussianness, or tails. If the differences lie in these commonly used summaries, our approach can still detect them, but our systematic modeling strategy can also detect effects on other aspects of the distribution that might be missed if one restricted attention to pre-chosen summaries. We demonstrate the benefit of the basis space modeling through simulation studies, and illustrate the method using a biomedical imaging data set in which we relate the distribution of pixel intensities from a tumor image to various demographic, clinical, and genetic characteristics.
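As a rough illustration of the response object in such a model, the sketch below turns each subject's repeated observations into an empirical quantile function evaluated on a common probability grid. It covers only this preprocessing step, not the quantlet basis or the Bayesian fit; the function names and the grid from 0.01 to 0.99 are assumptions made here.

```python
import numpy as np

def empirical_quantile_function(values, probs):
    """Evaluate the empirical quantile function of one subject's
    observations at the probabilities in probs."""
    values = np.asarray(values, dtype=float)
    return np.quantile(values, probs)

def quantile_matrix(subjects, n_grid=99):
    """Stack subject-specific quantile functions on a shared grid.

    subjects: list of 1-D arrays (possibly of different lengths).
    Returns the grid and an array of shape (n_subjects, n_grid).
    The grid limits 0.01 and 0.99 are an arbitrary choice.
    """
    probs = np.linspace(0.01, 0.99, n_grid)
    Q = np.vstack([empirical_quantile_function(v, probs) for v in subjects])
    return probs, Q
```

Each row of the resulting matrix is one subject's quantile function, which is the kind of functional response that could then be regressed on covariates.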
Radiomics involves the study of tumor images to identify quantitative markers explaining cancer heterogeneity. The predominant approach is to extract hundreds to thousands of image features, including histogram features comprising summaries of the marginal distribution of pixel intensities, which leads to multiple testing problems and can miss insights not contained in the selected features. In this paper, we present methods to model the entire marginal distribution of pixel intensities via the quantile function as functional data, regressed on a set of demographic, clinical, and genetic predictors. We call this approach quantile functional regression, regressing subject-specific marginal distributions across repeated measurements on a set of covariates, allowing us to assess which covariates are associated with the distribution in a global sense, as well as to identify distributional features characterizing these differences, including mean, variance, skewness, and various upper and lower quantiles. To account for smoothness in the quantile functions, we introduce custom basis functions we call quantlets that are sparse, regularized, near-lossless, and empirically defined, adapting to the features of a given data set. We fit this model using a Bayesian framework that uses nonlinear shrinkage of quantlet coefficients to regularize the functional regression coefficients and provides fully Bayesian inference after fitting by Markov chain Monte Carlo. We demonstrate the benefit of the basis space modeling through simulation studies, and apply the method to a magnetic resonance imaging (MRI)-based radiomic data set from glioblastoma multiforme to relate imaging-based quantile functions to demographic, clinical, and genetic predictors, finding specific differences in tumor pixel intensity distribution between males and females and between tumors with and without DDIT3 mutations.
We propose $\ell_1$-norm-regularized quadratic surface support vector machine models for binary classification in supervised learning. We establish their desired theoretical properties, including the existence and uniqueness of the optimal solution, reduction to the standard SVMs over (almost) linearly separable data sets, and detection of the true sparsity pattern over (almost) quadratically separable data sets if the penalty parameter on the $\ell_1$ norm is large enough. We also demonstrate their promising practical efficiency by conducting various numerical experiments on both synthetic and publicly available benchmark data sets.
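The general idea can be sketched as a quadratic decision function whose second-order coefficients carry an l1 penalty. This is a generic hinge-loss variant written for illustration, not the authors' exact formulation; the function names, the choice to penalize only the upper triangle of `W`, and the parameter `lam` are assumptions made here.

```python
import numpy as np

def quadratic_surface(X, W, b, c):
    """Evaluate f(x) = 0.5 * x^T W x + b^T x + c for each row x of X.

    W is assumed symmetric; b is the linear part and c the offset.
    """
    X = np.asarray(X, dtype=float)
    quad = 0.5 * np.einsum("ni,ij,nj->n", X, W, X)
    return quad + X @ b + c

def l1_qssvm_objective(X, y, W, b, c, lam):
    """Hinge loss of the quadratic surface plus an l1 penalty on the
    upper triangle of W (an assumed penalty structure).

    Labels y are expected in {-1, +1}.
    """
    margins = y * quadratic_surface(X, W, b, c)
    hinge = np.maximum(0.0, 1.0 - margins).sum()
    penalty = lam * np.abs(np.triu(W)).sum()
    return hinge + penalty
```

Under this sketch, a large `lam` pushes the entries of `W` toward zero, leaving only the linear part `b^T x + c`, which is consistent with the stated reduction to a standard SVM on (almost) linearly separable data.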
In this paper, we develop a new censored quantile instrumental variable (CQIV) estimator and describe its properties and computation. The CQIV estimator combines Powell's (1986) censored quantile regression (CQR), to deal with censoring, with a control variable approach to incorporate endogenous regressors. The CQIV estimator is obtained in two stages that are non-additive in the unobservables. The first stage estimates a non-additive model with infinite-dimensional parameters for the control variable, such as a quantile or distribution regression model. The second stage estimates a non-additive censored quantile regression model for the response variable of interest, including the estimated control variable to deal with endogeneity. For computation, we extend the algorithm for CQR developed by Chernozhukov and Hong (2002) to incorporate the estimation of the control variable. We give generic regularity conditions for asymptotic normality of the CQIV estimator and for the validity of resampling methods to approximate its asymptotic distribution. We verify these conditions for quantile and distribution regression estimation of the control variable. Our analysis covers two-stage (uncensored) quantile regression with a non-additive first stage as an important special case. We illustrate the computation and applicability of the CQIV estimator with a Monte Carlo numerical example and an empirical application on estimation of Engel curves for alcohol.