No Arabic abstract
In this paper, we consider a high-dimensional quantile regression model where the sparsity structure may differ between two sub-populations. We develop $ell_1$-penalized estimators of both regression coefficients and the threshold parameter. Our penalized estimators not only select covariates but also discriminate between a model with homogeneous sparsity and a model with a change point. As a result, it is not necessary to know or pretest whether the change point is present, or where it occurs. Our estimator of the change point achieves an oracle property in the sense that its asymptotic distribution is the same as if the unknown active sets of regression coefficients were known. Importantly, we establish this oracle property without a perfect covariate selection, thereby avoiding the need for the minimum level condition on the signals of active covariates. Dealing with high-dimensional quantile regression with an unknown change point calls for a new proof technique since the quantile loss function is non-smooth and furthermore the corresponding objective function is non-convex with respect to the change point. The technique developed in this paper is applicable to a general M-estimation framework with a change point, which may be of independent interest. The proposed methods are then illustrated via Monte Carlo experiments and an application to tipping in the dynamics of racial segregation.
In this paper, we develop a new censored quantile instrumental variable (CQIV) estimator and describe its properties and computation. The CQIV estimator combines Powell (1986) censored quantile regression (CQR) to deal with censoring, with a control variable approach to incorporate endogenous regressors. The CQIV estimator is obtained in two stages that are non-additive in the unobservables. The first stage estimates a non-additive model with infinite dimensional parameters for the control variable, such as a quantile or distribution regression model. The second stage estimates a non-additive censored quantile regression model for the response variable of interest, including the estimated control variable to deal with endogeneity. For computation, we extend the algorithm for CQR developed by Chernozhukov and Hong (2002) to incorporate the estimation of the control variable. We give generic regularity conditions for asymptotic normality of the CQIV estimator and for the validity of resampling methods to approximate its asymptotic distribution. We verify these conditions for quantile and distribution regression estimation of the control variable. Our analysis covers two-stage (uncensored) quantile regression with non-additive first stage as an important special case. We illustrate the computation and applicability of the CQIV estimator with a Monte-Carlo numerical example and an empirical application on estimation of Engel curves for alcohol.
With the availability of high dimensional genetic biomarkers, it is of interest to identify heterogeneous effects of these predictors on patients survival, along with proper statistical inference. Censored quantile regression has emerged as a powerful tool for detecting heterogeneous effects of covariates on survival outcomes. To our knowledge, there is little work available to draw inference on the effects of high dimensional predictors for censored quantile regression. This paper proposes a novel procedure to draw inference on all predictors within the framework of global censored quantile regression, which investigates covariate-response associations over an interval of quantile levels, instead of a few discrete values. The proposed estimator combines a sequence of low dimensional model estimates that are based on multi-sample splittings and variable selection. We show that, under some regularity conditions, the estimator is consistent and asymptotically follows a Gaussian process indexed by the quantile level. Simulation studies indicate that our procedure can properly quantify the uncertainty of the estimates in high dimensional settings. We apply our method to analyze the heterogeneous effects of SNPs residing in lung cancer pathways on patients survival, using the Boston Lung Cancer Survival Cohort, a cancer epidemiology study on the molecular mechanism of lung cancer.
In this study, we develop a novel estimation method of the quantile treatment effects (QTE) under the rank invariance and rank stationarity assumptions. Ishihara (2020) explores identification of the nonseparable panel data model under these assumptions and propose a parametric estimation based on the minimum distance method. However, the minimum distance estimation using this process is computationally demanding when the dimensionality of covariates is large. To overcome this problem, we propose a two-step estimation method based on the quantile regression and minimum distance method. We then show consistency and asymptotic normality of our estimator. Monte Carlo studies indicate that our estimator performs well in finite samples. Last, we present two empirical illustrations, to estimate the distributional effects of insurance provision on household production and of TV watching on child cognitive development.
In ordinary quantile regression, quantiles of different order are estimated one at a time. An alternative approach, which is referred to as quantile regression coefficients modeling (QRCM), is to model quantile regression coefficients as parametric functions of the order of the quantile. In this paper, we describe how the QRCM paradigm can be applied to longitudinal data. We introduce a two-level quantile function, in which two different quantile regression models are used to describe the (conditional) distribution of the within-subject response and that of the individual effects. We propose a novel type of penalized fixed-effects estimator, and discuss its advantages over standard methods based on $ell_1$ and $ell_2$ penalization. We provide model identifiability conditions, derive asymptotic properties, describe goodness-of-fit measures and model selection criteria, present simulation results, and discuss an application. The proposed method has been implemented in the R package qrcm.
This paper develops a novel spatial quantile function-on-scalar regression model, which studies the conditional spatial distribution of a high-dimensional functional response given scalar predictors. With the strength of both quantile regression and copula modeling, we are able to explicitly characterize the conditional distribution of the functional or image response on the whole spatial domain. Our method provides a comprehensive understanding of the effect of scalar covariates on functional responses across different quantile levels and also gives a practical way to generate new images for given covariate values. Theoretically, we establish the minimax rates of convergence for estimating coefficient functions under both fixed and random designs. We further develop an efficient primal-dual algorithm to handle high-dimensional image data. Simulations and real data analysis are conducted to examine the finite-sample performance.