This paper develops a novel spatial quantile function-on-scalar regression model, which studies the conditional spatial distribution of a high-dimensional functional response given scalar predictors. By combining the strengths of quantile regression and copula modeling, we are able to explicitly characterize the conditional distribution of the functional or image response over the whole spatial domain. Our method provides a comprehensive understanding of the effects of scalar covariates on functional responses across different quantile levels, and it also gives a practical way to generate new images for given covariate values. Theoretically, we establish the minimax rates of convergence for estimating the coefficient functions under both fixed and random designs. We further develop an efficient primal-dual algorithm to handle high-dimensional image data. Simulations and a real data analysis are conducted to examine the finite-sample performance.
In this paper, a functional partial quantile regression approach, a quantile regression analog of functional partial least squares regression, is proposed to estimate the function-on-function linear quantile regression model. A partial quantile covariance function is first used to extract the functional partial quantile regression basis functions. The extracted basis functions are then used to obtain the functional partial quantile regression components and to estimate the final model. In our proposal, the functional forms of the discretely observed random variables are first constructed via a finite-dimensional basis function expansion. The functional partial quantile regression constructed from the functional random variables is then approximated by the partial quantile regression constructed from the basis expansion coefficients. The proposed method uses an iterative procedure to extract the partial quantile regression components, and a Bayesian information criterion is used to determine the optimal number of retained components. The proposed model allows for more than one functional predictor. However, the true form of the model is unspecified because the relevant predictors are unknown in practice. Thus, a forward variable selection procedure is used to determine the significant predictors. Moreover, a case-sampling-based bootstrap procedure is used to construct pointwise prediction intervals for the functional response. The predictive performance of the proposed method is evaluated through several Monte Carlo experiments under different data generation processes and error distributions. In an empirical example, air quality data are analyzed to demonstrate the effectiveness of the proposed method.
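The following minimal sketch covers only the first step. Discretely observed curves are projected onto a finite basis by least squares, and the resulting coefficient matrix is what the partial quantile regression step would then operate on. Several choices here are illustrative assumptions: the Fourier basis, the helper names `fourier_basis` and `basis_coefficients`, and the number of basis functions. The abstract does not say which basis is used. The partial quantile covariance, component extraction, and BIC steps are not shown.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Hypothetical helper: Fourier basis on [0, 1] (a constant plus sine/cosine pairs)."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2.0) * np.sin(2.0 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2.0) * np.cos(2.0 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)  # shape (n_points, n_basis)

def basis_coefficients(Y, t, n_basis):
    """Hypothetical helper: least-squares basis coefficients for each curve (rows of Y)."""
    B = fourier_basis(t, n_basis)
    coef, *_ = np.linalg.lstsq(B, Y.T, rcond=None)  # solves B @ coef ~= Y.T
    return coef.T  # shape (n_curves, n_basis)
```

In this sketch each row of the returned matrix stands in for one functional observation. A smoother basis such as B-splines could be swapped in without changing the rest of the pipeline.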
With the availability of high-dimensional genetic biomarkers, it is of interest to identify heterogeneous effects of these predictors on patients' survival, along with proper statistical inference. Censored quantile regression has emerged as a powerful tool for detecting heterogeneous effects of covariates on survival outcomes. To our knowledge, little work is available on drawing inference about the effects of high-dimensional predictors in censored quantile regression. This paper proposes a novel procedure to draw inference on all predictors within the framework of global censored quantile regression, which investigates covariate-response associations over an interval of quantile levels instead of a few discrete values. The proposed estimator combines a sequence of low-dimensional model estimates obtained through repeated sample splitting and variable selection. We show that, under some regularity conditions, the estimator is consistent and asymptotically follows a Gaussian process indexed by the quantile level. Simulation studies indicate that our procedure can properly quantify the uncertainty of the estimates in high-dimensional settings. We apply our method to analyze the heterogeneous effects on patients' survival of SNPs residing in lung cancer pathways, using the Boston Lung Cancer Survival Cohort, a cancer epidemiology study on the molecular mechanisms of lung cancer.
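The split, select, refit, and average structure of such an estimator can be sketched generically. This is an illustrative assumption, not the paper's estimator. The hypothetical stand-ins `screen_top_k` (marginal correlation screening) and `least_squares` take the place of the variable selection and censored quantile regression steps. The sketch does not model a quantile process or censoring, and it produces no variance estimate.

```python
import numpy as np

def screen_top_k(X, y, k):
    """Hypothetical stand-in selector: keep the k columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[::-1][:k]

def least_squares(X, y):
    """Hypothetical stand-in low-dimensional fitter (not a censored quantile fit)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def split_select_refit(X, y, k, n_splits=50, seed=0):
    """Average low-dimensional refits over random splits; unselected coefficients are 0."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    estimates = np.zeros((n_splits, p))
    for b in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, fit_idx = perm[: n // 2], perm[n // 2:]
        S = screen_top_k(X[sel_idx], y[sel_idx], k)                  # select on one half
        estimates[b, S] = least_squares(X[fit_idx][:, S], y[fit_idx])  # refit on the other
    return estimates.mean(axis=0), estimates
```

Averaging over many splits reduces the dependence of the final estimate on any single random partition. This is one reason for using repeated rather than single sample splitting.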
Quantile regression has become a valuable tool for analyzing heterogeneous covariate-response associations, which are often encountered in practice. The development of quantile regression methodology for high-dimensional covariates has primarily focused on examining model sparsity at a single or multiple quantile levels, which are typically pre-specified ad hoc by the user. The resulting models may be sensitive to the specific choice of quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high-dimensional setting. We employ adaptive L1 penalties and, more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems of model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal.
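To make the multi-level penalized objective concrete, the sketch below evaluates two things. The first is the check loss $\rho_\tau(u) = u(\tau - 1\{u<0\})$. The second is a sum of adaptive-L1-penalized empirical risks over a grid of quantile levels that share one tuning parameter $\lambda$. Several elements are assumptions for illustration: the additive form of the objective, the weights $1/|\tilde\beta_j|$ from an initial estimate, and all function names. The sketch does not reproduce the paper's uniform tuning-parameter selector, and it includes no optimizer.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - np.where(u < 0, 1.0, 0.0))

def adaptive_l1_weights(beta_init, eps=1e-8):
    """Assumed adaptive-lasso weights 1/|beta_init|; eps avoids division by zero."""
    return 1.0 / (np.abs(beta_init) + eps)

def multi_quantile_objective(X, y, betas, taus, lam, weights):
    """Hypothetical objective: penalized check-loss risks summed over taus, one shared lam."""
    total = 0.0
    for beta, tau, w in zip(betas, taus, weights):
        resid = y - X @ beta
        total += check_loss(resid, tau).mean() + lam * np.sum(w * np.abs(beta))
    return total
```

Sharing a single $\lambda$ across the grid is what ties the sparsity patterns at different quantile levels together. With a separate $\lambda$ per level, each fit could select a different model.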
$\ell_1$-penalized quantile regression is widely used for analyzing high-dimensional data with heterogeneity. It is now recognized that the $\ell_1$ penalty introduces non-negligible estimation bias, while a proper use of concave regularization may lead to estimators with refined convergence rates and oracle properties as the signal strengthens. Although folded concave penalized $M$-estimation with strongly convex loss functions has been well studied, the extant literature on quantile regression is relatively silent. The main difficulty is that the quantile loss is piecewise linear: it is non-smooth and has its curvature concentrated at a single point. To overcome the lack of smoothness and strong convexity, we propose and study a convolution-type smoothed quantile regression with iteratively reweighted $\ell_1$ regularization. The resulting smoothed empirical loss is twice continuously differentiable and (provably) locally strongly convex with high probability. We show that the iteratively reweighted $\ell_1$-penalized smoothed quantile regression estimator, after a few iterations, achieves the optimal rate of convergence and, moreover, the oracle rate and the strong oracle property under an almost necessary and sufficient minimum signal strength condition. Extensive numerical studies corroborate our theoretical results.
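The sketch below illustrates the smoothing under the assumption of a Gaussian kernel with bandwidth $h$. For this kernel the convolution-smoothed check loss has the closed form $\ell_h(u) = h\,\phi(u/h) + u\{\tau - \Phi(-u/h)\}$. Its derivative is $\ell_h'(u) = \tau - \Phi(-u/h)$ and its second derivative is $\phi(u/h)/h > 0$. The sketch also shows one common way to form the weights for the next reweighted $\ell_1$ step: the SCAD derivative $p_\lambda'(|\beta_j|)$ evaluated at the current estimate. The kernel, the SCAD penalty with $a = 3.7$, and the function names are assumptions, since the abstract does not fix these choices. The weighted $\ell_1$ solver itself is omitted.

```python
import math

def check_loss(u, tau):
    """Unsmoothed quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def smoothed_check_loss(u, tau, h):
    """Gaussian-kernel smoothed loss: l_h(u) = h * phi(u/h) + u * (tau - Phi(-u/h))."""
    phi = math.exp(-0.5 * (u / h) ** 2) / math.sqrt(2.0 * math.pi)
    return h * phi + u * (tau - std_normal_cdf(-u / h))

def smoothed_check_grad(u, tau, h):
    """Derivative l_h'(u) = tau - Phi(-u/h); continuous, unlike the check loss."""
    return tau - std_normal_cdf(-u / h)

def scad_weight(t, lam, a=3.7):
    """Assumed SCAD derivative p'_lam(|t|), used as the next-step l1 weight."""
    t = abs(t)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)
```

Starting from $\hat\beta = 0$, every weight equals $\lambda$, so the first iteration is ordinary $\ell_1$-penalized smoothed quantile regression. In later iterations, coefficients with $|\hat\beta_j| \ge a\lambda$ receive zero weight and are left unpenalized.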
We develop a fully Bayesian framework for function-on-scalars regression with many predictors. The functional data response is modeled nonparametrically using unknown basis functions, which produces a flexible and data-adaptive functional basis. We incorporate shrinkage priors that effectively remove unimportant scalar covariates from the model and reduce sensitivity to the number of (unknown) basis functions. For variable selection in functional regression, we propose a decision-theoretic posterior summarization technique, which identifies a subset of covariates that nearly retains the predictive accuracy of the full model. Our approach is broadly applicable to Bayesian functional regression models and, unlike existing methods, provides joint rather than marginal selection of important predictor variables. Computationally scalable posterior inference is achieved using a Gibbs sampler with linear time complexity in the number of predictors. The resulting algorithm is empirically faster than existing frequentist and Bayesian techniques, and it provides joint estimation of model parameters, prediction and imputation of functional trajectories, and uncertainty quantification via the posterior distribution. A simulation study demonstrates improvements in estimation accuracy, uncertainty quantification, and variable selection relative to existing alternatives. The methodology is applied to actigraphy data to investigate the association between intraday physical activity and responses to a sleep questionnaire.