No Arabic abstract
Random forests are powerful non-parametric regression method but are severely limited in their usage in the presence of randomly censored observations, and naively applied can exhibit poor predictive performance due to the incurred biases. Based on a local adaptive representation of random forests, we develop its regression adjustment for randomly censored regression quantile models. Regression adjustment is based on a new estimating equation that adapts to censoring and leads to quantile score whenever the data do not exhibit censoring. The proposed procedure named {it censored quantile regression forest}, allows us to estimate quantiles of time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure.
We propose a censored quantile regression estimator motivated by unbiased estimating equations. Under the usual conditional independence assumption of the survival time and the censoring time given the covariates, we show that the proposed estimator is consistent and asymptotically normal. We develop an efficient computational algorithm which uses existing quantile regression code. As a result, bootstrap-type inference can be efficiently implemented. We illustrate the finite-sample performance of the proposed method by simulation studies and analysis of a survival data set.
With the availability of high dimensional genetic biomarkers, it is of interest to identify heterogeneous effects of these predictors on patients survival, along with proper statistical inference. Censored quantile regression has emerged as a powerful tool for detecting heterogeneous effects of covariates on survival outcomes. To our knowledge, there is little work available to draw inference on the effects of high dimensional predictors for censored quantile regression. This paper proposes a novel procedure to draw inference on all predictors within the framework of global censored quantile regression, which investigates covariate-response associations over an interval of quantile levels, instead of a few discrete values. The proposed estimator combines a sequence of low dimensional model estimates that are based on multi-sample splittings and variable selection. We show that, under some regularity conditions, the estimator is consistent and asymptotically follows a Gaussian process indexed by the quantile level. Simulation studies indicate that our procedure can properly quantify the uncertainty of the estimates in high dimensional settings. We apply our method to analyze the heterogeneous effects of SNPs residing in lung cancer pathways on patients survival, using the Boston Lung Cancer Survival Cohort, a cancer epidemiology study on the molecular mechanism of lung cancer.
China has made great achievements in electric power industry during the long-term deepening of reform and opening up. However, the complex regional economic, social and natural conditions, electricity resources are not evenly distributed, which accounts for the electricity deficiency in some regions of China. It is desirable to develop a robust electricity forecasting model. Motivated by which, we propose a Panel Semiparametric Quantile Regression Neural Network (PSQRNN) by utilizing the artificial neural network and semiparametric quantile regression. The PSQRNN can explore a potential linear and nonlinear relationships among the variables, interpret the unobserved provincial heterogeneity, and maintain the interpretability of parametric models simultaneously. And the PSQRNN is trained by combining the penalized quantile regression with LASSO, ridge regression and backpropagation algorithm. To evaluate the prediction accuracy, an empirical analysis is conducted to analyze the provincial electricity consumption from 1999 to 2018 in China based on three scenarios. From which, one finds that the PSQRNN model performs better for electricity consumption forecasting by considering the economic and climatic factors. Finally, the provincial electricity consumptions of the next $5$ years (2019-2023) in China are reported by forecasting.
We propose a novel method designed for large-scale regression problems, namely the two-stage best-scored random forest (TBRF). Best-scored means to select one regression tree with the best empirical performance out of a certain number of purely random regression tree candidates, and two-stage means to divide the original random tree splitting procedure into two: In stage one, the feature space is partitioned into non-overlapping cells; in stage two, child trees grow separately on these cells. The strengths of this algorithm can be summarized as follows: First of all, the pure randomness in TBRF leads to the almost optimal learning rates, and also makes ensemble learning possible, which resolves the boundary discontinuities long plaguing the existing algorithms. Secondly, the two-stage procedure paves the way for parallel computing, leading to computational efficiency. Last but not least, TBRF can serve as an inclusive framework where different mainstream regression strategies such as linear predictor and least squares support vector machines (LS-SVMs) can also be incorporated as value assignment approaches on leaves of the child trees, depending on the characteristics of the underlying data sets. Numerical assessments on comparisons with other state-of-the-art methods on several large-scale real data sets validate the promising prediction accuracy and high computational efficiency of our algorithm.
Both the median-based classifier and the quantile-based classifier are useful for discriminating high-dimensional data with heavy-tailed or skewed inputs. But these methods are restricted as they assign equal weight to each variable in an unregularized way. The ensemble quantile classifier is a more flexible regularized classifier that provides better performance with high-dimensional data, asymmetric data or when there are many irrelevant extraneous inputs. The improved performance is demonstrated by a simulation study as well as an application to text categorization. It is proven that the estimated parameters of the ensemble quantile classifier consistently estimate the minimal population loss under suitable general model assumptions. It is also shown that the ensemble quantile classifier is Bayes optimal under suitable assumptions with asymmetric Laplace distribution inputs.