No Arabic abstract
The widespread use of quantile regression methods depends crucially on the existence of fast algorithms. Despite numerous algorithmic improvements, the computation time is still non-negligible because researchers often estimate many quantile regressions and use the bootstrap for inference. We suggest two new fast algorithms for the estimation of a sequence of quantile regressions at many quantile indexes. The first algorithm applies the preprocessing idea of Portnoy and Koenker (1997) but exploits a previously estimated quantile regression to guess the sign of the residuals. This step allows for a reduction of the effective sample size. The second algorithm starts from a previously estimated quantile regression at a similar quantile index and updates it using a single Newton-Raphson iteration. The first algorithm is exact, while the second is only asymptotically equivalent to the traditional quantile regression estimator. We also apply the preprocessing idea to the bootstrap by using the sample estimates to guess the sign of the residuals in the bootstrap sample. Simulations show that our new algorithms provide very large improvements in computation time without significant (if any) cost in the quality of the estimates. For instance, we divide by 100 the time required to estimate 99 quantile regressions with 20 regressors and 50,000 observations.
We propose a generalization of the linear panel quantile regression model to accommodate both textit{sparse} and textit{dense} parts: sparse means while the number of covariates available is large, potentially only a much smaller number of them have a nonzero impact on each conditional quantile of the response variable; while the dense part is represent by a low-rank matrix that can be approximated by latent factors and their loadings. Such a structure poses problems for traditional sparse estimators, such as the $ell_1$-penalised Quantile Regression, and for traditional latent factor estimator, such as PCA. We propose a new estimation procedure, based on the ADMM algorithm, consists of combining the quantile loss function with $ell_1$ textit{and} nuclear norm regularization. We show, under general conditions, that our estimator can consistently estimate both the nonzero coefficients of the covariates and the latent low-rank matrix. Our proposed model has a Characteristics + Latent Factors Asset Pricing Model interpretation: we apply our model and estimator with a large-dimensional panel of financial data and find that (i) characteristics have sparser predictive power once latent factors were controlled (ii) the factors and coefficients at upper and lower quantiles are different from the median.
The aim of this article is to present a novel parallelization method for temporal Gaussian process (GP) regression problems. The method allows for solving GP regression problems in logarithmic O(log N) time, where N is the number of time steps. Our approach uses the state-space representation of GPs which in its original form allows for linear O(N) time GP regression by leveraging the Kalman filtering and smoothing methods. By using a recently proposed parallelization method for Bayesian filters and smoothers, we are able to reduce the linear computational complexity of the temporal GP regression problems into logarithmic span complexity. This ensures logarithmic time complexity when run on parallel hardware such as a graphics processing unit (GPU). We experimentally demonstrate the computational benefits on simulated and real datasets via our open-source implementation leveraging the GPflow framework.
Credible counterfactual analysis requires high-dimensional controls. This paper considers estimation and inference for heterogeneous counterfactual effects with high-dimensional data. We propose a novel doubly robust score for double/debiased estimation and inference for the unconditional quantile regression (Firpo, Fortin, and Lemieux, 2009) as a measure of heterogeneous counterfactual marginal effects. We propose a multiplier bootstrap inference for the Lasso double/debiased estimator, and develop asymptotic theories to guarantee that the bootstrap works. Simulation studies support our theories. Applying the proposed method to Job Corps survey data, we find that i) marginal effects of counterfactually extending the duration of the exposure to the Job Corps program are globally positive across quantiles regardless of definitions of the treatment and outcome variables, and that ii) these counterfactual effects are larger for higher potential earners than lower potential earners regardless of whether we define the outcome as the level or its logarithm.
This paper provides a method to construct simultaneous confidence bands for quantile functions and quantile effects in nonlinear network and panel models with unobserved two-way effects, strictly exogenous covariates, and possibly discrete outcome variables. The method is based upon projection of simultaneous confidence bands for distribution functions constructed from fixed effects distribution regression estimators. These fixed effects estimators are debiased to deal with the incidental parameter problem. Under asymptotic sequences where both dimensions of the data set grow at the same rate, the confidence bands for the quantile functions and effects have correct joint coverage in large samples. An empirical application to gravity models of trade illustrates the applicability of the methods to network data.
Datasets from field experiments with covariate-adaptive randomizations (CARs) usually contain extra baseline covariates in addition to the strata indicators. We propose to incorporate these extra covariates via auxiliary regressions in the estimation and inference of unconditional QTEs under CARs. We establish the consistency, limiting distribution, and validity of the multiplier bootstrap of the regression-adjusted QTE estimator. The auxiliary regression may be estimated parametrically, nonparametrically, or via regularization when the data are high-dimensional. Even when the auxiliary regression is misspecified, the proposed bootstrap inferential procedure still achieves the nominal rejection probability in the limit under the null. When the auxiliary regression is correctly specified, the regression-adjusted estimator achieves the minimum asymptotic variance. We also derive the optimal pseudo true values for the potentially misspecified parametric model that minimize the asymptotic variance of the corresponding QTE estimator. We demonstrate the finite sample performance of the new estimation and inferential methods using simulations and provide an empirical application to a well-known dataset in education.