No Arabic abstract
Treatment effects on asymmetric and heavy tailed distributions are better reflected at extreme tails rather than at averages or intermediate quantiles. In such distributions, standard methods for estimating quantile treatment effects can provide misleading inference due to the high variability of the estimators at the extremes. In this work, we propose a novel method which incorporates a heavy tailed component in the outcome distribution to estimate the extreme tails and simultaneously employs quantile regression to model the remainder of the distribution. The threshold between the bulk of the distribution and the extreme tails is estimated by utilising a state of the art technique. Simulation results show the superiority of the proposed method over existing estimators for quantile causal effects at extremes in the case of heavy tailed distributions. The method is applied to analyse a real dataset on the London transport network. In this application, the methodology proposed can assist in effective decision making to improve network performance, where causal inference in the extremes for heavy tailed distributions is often a key aim.
The use of quantiles to obtain insights about multivariate data is addressed. It is argued that incisive insights can be obtained by considering directional quantiles, the quantiles of projections. Directional quantile envelopes are proposed as a way to condense this kind of information; it is demonstrated that they are essentially halfspace (Tukey) depth levels sets, coinciding for elliptic distributions (in particular multivariate normal) with density contours. Relevant questions concerning their indexing, the possibility of the reverse retrieval of directional quantile information, invariance with respect to affine transformations, and approximation/asymptotic properties are studied. It is argued that the analysis in terms of directional quantiles and their envelopes offers a straightforward probabilistic interpretation and thus conveys a concrete quantitative meaning; the directional definition can be adapted to elaborate frameworks, like estimation of extreme quantiles and directional quantile regression, the regression of depth contours on covariates. The latter facilitates the construction of multivariate growth charts---the question that motivated all the development.
The selection of grouped variables using the random forest algorithm is considered. First a new importance measure adapted for groups of variables is proposed. Theoretical insights into this criterion are given for additive regression models. Second, an original method for selecting functional variables based on the grouped variable importance measure is developed. Using a wavelet basis, it is proposed to regroup all of the wavelet coefficients for a given functional variable and use a wrapper selection algorithm with these groups. Various other groupings which take advantage of the frequency and time localization of the wavelet basis are proposed. An extensive simulation study is performed to illustrate the use of the grouped importance measure in this context. The method is applied to a real life problem coming from aviation safety.
Transport operators have a range of intervention options available to improve or enhance their networks. Such interventions are often made in the absence of sound evidence on resulting outcomes. Cycling superhighways were promoted as a sustainable and healthy travel mode, one of the aims of which was to reduce traffic congestion. Estimating the impacts that cycle superhighways have on congestion is complicated due to the non-random assignment of such intervention over the transport network. In this paper, we analyse the causal effect of cycle superhighways utilising pre-intervention and post-intervention information on traffic and road characteristics along with socio-economic factors. We propose a modeling framework based on the propensity score and outcome regression model. The method is also extended to the doubly robust set-up. Simulation results show the superiority of the performance of the proposed method over existing competitors. The method is applied to analyse a real dataset on the London transport network. The methodology proposed can assist in effective decision making to improve network performance.
This study proposes a new Bayesian approach to infer binary treatment effects. The approach treats counterfactual untreated outcomes as missing observations and infers them by completing a matrix composed of realized and potential untreated outcomes using a data augmentation technique. We also develop a tailored prior that helps in the identification of parameters and induces the matrix of untreated outcomes to be approximately low rank. Posterior draws are simulated using a Markov Chain Monte Carlo sampler. While the proposed approach is similar to synthetic control methods and other related methods, it has several notable advantages. First, unlike synthetic control methods, the proposed approach does not require stringent assumptions. Second, in contrast to non-Bayesian approaches, the proposed method can quantify uncertainty about inferences in a straightforward and consistent manner. By means of a series of simulation studies, we show that our proposal has a better finite sample performance than that of the existing approaches.
This paper proposes a maximum-likelihood approach to jointly estimate marginal conditional quantiles of multivariate response variables in a linear regression framework. We consider a slight reparameterization of the Multivariate Asymmetric Laplace distribution proposed by Kotz et al (2001) and exploit its location-scale mixture representation to implement a new EM algorithm for estimating model parameters. The idea is to extend the link between the Asymmetric Laplace distribution and the well-known univariate quantile regression model to a multivariate context, i.e. when a multivariate dependent variable is concerned. The approach accounts for association among multiple responses and study how the relationship between responses and explanatory variables can vary across different quantiles of the marginal conditional distribution of the responses. A penalized version of the EM algorithm is also presented to tackle the problem of variable selection. The validity of our approach is analyzed in a simulation study, where we also provide evidence on the efficiency gain of the proposed method compared to estimation obtained by separate univariate quantile regressions. A real data application is finally proposed to study the main determinants of financial distress in a sample of Italian firms.