No Arabic abstract
The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects -- a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize the range of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. They also serve as a basis for classification analysis, where we divide the observational units into most or least affected groups and summarize their characteristics. We provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects and related classification analysis, and provide confidence sets for the most and least affected groups. The derived statistical results rely on establishing key, new mathematical results on Hadamard differentiability of a multivariate sorting operator and a related classification operator, which are of independent interest. We apply the sorted effects method and classification analysis to demonstrate several striking patterns in the gender wage gap.
In the recent literature on estimating heterogeneous treatment effects, each proposed method makes its own set of restrictive assumptions about the interventions effects and which subpopulations to explicitly estimate. Moreover, the majority of the literature provides no mechanism to identify which subpopulations are the most affected--beyond manual inspection--and provides little guarantee on the correctness of the identified subpopulations. Therefore, we propose Treatment Effect Subset Scan (TESS), a new method for discovering which subpopulation in a randomized experiment is most significantly affected by a treatment. We frame this challenge as a pattern detection problem where we efficiently maximize a nonparametric scan statistic over subpopulations. Furthermore, we identify the subpopulation which experiences the largest distributional change as a result of the intervention, while making minimal assumptions about the interventions effects or the underlying data generating process. In addition to the algorithm, we demonstrate that the asymptotic Type I and II error can be controlled, and provide sufficient conditions for detection consistency--i.e., exact identification of the affected subpopulation. Finally, we validate the efficacy of the method by discovering heterogeneous treatment effects in simulations and in real-world data from a well-known program evaluation study.
We present an approach to estimate distance-dependent heterogeneous associations between point-referenced exposures to built environment characteristics and health outcomes. By estimating associations that depend non-linearly on distance between subjects and point-referenced exposures, this method addresses the modifiable area-unit problem that is pervasive in the built environment literature. Additionally, by estimating heterogeneous effects, the method also addresses the uncertain geographic context problem. The key innovation of our method is to combine ideas from the non-parametric function estimation literature and the Bayesian Dirichlet process literature. The former is used to estimate nonlinear associations between subjects outcomes and proximate built environment features, and the latter identifies clusters within the population that have different effects. We study this method in simulations and apply our model to study heterogeneity in the association between fast food restaurant availability and weight status of children attending schools in Los Angeles, California.
Nonseparable panel models are important in a variety of economic settings, including discrete choice. This paper gives identification and estimation results for nonseparable models under time homogeneity conditions that are like time is randomly assigned or time is an instrument. Partial identification results for average and quantile effects are given for discrete regressors, under static or dynamic conditions, in fully nonparametric and in semiparametric models, with time effects. It is shown that the usual, linear, fixed-effects estimator is not a consistent estimator of the identified average effect, and a consistent estimator is given. A simple estimator of identified quantile treatment effects is given, providing a solution to the important problem of estimating quantile treatment effects from panel data. Bounds for overall effects in static and dynamic models are given. The dynamic bounds provide a partial identification solution to the important problem of estimating the effect of state dependence in the presence of unobserved heterogeneity. The impact of $T$, the number of time periods, is shown by deriving shrinkage rates for the identified set as $T$ grows. We also consider semiparametric, discrete-choice models and find that semiparametric panel bounds can be much tighter than nonparametric bounds. Computationally-convenient methods for semiparametric models are presented. We propose a novel inference method that applies in panel data and other settings and show that it produces uniformly valid confidence regions in large samples. We give empirical illustrations.
We develop new semiparametric methods for estimating treatment effects. We focus on a setting where the outcome distributions may be thick tailed, where treatment effects are small, where sample sizes are large and where assignment is completely random. This setting is of particular interest in recent experimentation in tech companies. We propose using parametric models for the treatment effects, as opposed to parametric models for the full outcome distributions. This leads to semiparametric models for the outcome distributions. We derive the semiparametric efficiency bound for this setting, and propose efficient estimators. In the case with a constant treatment effect one of the proposed estimators has an interesting interpretation as a weighted average of quantile treatment effects, with the weights proportional to (minus) the second derivative of the log of the density of the potential outcomes. Our analysis also results in an extension of Hubers model and trimmed mean to include asymmetry and a simplified condition on linear combinations of order statistics, which may be of independent interest.
We derive fixed effects estimators of parameters and average partial effects in (possibly dynamic) nonlinear panel data models with individual and time effects. They cover logit, probit, ordered probit, Poisson and Tobit models that are important for many empirical applications in micro and macroeconomics. Our estimators use analytical and jackknife bias corrections to deal with the incidental parameter problem, and are asymptotically unbiased under asymptotic sequences where $N/T$ converges to a constant. We develop inference methods and show that they perform well in numerical examples.