We study identification and estimation of causal effects in settings with panel data. Traditionally, researchers follow model-based identification strategies that rely on assumptions governing the relation between the potential outcomes and the unobserved confounders. We focus on a novel, complementary approach to identification in which assumptions are made about the relation between the treatment assignment and the unobserved confounders. We introduce different sets of assumptions that follow the two paths to identification, and develop a doubly robust approach. We propose estimation methods that build on these identification strategies.
We propose a new estimator for the average causal effects of a binary treatment with panel data in settings with general treatment patterns. Our approach augments the two-way fixed effects specification with unit-specific weights that arise from a model for the assignment mechanism. We show how to construct these weights in various settings, including situations where units opt into the treatment sequentially. The resulting estimator converges to an average (over units and time) treatment effect under the correct specification of the assignment model. We show that our estimator is more robust than the conventional two-way fixed effects estimator: it remains consistent if either the assignment mechanism or the two-way regression model is correctly specified, and it performs better than the two-way fixed effects estimator if both are locally misspecified. This strong double robustness property quantifies the benefits of modeling the assignment process and motivates using our estimator in practice.
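To make the idea of augmenting the two-way fixed effects specification with unit-specific weights concrete, here is a minimal sketch of a weighted two-way regression. It is an illustration under stated assumptions, not the estimator from the abstract: the function name `weighted_twfe`, the simulated panel, and in particular the `unit_weights` values are hypothetical placeholders, whereas the abstract's weights are derived from a model of the assignment mechanism that is not specified here.

```python
import numpy as np


def weighted_twfe(Y, W, unit_weights):
    """Weighted least squares of Y on treatment W plus unit and time dummies.

    Y, W: arrays of shape (n_units, n_periods); unit_weights: shape (n_units,).
    Returns the coefficient on W. Every observation of unit i gets weight
    unit_weights[i].
    """
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    n_units, n_periods = Y.shape
    # Row-major flattening: observation (i, t) sits at position i * n_periods + t.
    unit_idx = np.repeat(np.arange(n_units), n_periods)
    time_idx = np.tile(np.arange(n_periods), n_units)
    unit_dummies = np.eye(n_units)[unit_idx]
    time_dummies = np.eye(n_periods)[time_idx][:, 1:]  # drop one period to avoid collinearity
    X = np.column_stack([W.ravel(), unit_dummies, time_dummies])
    # Weighted least squares: scale each row by the square root of its weight.
    sqrt_w = np.sqrt(np.asarray(unit_weights, dtype=float))[unit_idx]
    coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], Y.ravel() * sqrt_w, rcond=None)
    return coef[0]


# Illustrative noise-free panel with staggered adoption and a constant effect of 2.
rng = np.random.default_rng(0)
adoption_period = np.array([2, 3, np.inf, 1, np.inf, 2])  # inf = never treated
periods = np.arange(4)
W = (periods[None, :] >= adoption_period[:, None]).astype(float)
alpha = rng.normal(size=6)   # unit effects
beta = rng.normal(size=4)    # time effects
Y = alpha[:, None] + beta[None, :] + 2.0 * W

# Hypothetical weights; in the abstract they come from the assignment model.
unit_weights = rng.uniform(0.5, 2.0, size=6)
tau_hat = weighted_twfe(Y, W, unit_weights)
print(tau_hat)
```

In this toy panel the two-way model holds exactly, so any positive weights recover the same coefficient; the weights only matter once the regression model is misspecified, which is the case the abstract's robustness result addresses.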
We study the impact of weak identification in discrete choice models, and provide insights into the determinants of identification strength in these models. Using these insights, we propose a novel test that can consistently detect weak identification in commonly applied discrete choice models, such as probit, logit, and many of their extensions. Furthermore, we demonstrate that when the null hypothesis of weak identification is rejected, Wald-based inference can be carried out using standard formulas and critical values. A Monte Carlo study compares our proposed testing approach against commonly applied weak identification tests. The results demonstrate both the good performance of our approach and the fundamental failure of conventional weak identification tests for linear models when they are used in the discrete choice context. We also compare our approach against those commonly applied in the literature in two empirical examples: married women's labor force participation, and US food aid and civil conflicts.
We use identification-robust tests to show that the difference, level, and non-linear moment conditions proposed by Arellano and Bond (1991), Arellano and Bover (1995), Blundell and Bond (1998), and Ahn and Schmidt (1995) for the linear dynamic panel data model do not separately identify the autoregressive parameter when its true value is close to one and the variance of the initial observations is large. We prove that combinations of these moment conditions, however, do identify it when there are more than three time series observations. This identification then results solely from a set of so-called robust moment conditions. These robust moments are spanned by the combined difference, level, and non-linear moment conditions and depend only on differenced data. We show that, when only the robust moments contain identifying information on the autoregressive parameter, the discriminatory power of the Kleibergen (2005) LM (KLM) test using the combined moments is identical to the largest rejection frequencies that can be obtained from using the robust moments alone. This shows that the KLM test implicitly uses the robust moments when only they contain information on the autoregressive parameter.
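For orientation, the three families of moment conditions named above can be written out for a first-order autoregressive panel model. This is a sketch in common textbook notation, not notation taken from the paper: the symbols $c_i$, $u_{it}$, $\theta$, $\varepsilon_{it}(\theta)$ and the indexing convention (observations $t = 1, \dots, T$) are assumptions made here for illustration.

```latex
% Assumed model: y_{it} = c_i + \theta\, y_{i,t-1} + u_{it}, \quad t = 2, \dots, T.
% Define the residual \varepsilon_{it}(\theta) = y_{it} - \theta\, y_{i,t-1}.
\begin{align*}
\text{Difference (Arellano--Bond):}\quad
  & E\big[\, y_{is}\, \Delta\varepsilon_{it}(\theta) \,\big] = 0,
  && s = 1, \dots, t-2,\; t = 3, \dots, T, \\
\text{Level (Arellano--Bover, Blundell--Bond):}\quad
  & E\big[\, \Delta y_{i,t-1}\, \varepsilon_{it}(\theta) \,\big] = 0,
  && t = 3, \dots, T, \\
\text{Non-linear (Ahn--Schmidt):}\quad
  & E\big[\, \varepsilon_{iT}(\theta)\, \Delta\varepsilon_{it}(\theta) \,\big] = 0,
  && t = 3, \dots, T-1,
\end{align*}
% where \Delta\varepsilon_{it}(\theta) = \Delta y_{it} - \theta\, \Delta y_{i,t-1}.
```

The first two families are linear in $\theta$, while the third is quadratic in $\theta$, which is why it is referred to as non-linear.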
This paper studies a panel data setting where the goal is to estimate causal effects of an intervention by predicting the counterfactual values of outcomes for treated units, had they not received the treatment. Several approaches have been proposed for this problem, including regression methods, synthetic control methods, and matrix completion methods. This paper considers an ensemble approach, and shows that it performs better than any of the individual methods in several economic datasets. Matrix completion methods are often given the most weight by the ensemble, although this depends on the setting. We argue that ensemble methods present a fruitful direction for further research in the causal panel data setting.
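One simple way to form such an ensemble is stacking: regress held-out outcomes on the held-out predictions of each base method and use the coefficients as ensemble weights. The sketch below illustrates that generic scheme only; the abstract does not state how the ensemble weights are chosen, so the functions `stacking_weights` and `ensemble_predict`, the unconstrained least-squares fit, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def stacking_weights(holdout_predictions, holdout_outcomes):
    """Least-squares weights for combining base-method predictions.

    holdout_predictions: shape (n_obs, n_methods), one column per base method
    (e.g. regression, synthetic control, matrix completion).
    holdout_outcomes: shape (n_obs,), observed untreated outcomes.
    """
    P = np.asarray(holdout_predictions, dtype=float)
    y = np.asarray(holdout_outcomes, dtype=float)
    weights, *_ = np.linalg.lstsq(P, y, rcond=None)
    return weights


def ensemble_predict(predictions, weights):
    """Weighted combination of base-method predictions."""
    return np.asarray(predictions, dtype=float) @ weights


# Simulated hold-out set: method 0 is exact, method 1 is noisy, method 2 is pure noise.
rng = np.random.default_rng(0)
y_holdout = rng.normal(size=50)
P_holdout = np.column_stack([
    y_holdout,
    y_holdout + rng.normal(size=50),
    rng.normal(size=50),
])
weights = stacking_weights(P_holdout, y_holdout)
counterfactual = ensemble_predict(P_holdout, weights)
print(weights)
```

In this toy case the exact method receives essentially all of the weight. In practice one might constrain the weights to be non-negative or to sum to one; that is a design choice this sketch leaves open.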
The lack of longitudinal studies of the relationship between the built environment and travel behavior has been widely discussed in the literature. This paper discusses how standard propensity score matching estimators can be extended to enable such studies by pairing observations across two dimensions: longitudinal and cross-sectional. To mimic randomized controlled trials (RCTs), researchers match observations in both dimensions: they find synthetic control groups that are similar to the treatment group, and they match subjects synthetically across before-treatment and after-treatment time periods. We call this approach two-dimensional propensity score matching (2DPSM). Monte Carlo evidence indicates superior performance of this method in estimating treatment effects. A near-term opportunity for such matching is identifying the impact of transportation infrastructure on travel behavior.
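The two matching dimensions can be illustrated with a small sketch: each treated, after-treatment observation is matched on its propensity score to a treated-before, a control-after, and a control-before observation, and the matched outcomes are combined into a difference of before/after changes. This is one reading of the description above, not the paper's estimator: the functions `nearest_match_outcome` and `two_dimensional_matching_effect`, the use of one-nearest-neighbor matching with replacement, and the simulated scores and outcomes are all assumptions. Propensity scores are taken as given rather than estimated.

```python
import numpy as np


def nearest_match_outcome(target_scores, pool_scores, pool_outcomes):
    """For each target score, return the outcome of the pool observation
    whose score is closest (one nearest neighbor, with replacement)."""
    target_scores = np.asarray(target_scores, dtype=float)
    pool_scores = np.asarray(pool_scores, dtype=float)
    pool_outcomes = np.asarray(pool_outcomes, dtype=float)
    nearest = np.abs(target_scores[:, None] - pool_scores[None, :]).argmin(axis=1)
    return pool_outcomes[nearest]


def two_dimensional_matching_effect(scores, outcomes):
    """scores and outcomes are dicts keyed by (group, period), with group in
    {"treated", "control"} and period in {"before", "after"}."""
    anchor = scores[("treated", "after")]
    y_ta = np.asarray(outcomes[("treated", "after")], dtype=float)
    # Longitudinal match: treated observations from the before period.
    y_tb = nearest_match_outcome(anchor, scores[("treated", "before")],
                                 outcomes[("treated", "before")])
    # Cross-sectional match: control observations from the after period.
    y_ca = nearest_match_outcome(anchor, scores[("control", "after")],
                                 outcomes[("control", "after")])
    # Match in both dimensions: control observations from the before period.
    y_cb = nearest_match_outcome(anchor, scores[("control", "before")],
                                 outcomes[("control", "before")])
    return float(np.mean((y_ta - y_tb) - (y_ca - y_cb)))


# Simulated data: a common period effect of 0.5 and a treatment effect of 1.5.
rng = np.random.default_rng(1)
grid = np.linspace(0.1, 0.9, 9)
scores, outcomes = {}, {}
for group in ("treated", "control"):
    for period in ("before", "after"):
        s = rng.permutation(grid)  # same score support in every cell, shuffled order
        scores[(group, period)] = s
        outcomes[(group, period)] = (
            2.0 * s
            + 0.5 * (period == "after")
            + 1.5 * (group == "treated" and period == "after")
        )
effect = two_dimensional_matching_effect(scores, outcomes)
print(effect)
```

Because every cell shares the same score support here, all matches are exact and the period effect differences out. With real data, match quality in each of the three pools would need to be checked separately.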