In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is to estimate counterfactual (untreated) outcomes for the treated unit/period combinations. We propose a class of matrix completion estimators that uses the observed elements of the matrix of control outcomes, corresponding to untreated unit/period combinations, to impute the missing elements of the control outcome matrix, corresponding to treated unit/period combinations. This leads to a matrix that approximates the original (incomplete) matrix well but has lower complexity according to the nuclear norm for matrices. We generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure that is common in social science applications. We present novel insights concerning the connections between the matrix completion literature, the literature on interactive fixed effects models, and the literatures on program evaluation under unconfoundedness and synthetic control methods. We show that all these estimators can be viewed as focusing on the same objective function. They differ solely in the way they deal with identification, in some cases solely through regularization (our proposed nuclear norm matrix completion estimator) and in other cases primarily through imposing hard restrictions (the unconfoundedness and synthetic control approaches). The proposed method outperforms unconfoundedness-based or synthetic control estimators in simulations based on real data.
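The idea of imputing treated unit/period entries through nuclear norm regularization can be illustrated with a minimal sketch. It uses a generic soft-thresholded SVD iteration (often called soft-impute), which is an assumption made here for illustration and not necessarily the estimator or algorithm of the paper; the function name `soft_impute`, the penalty `lam`, and the block-shaped treatment pattern are all hypothetical, and fixed effects are omitted.

```python
import numpy as np


def soft_impute(Y, observed, lam, n_iter=500, tol=1e-8):
    """Illustrative nuclear-norm-regularized completion (not the paper's code).

    Y        : (N, T) array of outcomes; entries where observed is False are ignored.
    observed : (N, T) boolean mask, True for untreated (control) unit/periods.
    lam      : soft-threshold applied to the singular values (assumed tuning value).
    """
    L = np.zeros(Y.shape, dtype=float)
    for _ in range(n_iter):
        # Keep observed control outcomes, fill the rest with the current estimate.
        filled = np.where(observed, Y, L)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Shrinking singular values is the proximal step for the nuclear norm.
        s_shrunk = np.maximum(s - lam, 0.0)
        L_new = (U * s_shrunk) @ Vt
        converged = np.linalg.norm(L_new - L) <= tol * max(1.0, np.linalg.norm(L))
        L = L_new
        if converged:
            break
    return L


def make_example(seed=0, N=30, T=20, rank=2):
    """Simulated low-rank control outcomes with a treated block (hypothetical data)."""
    rng = np.random.default_rng(seed)
    truth = rng.normal(size=(N, rank)) @ rng.normal(size=(rank, T))
    observed = np.ones((N, T), dtype=bool)
    observed[-5:, -5:] = False  # last 5 units are treated in the last 5 periods
    Y = np.where(observed, truth, np.nan)  # treated entries are missing
    return truth, Y, observed


if __name__ == "__main__":
    truth, Y, observed = make_example()
    L_hat = soft_impute(Y, observed, lam=0.1)
    err = L_hat[~observed] - truth[~observed]
    print("RMSE on imputed (treated) entries:", np.sqrt(np.mean(err ** 2)))
```

In this sketch the estimated treatment effect for a treated unit/period would be the observed treated outcome minus the corresponding imputed entry of `L_hat`; in practice `lam` would be chosen by cross-validation on held-out control entries.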
We develop new higher-order asymptotic techniques for the Gaussian maximum likelihood estimator in a spatial panel data model with fixed effects, time-varying covariates, and spatially correlated errors. Our saddlepoint density and tail area approximations feature a relative error of order $O(1/(n(T-1)))$, with $n$ being the cross-sectional dimension and $T$ the time-series dimension. The main theoretical tool is the tilted-Edgeworth technique in a non-identically distributed setting. The density approximation is always non-negative, does not need resampling, and is accurate in the tails. Monte Carlo experiments on density approximation and testing in the presence of nuisance parameters illustrate the good performance of our approximation relative to first-order asymptotics and Edgeworth expansions. An empirical application to the investment-saving relationship in OECD (Organisation for Economic Co-operation and Development) countries shows disagreement between testing results based on first-order asymptotics and saddlepoint techniques.
We consider the problem of testing for cross-sectional dependence in high-dimensional panel data, where the number of cross-sectional units is potentially much larger than the number of observations. The cross-sectional dependence is described through a linear regression model. We study three tests, named the sum test, the max test, and the max-sum test, of which the latter two are new. The sum test was originally proposed by Breusch and Pagan (1980). We design the max and sum tests for sparse and non-sparse residuals in the linear regressions, respectively, and the max-sum test is devised to accommodate both situations. Indeed, our simulations show that the max-sum test outperforms the other two tests. This makes the max-sum test very useful in practice, where it is usually unclear whether a data set is sparse. Toward the theoretical analysis of the three tests, we settle two conjectures, posed by Pesaran (2004 and 2008), regarding the sum of squares of sample correlation coefficients. In addition, we establish the asymptotic theory for the maxima of sample correlation coefficients appearing in the linear regression model for panel data, which is, to our knowledge, the first successful attempt. To study the max-sum test, we create a novel method to show asymptotic independence between maxima and sums of dependent random variables. We expect the method itself to be useful for other problems of this nature. Finally, an extensive simulation study and a case study are carried out. They demonstrate the advantages of our proposed methods in terms of both empirical power and robustness to the residuals, whether sparse or not.
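The two building blocks described above, a sum of squared residual correlations and a maximum of them, can be sketched as follows. This is a minimal illustration under stated assumptions: unit-by-unit OLS with an intercept, the classical Breusch and Pagan (1980) LM form $T \sum_{i<j} \hat\rho_{ij}^2$ for the sum statistic, and an unnormalized $T \max_{i<j} \hat\rho_{ij}^2$ for the max statistic. The centering, scaling, and the way the paper combines them into the max-sum test are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def residual_correlations(Y, X):
    """Pairwise sample correlations of unit-by-unit OLS residuals.

    Y : (N, T) outcomes for N units over T periods.
    X : (N, T, k) regressors; an intercept is added for each unit.
    """
    N, T = Y.shape
    resid = np.empty((N, T))
    for i in range(N):
        Z = np.column_stack([np.ones(T), X[i]])  # (T, k + 1) design matrix
        beta, *_ = np.linalg.lstsq(Z, Y[i], rcond=None)
        resid[i] = Y[i] - Z @ beta
    # Rows are units, so corrcoef returns the N x N correlation matrix.
    return np.corrcoef(resid)


def sum_and_max_statistics(R, T):
    """Unnormalized sum-type and max-type statistics (illustrative scaling only)."""
    upper = np.triu_indices_from(R, k=1)  # each pair i < j once
    rho_sq = R[upper] ** 2
    sum_stat = T * rho_sq.sum()  # Breusch-Pagan LM form
    max_stat = T * rho_sq.max()  # driven by the single largest correlation
    return sum_stat, max_stat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, T, k = 5, 50, 2  # simulated panel, cross-sectionally independent errors
    X = rng.normal(size=(N, T, k))
    Y = X.sum(axis=2) + rng.normal(size=(N, T))
    R = residual_correlations(Y, X)
    print(sum_and_max_statistics(R, T))
```

The sum statistic aggregates many small correlations and so suits dense (non-sparse) dependence, while the max statistic reacts to a few large correlations, which matches the sparse case described in the abstract.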
This study proposes a new Bayesian approach to infer binary treatment effects. The approach treats counterfactual untreated outcomes as missing observations and infers them by completing a matrix composed of realized and potential untreated outcomes using a data augmentation technique. We also develop a tailored prior that helps in the identification of parameters and induces the matrix of untreated outcomes to be approximately low rank. Posterior draws are simulated using a Markov chain Monte Carlo sampler. While the proposed approach is similar to synthetic control methods and other related methods, it has several notable advantages. First, unlike synthetic control methods, the proposed approach does not require stringent assumptions. Second, in contrast to non-Bayesian approaches, the proposed method can quantify uncertainty about inferences in a straightforward and consistent manner. By means of a series of simulation studies, we show that our proposal has better finite-sample performance than existing approaches.
In this study, we develop a novel estimation method for quantile treatment effects (QTE) under the rank invariance and rank stationarity assumptions. Ishihara (2020) explores identification of the nonseparable panel data model under these assumptions and proposes a parametric estimator based on the minimum distance method. However, this minimum distance estimation is computationally demanding when the dimensionality of the covariates is large. To overcome this problem, we propose a two-step estimation method based on quantile regression and the minimum distance method. We then show consistency and asymptotic normality of our estimator. Monte Carlo studies indicate that our estimator performs well in finite samples. Lastly, we present two empirical illustrations that estimate the distributional effects of insurance provision on household production and of TV watching on child cognitive development.
Factor structures or interactive effects are convenient devices to incorporate latent variables in panel data models. We consider fixed effect estimation of nonlinear panel single-index models with factor structures in the unobservables, which include logit, probit, ordered probit and Poisson specifications. We establish that fixed effect estimators of model parameters and average partial effects have normal distributions when the two dimensions of the panel grow large, but might suffer from incidental parameter bias. We show how models with factor structures can also be applied to capture important features of network data such as reciprocity, degree heterogeneity, homophily in latent variables and clustering. We illustrate this applicability with an empirical application to the estimation of a gravity equation of international trade between countries using a Poisson model with multiple factors.