No Arabic abstract
We consider a testing problem for cross-sectional dependence for high-dimensional panel data, where the number of cross-sectional units is potentially much larger than the number of observations. The cross-sectional dependence is described through a linear regression model. We study three tests named the sum test, the max test and the max-sum test, where the latter two are new. The sum test is initially proposed by Breusch and Pagan (1980). We design the max and sum tests for sparse and non-sparse residuals in the linear regressions, respectively.And the max-sum test is devised to compromise both situations on the residuals. Indeed, our simulation shows that the max-sum test outperforms the previous two tests. This makes the max-sum test very useful in practice where sparsity or not for a set of data is usually vague. Towards the theoretical analysis of the three tests, we have settled two conjectures regarding the sum of squares of sample correlation coefficients asked by Pesaran (2004 and 2008). In addition, we establish the asymptotic theory for maxima of sample correlations coefficients appeared in the linear regression model for panel data, which is also the first successful attempt to our knowledge. To study the max-sum test, we create a novel method to show asymptotic independence between maxima and sums of dependent random variables. We expect the method itself is useful for other problems of this nature. Finally, an extensive simulation study as well as a case study are carried out. They demonstrate advantages of our proposed methods in terms of both empirical powers and robustness for residuals regardless of sparsity or not.
In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We propose a class of matrix completion estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to impute the missing elements of the control outcome matrix, corresponding to treated units/periods. This leads to a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to the nuclear norm for matrices. We generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure that is common in social science applications. We present novel insights concerning the connections between the matrix completion literature, the literature on interactive fixed effects models and the literatures on program evaluation under unconfoundedness and synthetic control methods. We show that all these estimators can be viewed as focusing on the same objective function. They differ solely in the way they deal with identification, in some cases solely through regularization (our proposed nuclear norm matrix completion estimator) and in other cases primarily through imposing hard restrictions (the unconfoundedness and synthetic control approaches). The proposed method outperforms unconfoundedness-based or synthetic control estimators in simulations based on real data.
We develop new higher-order asymptotic techniques for the Gaussian maximum likelihood estimator in a spatial panel data model, with fixed effects, time-varying covariates, and spatially correlated errors. Our saddlepoint density and tail area approximation feature relative error of order $O(1/(n(T-1)))$ with $n$ being the cross-sectional dimension and $T$ the time-series dimension. The main theoretical tool is the tilted-Edgeworth technique in a non-identically distributed setting. The density approximation is always non-negative, does not need resampling, and is accurate in the tails. Monte Carlo experiments on density approximation and testing in the presence of nuisance parameters illustrate the good performance of our approximation over first-order asymptotics and Edgeworth expansions. An empirical application to the investment-saving relationship in OECD (Organisation for Economic Co-operation and Development) countries shows disagreement between testing results based on first-order asymptotics and saddlepoint techniques.
In a recent paper Juodis and Reese (2021) (JR) show that the application of the CD test proposed by Pesaran (2004) to residuals from panels with latent factors results in over-rejection and propose a randomized test statistic to correct for over-rejection, and add a screening component to achieve power. This paper considers the same problem but from a different perspective and shows that the standard CD test remains valid if the latent factors are weak, and proposes a simple bias-corrected CD test, labelled CD*, which is shown to be asymptotically normal, irrespective of whether the latent factors are weak or strong. This result is shown to hold for pure latent factor models as well as for panel regressions with latent factors. Small sample properties of the CD* test are investigated by Monte Carlo experiments and are shown to have the correct size and satisfactory power for both Gaussian and non-Gaussian errors. In contrast, it is found that JRs test tends to over-reject in the case of panels with non-Gaussian errors, and have low power against spatial network alternatives. The use of the CD* test is illustrated with two empirical applications from the literature.
This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models. High-dimensional models are characterized by having a number of unknown parameters that is not vanishingly small relative to the sample size. We first present results in a framework where estimators of parameters of interest may be represented directly as approximate means. Within this context, we review fundamental results including high-dimensional central limit theorems, bootstrap approximation of high-dimensional limit distributions, and moderate deviation theory. We also review key concepts underlying inference when many parameters are of interest such as multiple testing with family-wise error rate or false discovery rate control. We then turn to a general high-dimensional minimum distance framework with a special focus on generalized method of moments problems where we present results for estimation and inference about model parameters. The presented results cover a wide array of econometric applications, and we discuss several leading special cases including high-dimensional linear regression and linear instrumental variables models to illustrate the general results.
We consider testing statistical hypotheses about densities of signals in deconvolution models. A new approach to this problem is proposed. We constructed score tests for the deconvolution with the known noise density and efficient score tests for the case of unknown density. The tests are incorporated with model selection rules to choose reasonable model dimensions automatically by the data. Consistency of the tests is proved.