No Arabic abstract
This paper studies the estimation of network connectedness with focally sparse structure. We try to uncover the network effect with a flexible sparse deviation from a predetermined adjacency matrix. To be more specific, the sparse deviation structure can be regarded as latent or misspecified linkages. To obtain high-quality estimator for parameters of interest, we propose to use a double regularized high-dimensional generalized method of moments (GMM) framework. Moreover, this framework also facilitates us to conduct the inference. Theoretical results on consistency and asymptotic normality are provided with accounting for general spatial and temporal dependency of the underlying data generating processes. Simulations demonstrate good performance of our proposed procedure. Finally, we apply the methodology to study the spatial network effect of stock returns.
Factor and sparse models are two widely used methods to impose a low-dimensional structure in high-dimension. They are seemingly mutually exclusive. We propose a lifting method that combines the merits of these two models in a supervised learning methodology that allows to efficiently explore all the information in high-dimensional datasets. The method is based on a flexible model for high-dimensional panel data, called factor-augmented regression (FarmPredict) model with both observable or latent common factors, as well as idiosyncratic components. This model not only includes both principal component (factor) regression and sparse regression as specific models but also significantly weakens the cross-sectional dependence and hence facilitates model selection and interpretability. The methodology consists of three steps. At each step, the remaining cross-section dependence can be inferred by a novel test for covariance structure in high-dimensions. We developed asymptotic theory for the FarmPredict model and demonstrated the validity of the multiplier bootstrap for testing high-dimensional covariance structure. This is further extended to testing high-dimensional partial covariance structures. The theory is supported by a simulation study and applications to the construction of a partial covariance network of the financial returns and a prediction exercise for a large panel of macroeconomic time series from FRED-MD database.
This paper proposes a logistic undirected network formation model which allows for assortative matching on observed individual characteristics and the presence of edge-wise fixed effects. We model the coefficients of observed characteristics to have a latent community structure and the edge-wise fixed effects to be of low rank. We propose a multi-step estimation procedure involving nuclear norm regularization, sample splitting, iterative logistic regression and spectral clustering to detect the latent communities. We show that the latent communities can be exactly recovered when the expected degree of the network is of order log n or higher, where n is the number of nodes in the network. The finite sample performance of the new estimation and inference methods is illustrated through both simulated and real datasets.
This paper provides a method to construct simultaneous confidence bands for quantile functions and quantile effects in nonlinear network and panel models with unobserved two-way effects, strictly exogenous covariates, and possibly discrete outcome variables. The method is based upon projection of simultaneous confidence bands for distribution functions constructed from fixed effects distribution regression estimators. These fixed effects estimators are debiased to deal with the incidental parameter problem. Under asymptotic sequences where both dimensions of the data set grow at the same rate, the confidence bands for the quantile functions and effects have correct joint coverage in large samples. An empirical application to gravity models of trade illustrates the applicability of the methods to network data.
We develop a distribution regression model under endogenous sample selection. This model is a semiparametric generalization of the Heckman selection model that accommodates much richer patterns of heterogeneity in the selection process and effect of the covariates. The model applies to continuous, discrete and mixed outcomes. We study the identification of the model, and develop a computationally attractive two-step method to estimate the model parameters, where the first step is a probit regression for the selection equation and the second step consists of multiple distribution regressions with selection corrections for the outcome equation. We construct estimators of functionals of interest such as actual and counterfactual distributions of latent and observed outcomes via plug-in rule. We derive functional central limit theorems for all the estimators and show the validity of multiplier bootstrap to carry out functional inference. We apply the methods to wage decompositions in the UK using new data. Here we decompose the difference between the male and female wage distributions into four effects: composition, wage structure, selection structure and selection sorting. After controlling for endogenous employment selection, we still find substantial gender wage gap -- ranging from 21% to 40% throughout the (latent) offered wage distribution that is not explained by observable labor market characteristics. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantive fraction of the gender wage gap at the top of the distribution. These findings can be interpreted as evidence of assortative matching in the marriage market and glass-ceiling in the labor market.
We propose a two-stage least squares (2SLS) estimator whose first stage is the equal-weighted average over a complete subset with $k$ instruments among $K$ available, which we call the complete subset averaging (CSA) 2SLS. The approximate mean squared error (MSE) is derived as a function of the subset size $k$ by the Nagar (1959) expansion. The subset size is chosen by minimizing the sample counterpart of the approximate MSE. We show that this method achieves the asymptotic optimality among the class of estimators with different subset sizes. To deal with averaging over a growing set of irrelevant instruments, we generalize the approximate MSE to find that the optimal $k$ is larger than otherwise. An extensive simulation experiment shows that the CSA-2SLS estimator outperforms the alternative estimators when instruments are correlated. As an empirical illustration, we estimate the logistic demand function in Berry, Levinsohn, and Pakes (1995) and find the CSA-2SLS estimate is better supported by economic theory than the alternative estimates.