Structural estimation is an important methodology in empirical economics, and a large class of structural models is estimated through the generalized method of moments (GMM). Traditionally, structural models have been selected on the basis of in-sample fit, which uses the entire observed sample. In this paper, we propose a model selection procedure based on cross-validation (CV), which uses sample splitting to avoid issues such as over-fitting. While CV is widely used in the machine learning community, we are the first to prove its consistency for model selection in the GMM framework. Its empirical properties are compared with those of existing methods in simulations of IV regressions and an oligopoly market model. In addition, we show how to apply our method within the Mathematical Programming with Equilibrium Constraints (MPEC) approach. Finally, we apply our method to online retail sales data to compare a dynamic market model with a static one.
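The sample-splitting idea can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the procedure proved consistent in the paper: it takes a one-parameter linear model with two instruments, estimates the slope by GMM with identity weighting on the training folds, and scores each candidate model by the squared norm of the sample moments on the held-out fold. The function names (`gmm_slope`, `cv_criterion`, `simulate`), the identity weighting matrix, the fold scheme, and the simulated design are all hypothetical choices made for this sketch.

```python
import random


def gmm_slope(y, w, instruments):
    """One-parameter linear GMM with identity weighting (an assumption).

    Minimizes sum_j g_j(b)^2, where g_j(b) = mean(z_j * (y - w * b)).
    """
    n = len(y)
    a = [sum(z[i] * w[i] for i in range(n)) / n for z in instruments]
    c = [sum(z[i] * y[i] for i in range(n)) / n for z in instruments]
    return sum(aj * cj for aj, cj in zip(a, c)) / sum(aj * aj for aj in a)


def moment_norm(y, w, instruments, b):
    """Squared Euclidean norm of the sample moments evaluated at slope b."""
    n = len(y)
    g = [sum(z[i] * (y[i] - w[i] * b) for i in range(n)) / n for z in instruments]
    return sum(gj * gj for gj in g)


def _subset(values, index):
    return [values[i] for i in index]


def cv_criterion(y, w, instruments, k=5):
    """Average held-out moment norm over k interleaved folds."""
    n = len(y)
    total = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        # Estimate on the training folds only.
        b = gmm_slope(_subset(y, train), _subset(w, train),
                      [_subset(z, train) for z in instruments])
        # Evaluate the moment conditions on the held-out fold.
        total += moment_norm(_subset(y, test), _subset(w, test),
                             [_subset(z, test) for z in instruments], b)
    return total / k


def simulate(n=1000, seed=0):
    """Hypothetical design: y depends on endogenous x; w_wrong is a wrong regressor."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    x = [z1[i] + z2[i] + v[i] for i in range(n)]
    w_wrong = [z1[i] - z2[i] + rng.gauss(0, 1) for i in range(n)]
    y = [2.0 * x[i] + 0.5 * v[i] + rng.gauss(0, 1) for i in range(n)]
    return y, x, w_wrong, [z1, z2]


y, x, w_wrong, instruments = simulate()
print("CV criterion, model with x:      ", cv_criterion(y, x, instruments))
print("CV criterion, model with w_wrong:", cv_criterion(y, w_wrong, instruments))
```

The candidate with the smaller held-out criterion would be selected. The model must be over-identified for this comparison to be informative, since a just-identified model can set its sample moments to zero regardless of the regressor.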
We consider identification and estimation of nonseparable sample selection models with censored selection rules. We employ a control function approach and discuss different objects of interest based on (1) local effects conditional on the control function, and (2) global effects obtained from integration over ranges of values of the control function. We derive the conditions for the identification of these different objects and suggest strategies for estimation. Moreover, we provide the associated asymptotic theory. These strategies are illustrated in an empirical investigation of the determinants of female wages in the United Kingdom.
Accurate estimation of the extent of cross-sectional dependence in large panel data is paramount to further statistical analysis of the data under study. Grouping more data with weak relations (cross-sectional dependence) together often results in less efficient dimension reduction and worse forecasting. This paper describes cross-sectional dependence among a large number of objects (time series) via a factor model and parameterizes its extent in terms of the strength of the factor loadings. A new joint estimation method, benefiting from a unique feature of dimension reduction for high-dimensional time series, is proposed for the parameter representing the extent and some other parameters involved in the estimation procedure. Moreover, a joint asymptotic distribution for a pair of estimators is established. Simulations illustrate the finite-sample effectiveness of the proposed estimation method. Applications to cross-country macro-variables and stock returns from the S&P 500 are studied.
This paper proposes a criterion for simultaneous GMM model and moment selection: the generalized focused information criterion (GFIC). Rather than attempting to identify the true specification, the GFIC chooses from a set of potentially mis-specified moment conditions and parameter restrictions to minimize the mean-squared error (MSE) of a user-specified target parameter. The intent of the GFIC is to formalize a situation common in applied practice. An applied researcher begins with a set of fairly weak baseline assumptions, assumed to be correct, and must decide whether to impose any of a number of stronger, more controversial suspect assumptions that yield parameter restrictions, additional moment conditions, or both. Provided that the baseline assumptions identify the model, we show how to construct an asymptotically unbiased estimator of the asymptotic MSE to select over these suspect assumptions: the GFIC. We go on to provide results for post-selection inference and model averaging that can be applied both to the GFIC and various alternative selection criteria. To illustrate how our criterion can be used in practice, we specialize the GFIC to the problem of selecting over exogeneity assumptions and lag lengths in a dynamic panel model, and show that it performs well in simulations. We conclude by applying the GFIC to a dynamic panel data model for the price elasticity of cigarette demand.
We develop a distribution regression model under endogenous sample selection. This model is a semiparametric generalization of the Heckman selection model that accommodates much richer patterns of heterogeneity in the selection process and in the effect of the covariates. The model applies to continuous, discrete and mixed outcomes. We study the identification of the model, and develop a computationally attractive two-step method to estimate the model parameters, where the first step is a probit regression for the selection equation and the second step consists of multiple distribution regressions with selection corrections for the outcome equation. We construct estimators of functionals of interest, such as actual and counterfactual distributions of latent and observed outcomes, via the plug-in rule. We derive functional central limit theorems for all the estimators and show the validity of the multiplier bootstrap for functional inference. We apply the methods to wage decompositions in the UK using new data. Here we decompose the difference between the male and female wage distributions into four effects: composition, wage structure, selection structure and selection sorting. After controlling for endogenous employment selection, we still find a substantial gender wage gap, ranging from 21% to 40% throughout the (latent) offered wage distribution, that is not explained by observable labor market characteristics. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantive fraction of the gender wage gap at the top of the distribution. These findings can be interpreted as evidence of assortative matching in the marriage market and a glass ceiling in the labor market.
We propose a unified frequency domain cross-validation (FDCV) method to obtain an HAC standard error. Our proposed method allows for model/tuning parameter selection across parametric and nonparametric spectral estimators simultaneously. Our candidate class consists of restricted maximum likelihood-based (REML) autoregressive spectral estimators and lag-weights estimators with the Parzen kernel. We provide a method for efficiently computing the REML estimators of the autoregressive models. In simulations, we demonstrate the reliability of our FDCV method compared with the popular HAC estimators of Andrews-Monahan and Newey-West. Supplementary material for the article is available online.
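For orientation, one member of the nonparametric candidate class, a lag-weights estimator with the Parzen kernel, can be sketched as follows. This is a minimal illustration at a fixed, user-supplied bandwidth and does not implement the FDCV selection step or the REML autoregressive estimators. The function names and the convention of evaluating the kernel at `lag / bandwidth` are assumptions of this sketch.

```python
def parzen_weight(x):
    """Parzen kernel: 1 - 6x^2 + 6|x|^3 on [0, 1/2], 2(1 - |x|)^3 on (1/2, 1], else 0."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def autocovariance(u, lag):
    """Sample autocovariance at the given lag, normalized by the full sample size."""
    n = len(u)
    mean = sum(u) / n
    return sum((u[t] - mean) * (u[t - lag] - mean) for t in range(lag, n)) / n


def parzen_long_run_variance(u, bandwidth):
    """Lag-weights estimate: gamma_0 + 2 * sum_j k(j / bandwidth) * gamma_j."""
    lrv = autocovariance(u, 0)
    for lag in range(1, len(u)):
        weight = parzen_weight(lag / bandwidth)
        if weight == 0.0:
            break  # the kernel vanishes for all lags >= bandwidth
        lrv += 2.0 * weight * autocovariance(u, lag)
    return lrv


def hac_standard_error_of_mean(u, bandwidth):
    """HAC standard error of the sample mean of u."""
    return (parzen_long_run_variance(u, bandwidth) / len(u)) ** 0.5


series = [1.0, -1.0] * 10  # strongly negatively autocorrelated toy series
print(hac_standard_error_of_mean(series, bandwidth=4))
```

In a cross-validation setting of the kind described above, the bandwidth would be one of the tuning parameters chosen by the criterion. Here it is simply fixed by hand.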