Dynamic model averaging (DMA) combines the forecasts of a large number of dynamic linear models (DLMs) to predict the future value of a time series. The performance of DMA critically depends on the appropriate choice of two forgetting factors. The first of these controls the speed of adaptation of the coefficient vector of each DLM, while the second enables time variation in the model averaging stage. In this paper we develop a novel, adaptive dynamic model averaging (ADMA) methodology. The proposed methodology employs a stochastic optimisation algorithm that sequentially updates the forgetting factor of each DLM, and uses a state-of-the-art non-parametric model combination algorithm from the prediction with expert advice literature, which offers finite-time performance guarantees. An empirical application to quarterly UK house price data suggests that ADMA produces more accurate forecasts than the benchmark autoregressive model, as well as competing DMA specifications.
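To make the role of the two forgetting factors concrete, the following is a minimal sketch of standard DMA with fixed forgetting factors. It is not the adaptive ADMA procedure proposed in the abstract, which tunes the per-model factor by stochastic optimisation and replaces the averaging step with an expert-advice algorithm. The function name `dma_forecast`, the fixed observation variance `obs_var`, and the diffuse prior scale are assumptions made only for illustration.

```python
import numpy as np

def dma_forecast(y, X_list, lam=0.99, alpha=0.99, obs_var=1.0):
    """One-step-ahead DMA forecasts with FIXED forgetting factors.

    y      : (T,) target series
    X_list : list of (T, p_k) regressor matrices, one per DLM
    lam    : forgetting factor for each DLM's coefficient vector
    alpha  : forgetting factor for the model probabilities
    obs_var: observation noise variance (assumed known and constant here)
    """
    T, K = len(y), len(X_list)
    theta = [np.zeros(X.shape[1]) for X in X_list]      # coefficient means
    P = [10.0 * np.eye(X.shape[1]) for X in X_list]     # coefficient covariances
    probs = np.full(K, 1.0 / K)                         # model probabilities
    forecasts = np.empty(T)

    for t in range(T):
        # Model prediction step: flatten probabilities with alpha.
        pred = probs ** alpha
        pred /= pred.sum()

        yhat = np.empty(K)
        like = np.empty(K)
        for k, X in enumerate(X_list):
            x = X[t]
            R = P[k] / lam                      # inflate covariance (forgetting)
            yhat[k] = x @ theta[k]              # model-k forecast
            S = obs_var + x @ R @ x             # predictive variance
            err = y[t] - yhat[k]
            like[k] = np.exp(-0.5 * err**2 / S) / np.sqrt(2.0 * np.pi * S)
            gain = R @ x / S                    # Kalman gain
            theta[k] = theta[k] + gain * err
            P[k] = R - np.outer(gain, x @ R)

        # Combined forecast uses probabilities available before seeing y[t].
        forecasts[t] = pred @ yhat

        # Model updating step: reweight by predictive likelihood.
        probs = pred * like + 1e-300            # small floor avoids all-zero weights
        probs /= probs.sum()

    return forecasts, probs
```

Values of `lam` and `alpha` closer to 1 mean slower adaptation; choosing them well is the tuning problem that the adaptive method is designed to address.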
We develop a distribution regression model under endogenous sample selection. This model is a semiparametric generalization of the Heckman selection model that accommodates much richer patterns of heterogeneity in the selection process and the effect of the covariates. The model applies to continuous, discrete and mixed outcomes. We study the identification of the model, and develop a computationally attractive two-step method to estimate the model parameters, where the first step is a probit regression for the selection equation and the second step consists of multiple distribution regressions with selection corrections for the outcome equation. We construct estimators of functionals of interest such as actual and counterfactual distributions of latent and observed outcomes via the plug-in rule. We derive functional central limit theorems for all the estimators and show the validity of the multiplier bootstrap to carry out functional inference. We apply the methods to wage decompositions in the UK using new data. Here we decompose the difference between the male and female wage distributions into four effects: composition, wage structure, selection structure and selection sorting. After controlling for endogenous employment selection, we still find a substantial gender wage gap, ranging from 21% to 40% throughout the (latent) offered wage distribution, that is not explained by observable labor market characteristics. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantive fraction of the gender wage gap at the top of the distribution. These findings can be interpreted as evidence of assortative matching in the marriage market and a glass ceiling in the labor market.
We propose a two-stage least squares (2SLS) estimator whose first stage is the equal-weighted average over a complete subset with $k$ instruments among $K$ available, which we call the complete subset averaging (CSA) 2SLS. The approximate mean squared error (MSE) is derived as a function of the subset size $k$ by the Nagar (1959) expansion. The subset size is chosen by minimizing the sample counterpart of the approximate MSE. We show that this method achieves asymptotic optimality among the class of estimators with different subset sizes. To deal with averaging over a growing set of irrelevant instruments, we generalize the approximate MSE and find that the optimal $k$ is larger than otherwise. An extensive simulation experiment shows that the CSA-2SLS estimator outperforms the alternative estimators when instruments are correlated. As an empirical illustration, we estimate the logistic demand function in Berry, Levinsohn, and Pakes (1995) and find that the CSA-2SLS estimate is better supported by economic theory than the alternative estimates.
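As a rough illustration of the first-stage averaging described above, the sketch below computes a CSA-style 2SLS estimate for a given subset size $k$: the first-stage fitted values are averaged with equal weights over all subsets of $k$ instruments, and the average is then used as the instrument in the second stage. The function name `csa_2sls` is hypothetical, $k$ is taken as given rather than chosen by minimizing the approximate MSE, and exhaustive enumeration is only practical for small $K$.

```python
from itertools import combinations

import numpy as np

def csa_2sls(y, X, Z, k):
    """Complete-subset-averaging 2SLS for a fixed subset size k (illustrative).

    y : (n,) outcome
    X : (n,) or (n, d) endogenous regressors
    Z : (n, K) instruments
    k : number of instruments in each subset, 1 <= k <= K
    """
    n, K = Z.shape
    X = np.asarray(X, dtype=float).reshape(n, -1)

    X_hat = np.zeros_like(X)
    n_subsets = 0
    for cols in combinations(range(K), k):
        Zs = Z[:, list(cols)]
        # First stage on this subset: least-squares projection of X on Zs.
        coef, *_ = np.linalg.lstsq(Zs, X, rcond=None)
        X_hat += Zs @ coef
        n_subsets += 1
    X_hat /= n_subsets          # equal-weighted average of fitted values

    # Second stage: use the averaged fitted values as instruments for X.
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
```

With `k` equal to the number of columns of `Z` there is only one subset, so the function reduces to ordinary 2SLS.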
This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can take advantage of the mixed frequency time series panel data structures and we find that it empirically outperforms the unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier-than-Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $\tau$-mixing processes which may be of independent interest in other high-dimensional panel data settings.
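For reference, the sparse-group LASSO penalty in its standard form combines an $\ell_1$ term with a group-wise $\ell_2$ term. The notation below ($b$, $\mathcal{G}$, $\alpha$, $\lambda$, and the pooled objective) is assumed for illustration and does not appear in the abstract itself.

```latex
\Omega(b) = \alpha \, \|b\|_1 + (1 - \alpha) \sum_{G \in \mathcal{G}} \|b_G\|_2,
\qquad \alpha \in [0, 1],
\]
\[
(\hat{a}, \hat{b}) \in \arg\min_{a,\, b}
\frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T}
\bigl( y_{it} - a - x_{it}^{\top} b \bigr)^2 + 2 \lambda \, \Omega(b).
```

Setting $\alpha = 1$ gives the LASSO and $\alpha = 0$ gives the group LASSO; in a mixed-frequency setting a natural choice is to let each group $G$ collect the lags of one high-frequency covariate.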
We develop a novel method of constructing confidence bands for nonparametric regression functions under shape constraints. This method can be implemented via linear programming, and it is thus computationally appealing. We illustrate the use of our proposed method with an application to the regression kink design (RKD). Econometric analyses based on the RKD often suffer from wide confidence intervals due to slow convergence rates of nonparametric derivative estimators. We demonstrate that economic models and structures motivate shape restrictions, which in turn contribute to shrinking the confidence interval for an analysis of the causal effects of unemployment insurance benefits on unemployment durations.
This paper develops the asymptotic theory of a Fully Modified Generalized Least Squares (FM-GLS) estimator for multivariate cointegrating polynomial regressions. Such regressions allow for deterministic trends, stochastic trends and integer powers of stochastic trends to enter the cointegrating relations. Our fully modified estimator incorporates: (1) the direct estimation of the inverse autocovariance matrix of the multidimensional errors, and (2) second order bias corrections. The resulting estimator has the intuitive interpretation of applying a weighted least squares objective function to filtered data series. Moreover, the required second order bias corrections are convenient byproducts of our approach and lead to standard asymptotic inference. We also study several multivariate KPSS-type tests for the null of cointegration. A comprehensive simulation study shows good performance of the FM-GLS estimator and the related tests. As a practical illustration, we reinvestigate the Environmental Kuznets Curve (EKC) hypothesis for six early industrialized countries as in Wagner et al. (2020).