No Arabic abstract
In a low-dimensional linear regression setup, considering linear transformations/combinations of predictors does not alter predictions. However, when the forecasting technology either uses shrinkage or is nonlinear, it does. This is precisely the fabric of the machine learning (ML) macroeconomic forecasting environment. Pre-processing of the data translates to an alteration of the regularization -- explicit or implicit -- embedded in ML algorithms. We review old transformations and propose new ones, then empirically evaluate their merits in a substantial pseudo-out-sample exercise. It is found that traditional factors should almost always be included as predictors and moving average rotations of the data can provide important gains for various forecasting targets. Also, we note that while predicting directly the average growth rate is equivalent to averaging separate horizon forecasts when using OLS-based techniques, the latter can substantially improve on the former when regularization and/or nonparametric nonlinearities are involved.
We move beyond Is Machine Learning Useful for Macroeconomic Forecasting? by adding the how. The current forecasting literature has focused on matching specific variables and horizons with a particularly successful algorithm. In contrast, we study the usefulness of the underlying features driving ML gains over standard macroeconometric methods. We distinguish four so-called features (nonlinearities, regularization, cross-validation and alternative loss function) and study their behavior in both the data-rich and data-poor environments. To do so, we design experiments that allow to identify the treatment effects of interest. We conclude that (i) nonlinearity is the true game changer for macroeconomic prediction, (ii) the standard factor model remains the best regularization, (iii) K-fold cross-validation is the best practice and (iv) the $L_2$ is preferred to the $bar epsilon$-insensitive in-sample loss. The forecasting gains of nonlinear techniques are associated with high macroeconomic uncertainty, financial stress and housing bubble bursts. This suggests that Machine Learning is useful for macroeconomic forecasting by mostly capturing important nonlinearities that arise in the context of uncertainty and financial frictions.
We present a robust generalization of the synthetic control method for comparative case studies. Like the classical method, we present an algorithm to estimate the unobservable counterfactual of a treatment unit. A distinguishing feature of our algorithm is that of de-noising the data matrix via singular value thresholding, which renders our approach robust in multiple facets: it automatically identifies a good subset of donors, overcomes the challenges of missing data, and continues to work well in settings where covariate information may not be provided. To begin, we establish the condition under which the fundamental assumption in synthetic control-like approaches holds, i.e. when the linear relationship between the treatment unit and the donor pool prevails in both the pre- and post-intervention periods. We provide the first finite sample analysis for a broader class of models, the Latent Variable Model, in contrast to Factor Models previously considered in the literature. Further, we show that our de-noising procedure accurately imputes missing entries, producing a consistent estimator of the underlying signal matrix provided $p = Omega( T^{-1 + zeta})$ for some $zeta > 0$; here, $p$ is the fraction of observed data and $T$ is the time interval of interest. Under the same setting, we prove that the mean-squared-error (MSE) in our prediction estimation scales as $O(sigma^2/p + 1/sqrt{T})$, where $sigma^2$ is the noise variance. Using a data aggregation method, we show that the MSE can be made as small as $O(T^{-1/2+gamma})$ for any $gamma in (0, 1/2)$, leading to a consistent estimator. We also introduce a Bayesian framework to quantify the model uncertainty through posterior probabilities. Our experiments, using both real-world and synthetic datasets, demonstrate that our robust generalization yields an improvement over the classical synthetic control method.
Based on evidence gathered from a newly built large macroeconomic data set for the UK, labeled UK-MD and comparable to similar datasets for the US and Canada, it seems the most promising avenue for forecasting during the pandemic is to allow for general forms of nonlinearity by using machine learning (ML) methods. But not all nonlinear ML methods are alike. For instance, some do not allow to extrapolate (like regular trees and forests) and some do (when complemented with linear dynamic components). This and other crucial aspects of ML-based forecasting in unprecedented times are studied in an extensive pseudo-out-of-sample exercise.
Within the national innovation system literature, empirical analyses are severely lacking for developing economies. Particularly, the low- and middle-income countries (LMICs) eligible for the World Banks International Development Association (IDA) support, are rarely part of any empirical discourse on growth, development, and innovation. One major issue hindering panel analyses in LMICs, and thus them being subject to any empirical discussion, is the lack of complete data availability. This work offers a new complete panel dataset with no missing values for LMICs eligible for IDAs support. I use a standard, widely respected multiple imputation technique (specifically, Predictive Mean Matching) developed by Rubin (1987). This technique respects the structure of multivariate continuous panel data at the country level. I employ this technique to create a large dataset consisting of many variables drawn from publicly available established sources. These variables, in turn, capture six crucial country-level capacities: technological capacity, financial capacity, human capital capacity, infrastructural capacity, public policy capacity, and social capacity. Such capacities are part and parcel of the National Absorptive Capacity Systems (NACS). The dataset (MSK dataset) thus produced contains data on 47 variables for 82 LMICs between 2005 and 2019. The dataset has passed a quality and reliability check and can thus be used for comparative analyses of national absorptive capacities and development, transition, and convergence analyses among LMICs.
This paper provides a thorough analysis on the dynamic structures and predictability of Chinas Consumer Price Index (CPI-CN), with a comparison to those of the United States. Despite the differences in the two leading economies, both series can be well modeled by a class of Seasonal Autoregressive Integrated Moving Average Model with Covariates (S-ARIMAX). The CPI-CN series possess regular patterns of dynamics with stable annual cycles and strong Spring Festival effects, with fitting and forecasting errors largely comparable to their US counterparts. Finally, for the CPI-CN, the diffusion index (DI) approach offers improved predictions than the S-ARIMAX models.