
Machine Learning Panel Data Regressions with an Application to Nowcasting Price Earnings Ratios

Added by Andrii Babii
Publication date: 2020
Field: Economy
Research language: English





This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can exploit mixed-frequency time series panel data structures, and we find that it empirically outperforms unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier-than-Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $\tau$-mixing processes, which may be of independent interest in other high-dimensional panel data settings.
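As a rough illustration of the regularizer named above, the sparse-group LASSO penalty is commonly written as $\alpha \|\beta\|_1 + (1-\alpha) \sum_G \|\beta_G\|_2$, which interpolates between the LASSO ($\alpha = 1$) and the group LASSO ($\alpha = 0$). The sketch below assumes this unweighted form (some formulations add group-size weights); the function name, the `groups` layout, and the example values are hypothetical and are not taken from the paper.

```python
import math


def sparse_group_lasso_penalty(beta, groups, alpha):
    """Return alpha * ||beta||_1 + (1 - alpha) * sum_g ||beta_g||_2.

    beta   : list of coefficients
    groups : list of index lists, one per group (assumed layout)
    alpha  : weight in [0, 1]; 1 gives the LASSO, 0 the group LASSO
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Element-wise sparsity term.
    l1_term = sum(abs(b) for b in beta)
    # Group-wise term: sum of Euclidean norms of each coefficient group.
    group_term = sum(
        math.sqrt(sum(beta[i] ** 2 for i in idx)) for idx in groups
    )
    return alpha * l1_term + (1.0 - alpha) * group_term


# Hypothetical example: two groups, e.g. the lags of two covariates.
beta = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
penalty = sparse_group_lasso_penalty(beta, groups, alpha=0.5)
```

In a mixed-frequency setting, a natural choice is to let each group collect all high-frequency lags of one covariate, so the group term can drop a covariate entirely while the $\ell_1$ term selects within groups.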




We propose a generalization of the linear panel quantile regression model to accommodate both \textit{sparse} and \textit{dense} parts: sparse means that, while the number of covariates available is large, potentially only a much smaller number of them have a nonzero impact on each conditional quantile of the response variable; the dense part is represented by a low-rank matrix that can be approximated by latent factors and their loadings. Such a structure poses problems for traditional sparse estimators, such as the $\ell_1$-penalised Quantile Regression, and for traditional latent factor estimators, such as PCA. We propose a new estimation procedure, based on the ADMM algorithm, that combines the quantile loss function with $\ell_1$ \textit{and} nuclear norm regularization. We show, under general conditions, that our estimator can consistently estimate both the nonzero coefficients of the covariates and the latent low-rank matrix. Our proposed model has a Characteristics + Latent Factors Asset Pricing Model interpretation: we apply our model and estimator to a large-dimensional panel of financial data and find that (i) characteristics have sparser predictive power once latent factors are controlled for, and (ii) the factors and coefficients at upper and lower quantiles differ from those at the median.
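The quantile loss referred to above is usually the check (pinball) function $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$. The sketch below computes only this loss term, averaged over residuals; it is not the authors' ADMM procedure, the $\ell_1$ and nuclear norm penalties are omitted, and the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)


def average_quantile_loss(y, fitted, tau):
    """Average check loss of the residuals y_i - fitted_i."""
    if len(y) != len(fitted):
        raise ValueError("y and fitted must have the same length")
    return sum(check_loss(a - b, tau) for a, b in zip(y, fitted)) / len(y)


# Positive residuals are weighted by tau, negative ones by (1 - tau).
loss_over = check_loss(2.0, 0.75)    # under-prediction, weight 0.75
loss_under = check_loss(-2.0, 0.75)  # over-prediction, weight 0.25
```

At $\tau = 0.5$ the loss is half the absolute error, which corresponds to median regression.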
Dynamic model averaging (DMA) combines the forecasts of a large number of dynamic linear models (DLMs) to predict the future value of a time series. The performance of DMA critically depends on the appropriate choice of two forgetting factors. The first of these controls the speed of adaptation of the coefficient vector of each DLM, while the second enables time variation in the model averaging stage. In this paper we develop a novel, adaptive dynamic model averaging (ADMA) methodology. The proposed methodology employs a stochastic optimisation algorithm that sequentially updates the forgetting factor of each DLM, and uses a state-of-the-art non-parametric model combination algorithm from the prediction with expert advice literature, which offers finite-time performance guarantees. An empirical application to quarterly UK house price data suggests that ADMA produces more accurate forecasts than the benchmark autoregressive model, as well as competing DMA specifications.
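To make the role of the second forgetting factor concrete, the sketch below follows the standard DMA weight recursion: the previous model probabilities are raised to the power $\alpha$ and renormalized (the forgetting step), then multiplied by each model's predictive likelihood and renormalized again. This illustrates conventional DMA with a fixed $\alpha$, not the adaptive ADMA scheme proposed in the paper; the function name and the example numbers are hypothetical.

```python
def dma_weight_update(weights, likelihoods, alpha):
    """One step of the standard DMA model-probability recursion.

    weights     : posterior model probabilities from the previous period
    likelihoods : predictive likelihood of the new observation under each model
    alpha       : forgetting factor in [0, 1]; 1 means no forgetting
    """
    if len(weights) != len(likelihoods):
        raise ValueError("weights and likelihoods must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Forgetting step: flatten the previous weights toward uniform.
    flattened = [w ** alpha for w in weights]
    total = sum(flattened)
    predicted = [f / total for f in flattened]
    # Bayesian update with each model's predictive likelihood.
    unnormalized = [p * l for p, l in zip(predicted, likelihoods)]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


# Hypothetical example: two models, the second fits the new point better.
updated = dma_weight_update([0.5, 0.5], [1.0, 3.0], alpha=1.0)
```

Values of $\alpha$ below one shrink past evidence toward equal weights, which lets the averaging stage shift between models more quickly.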
The Environmental Kuznets Curve (EKC) predicts an inverted U-shaped relationship between economic growth and environmental pollution. Current analyses frequently employ models which restrict the nonlinearities in the data to be explained by the economic growth variable only. We propose a Generalized Cointegrating Polynomial Regression (GCPR) with flexible time trends to proxy time effects such as technological progress and/or environmental awareness. More specifically, a GCPR includes flexible powers of deterministic trends and integer powers of stochastic trends. We estimate the GCPR by nonlinear least squares and derive its asymptotic distribution. Endogeneity of the regressors can introduce nuisance parameters into this limiting distribution but a simulated approach nevertheless enables us to conduct valid inference. Moreover, a subsampling KPSS test can be used to check the stationarity of the errors. A comprehensive simulation study shows good performance of the simulated inference approach and the subsampling KPSS test. We illustrate the GCPR approach on a dataset of 18 industrialised countries containing GDP and CO2 emissions. We conclude that: (1) the evidence for an EKC is significantly reduced when a nonlinear time trend is included, and (2) a linear cointegrating relation between GDP and CO2 around a power law trend also provides an accurate description of the data.
Jiangtao Duan, Wei Gao, Hao Qu (2019)
This paper considers a statistical model for panel data with unobservable grouped factor structures that are correlated with the regressors and whose group membership can be unknown. The factor loadings are assumed to lie in different subspaces, and subspace clustering of the factor loadings is considered. A method called the least squares subspace clustering estimate (LSSC) is proposed to estimate the model parameters by minimizing the least-squares criterion and to perform the subspace clustering simultaneously. The consistency of the proposed subspace clustering is proved and the asymptotic properties of the estimation procedure are studied under certain conditions. A Monte Carlo simulation study is used to illustrate the advantages of the proposed method. Cases in which the number of subspaces for factors, the dimension of factors, and the dimension of subspaces are unknown are also discussed. For illustrative purposes, the proposed method is applied to study the linkage between income and democracy across countries while allowing for subspace patterns of unobserved factors and factor loadings.
Weiping Ma, Yang Feng, Kani Chen (2013)
Motivated by modeling and analysis of mass-spectrometry data, a semi- and nonparametric model is proposed that consists of a linear parametric component for individual location and scale and a nonparametric regression function for the common shape. A multi-step approach is developed that simultaneously estimates the parametric components and the nonparametric function. Under certain regularity conditions, it is shown that the resulting estimators are consistent and asymptotically normal for the parametric part and achieve the optimal rate of convergence for the nonparametric part when the bandwidth is suitably chosen. Simulation results are presented to demonstrate the effectiveness and finite-sample performance of the method. The method is also applied to a SELDI-TOF mass spectrometry data set from a study of liver cancer patients.