We consider sparse estimation of a class of high-dimensional spatio-temporal models. Unlike classical spatial autoregressive models, we do not rely on a predetermined spatial interaction matrix. Instead, under the assumption of sparsity, we estimate the relationships governing both the spatial and temporal dependence in a fully data-driven way by penalizing a set of Yule-Walker equations. While this regularization can be left unstructured, we also propose a customized form of shrinkage to further exploit diagonally structured forms of sparsity that follow intuitively when observations originate from spatial grids such as satellite images. We derive finite-sample error bounds for this estimator, as well as estimation consistency in an asymptotic framework wherein the sample size and the number of spatial units diverge jointly. A simulation exercise shows strong finite-sample performance compared to competing procedures. As an empirical application, we model satellite-measured NO2 concentrations in London. Our approach delivers forecast improvements over a competitive benchmark, and we discover evidence for strong spatial interactions between sub-regions.
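To illustrate what penalizing Yule-Walker equations can look like, the following is a minimal sketch under simplifying assumptions, not the estimator proposed in the paper. It assumes a plain VAR(1) model X_t = A X_{t-1} + e_t with no contemporaneous spatial term, for which the Yule-Walker equations read Sigma1 = A Sigma0, and it uses an unstructured L1 penalty solved by proximal gradient descent (ISTA). The function names `lasso_yule_walker` and `sample_autocovariances` are hypothetical.

```python
import numpy as np

def soft_threshold(M, thr):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - thr, 0.0)

def sample_autocovariances(X):
    """Lag-0 and lag-1 sample autocovariances of a (T x N) data matrix."""
    Xc = X - X.mean(axis=0)
    T = Xc.shape[0]
    Sigma0 = Xc[:-1].T @ Xc[:-1] / (T - 1)   # estimate of E[X_{t-1} X_{t-1}']
    Sigma1 = Xc[1:].T @ Xc[:-1] / (T - 1)    # estimate of E[X_t X_{t-1}']
    return Sigma0, Sigma1

def lasso_yule_walker(Sigma0, Sigma1, lam, n_iter=500):
    """Minimise 0.5 * ||Sigma1 - A @ Sigma0||_F^2 + lam * ||A||_1 by ISTA.

    Assumption: simplified VAR(1) setting, unstructured L1 penalty.
    """
    # Step size 1/L, where L = ||Sigma0||_2^2 is the Lipschitz constant
    # of the gradient of the smooth part.
    step = 1.0 / np.linalg.norm(Sigma0, 2) ** 2
    A = np.zeros_like(Sigma1, dtype=float)
    for _ in range(n_iter):
        grad = (A @ Sigma0 - Sigma1) @ Sigma0.T
        A = soft_threshold(A - step * grad, step * lam)
    return A
```

In practice one would pass the output of `sample_autocovariances` to `lasso_yule_walker` and choose `lam` by a data-driven rule such as cross-validation; the diagonally structured shrinkage mentioned in the abstract would replace the plain L1 penalty.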
This paper develops the asymptotic theory of a Fully Modified Generalized Least Squares (FM-GLS) estimator for multivariate cointegrating polynomial regressions. Such regressions allow for deterministic trends, stochastic trends and integer powers of stochastic trends to enter the cointegrating relations. Our fully modified estimator incorporates: (1) the direct estimation of the inverse autocovariance matrix of the multidimensional errors, and (2) second-order bias corrections. The resulting estimator has the intuitive interpretation of applying a weighted least squares objective function to filtered data series. Moreover, the required second-order bias corrections are convenient byproducts of our approach and lead to standard asymptotic inference. We also study several multivariate KPSS-type tests for the null of cointegration. A comprehensive simulation study shows good performance of the FM-GLS estimator and the related tests. As a practical illustration, we reinvestigate the Environmental Kuznets Curve (EKC) hypothesis for six early industrialized countries as in Wagner et al. (2020).
This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can take advantage of the mixed-frequency time series panel data structures, and we find that it empirically outperforms the unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier-than-Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $\tau$-mixing processes, which may be of independent interest in other high-dimensional panel data settings.
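For readers unfamiliar with the sparse-group LASSO, the sketch below shows the penalty and its proximal operator in a generic form. It assumes the common parametrization lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g ||beta_g||_2) without group-size weights; the function names and the `groups` argument (a list of index lists) are hypothetical and not taken from the paper.

```python
import numpy as np

def sgl_penalty(beta, groups, lam, alpha):
    """Sparse-group LASSO penalty: L1 term plus sum of group L2 norms."""
    l1 = np.abs(beta).sum()
    l2 = sum(np.linalg.norm(beta[g]) for g in groups)
    return lam * (alpha * l1 + (1.0 - alpha) * l2)

def sgl_prox(beta, groups, lam, alpha, step=1.0):
    """Proximal operator of step * sgl_penalty for non-overlapping groups.

    Elementwise soft-thresholding (L1 part) followed by groupwise
    shrinkage (group part).
    """
    out = np.sign(beta) * np.maximum(np.abs(beta) - step * lam * alpha, 0.0)
    thr = step * lam * (1.0 - alpha)
    for g in groups:
        norm = np.linalg.norm(out[g])
        if norm <= thr:
            out[g] = 0.0                 # the whole group is zeroed out
        else:
            out[g] *= 1.0 - thr / norm   # the group is shrunk towards zero
    return out
```

With alpha = 1 the penalty reduces to the ordinary LASSO and with alpha = 0 to the group LASSO, so alpha trades off sparsity within groups against sparsity across groups (for example, across the lags of one high-frequency covariate).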
In epidemiological disease mapping one aims to estimate the spatio-temporal pattern in disease risk and identify high-risk clusters, allowing health interventions to be appropriately targeted. Bayesian spatio-temporal models are used to estimate smoothed risk surfaces, but this is contrary to the aim of identifying groups of areal units that exhibit elevated risks compared with their neighbours. Therefore, in this paper we propose a new Bayesian hierarchical modelling approach for simultaneously estimating disease risk and identifying high-risk clusters in space and time. Inference for this model is based on Markov chain Monte Carlo simulation, using the freely available R package CARBayesST that has been developed in conjunction with this paper. Our methodology is motivated by two case studies, the first of which assesses whether there is a relationship between Public Health Districts and colon cancer clusters in Georgia, while the second looks at the impact of the smoking ban in public places in England on cardiovascular disease clusters.
With the aim of considering models with persistent memory, we propose a fractional nonlinear modification of the classical Yule model often studied in the context of macroevolution. Here the model is analyzed and interpreted in the framework of the development of networks such as the World Wide Web. Nonlinearity is introduced by replacing the linear birth process governing the growth of the in-links of each specific webpage with a fractional nonlinear birth process with completely general birth rates. Among the main results, we derive the explicit distribution of the number of in-links of a webpage chosen uniformly at random, recognizing the contribution to the asymptotics and the finite-time correction. The mean value of the latter distribution is also calculated explicitly in the most general case. Furthermore, in order to show the usefulness of our results, we particularize them in the case of specific birth rates giving rise to a saturating behaviour, a property that is often observed in nature. The further specialization to the non-fractional case allows us to extend the Yule model to account for nonlinear growth.
Dynamic model averaging (DMA) combines the forecasts of a large number of dynamic linear models (DLMs) to predict the future value of a time series. The performance of DMA critically depends on the appropriate choice of two forgetting factors. The first of these controls the speed of adaptation of the coefficient vector of each DLM, while the second enables time variation in the model averaging stage. In this paper we develop a novel, adaptive dynamic model averaging (ADMA) methodology. The proposed methodology employs a stochastic optimisation algorithm that sequentially updates the forgetting factor of each DLM, and uses a state-of-the-art non-parametric model combination algorithm from the prediction with expert advice literature, which offers finite-time performance guarantees. An empirical application to quarterly UK house price data suggests that ADMA produces more accurate forecasts than the benchmark autoregressive model, as well as competing DMA specifications.
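The role of the two forgetting factors can be sketched as follows. This is a minimal illustration of standard DMA with fixed forgetting factors, not the adaptive ADMA procedure of the paper. It assumes a known observation variance `V` and Gaussian predictive densities; the names `dlm_step`, `dma_weights`, `lam` and `alpha` are hypothetical.

```python
import numpy as np

def dlm_step(theta, P, x, y, lam, V=1.0):
    """One Kalman update of a DLM y_t = x_t' theta_t + noise.

    The forgetting factor lam in (0, 1] inflates the state covariance
    (R = P / lam), which controls how fast the coefficients adapt.
    """
    R = P / lam                         # predicted state covariance
    y_hat = float(x @ theta)            # one-step-ahead forecast
    S = float(x @ R @ x) + V            # forecast variance (V assumed known)
    K = R @ x / S                       # Kalman gain
    err = y - y_hat
    theta_new = theta + K * err
    P_new = R - np.outer(K, x @ R)
    # Gaussian predictive density of y, used to reweight the models.
    dens = np.exp(-0.5 * err ** 2 / S) / np.sqrt(2.0 * np.pi * S)
    return theta_new, P_new, y_hat, dens

def dma_weights(pi_prev, dens, alpha):
    """Model-probability update with forgetting factor alpha in (0, 1]."""
    pred = np.asarray(pi_prev, dtype=float) ** alpha
    pred = pred / pred.sum()            # predicted model probabilities
    post = pred * np.asarray(dens, dtype=float)
    return pred, post / post.sum()      # predicted and updated probabilities
```

The DMA forecast at each time point is the average of the individual `y_hat` values weighted by the predicted probabilities; setting both forgetting factors to 1 recovers static coefficients and standard Bayesian model averaging.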