Accurate forecasting of mortality data is important for the management of pension funds and the pricing of life insurance in actuarial science. Age-specific mortality forecasting in the US poses a challenging problem in high-dimensional time series analysis. Prior attempts use traditional dimension reduction techniques to avoid the curse of dimensionality, and then forecast mortality by forecasting the extracted features. However, a dimension reduction method tailored to forecasting has remained elusive. To address this, we propose a novel approach that pursues features that not only represent the original data well but also capture as much time-serial dependence as possible. The proposed method is adaptive to the US mortality data and enjoys good statistical performance. In comparisons, our method performs better than existing approaches, in particular the Lee-Carter model, a benchmark in mortality analysis. Based on the forecasting results, we generate more accurate estimates of future life expectancies and prices of life annuities than the Lee-Carter model does, which can have a great financial impact on life insurers and social security systems. Furthermore, various simulations illustrate scenarios in which our method has advantages and help interpret its good performance on mortality data.
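For readers unfamiliar with the Lee-Carter benchmark named above, the following is a minimal sketch of its standard form, log m(x,t) ≈ a(x) + b(x) k(t), fitted by a rank-1 approximation and forecast with a random walk with drift. It illustrates the benchmark only, not the method proposed in the abstract. The function names, the power-iteration fitting routine, and the synthetic rates are all assumptions made for illustration.

```python
import math

def lee_carter_fit(log_m, iters=200):
    """Fit log_m[x][t] ~ a[x] + b[x] * k[t] with sum(b) = 1 and sum(k) = 0.

    log_m has one row per age and one column per year (hypothetical layout).
    Degenerate inputs (e.g. constant rows) are not handled in this sketch.
    """
    n_age, n_year = len(log_m), len(log_m[0])
    # a[x] is the time average of the log rate at age x.
    a = [sum(row) / n_year for row in log_m]
    # Centered matrix; its leading singular pair gives b and k.
    z = [[log_m[x][t] - a[x] for t in range(n_year)] for x in range(n_age)]
    # Power iteration for the leading left singular vector u of z.
    u = [1.0 / math.sqrt(n_age)] * n_age
    for _ in range(iters):
        v = [sum(z[x][t] * u[x] for x in range(n_age)) for t in range(n_year)]
        u = [sum(z[x][t] * v[t] for t in range(n_year)) for x in range(n_age)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
    # v equals the singular value times the right singular vector.
    v = [sum(z[x][t] * u[x] for x in range(n_age)) for t in range(n_year)]
    # Rescale so that sum(b) = 1; k absorbs the scale.
    s = sum(u)
    b = [c / s for c in u]
    k = [c * s for c in v]
    return a, b, k

def forecast_log_rates(a, b, k, horizon):
    """Extrapolate k with a random walk with drift and rebuild log rates."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return [[a[x] + b[x] * (k[-1] + h * drift) for h in range(1, horizon + 1)]
            for x in range(len(a))]

# Synthetic, exactly rank-1 log rates for three ages over five years.
a_true, b_true, k_true = [-6.0, -5.0, -4.0], [0.5, 0.3, 0.2], [2.0, 1.0, 0.0, -1.0, -2.0]
log_m = [[a_true[x] + b_true[x] * kt for kt in k_true] for x in range(3)]
a, b, k = lee_carter_fit(log_m)
forecast = forecast_log_rates(a, b, k, horizon=2)
```

On real data the rank-1 fit is only approximate, and the residual structure it leaves behind is what feature-based alternatives try to exploit.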
High-dimensional classification has become an increasingly important problem. In this paper we propose a Multivariate Adaptive Stochastic Search (MASS) approach, which first reduces the dimension of the data space and then applies a standard classification method to the reduced space. One key advantage of MASS is that it automatically adjusts to mimic variable selection methods, such as the Lasso, variable combination methods, such as PCA, or methods that combine these two approaches. The adaptivity of MASS allows it to perform well in situations where pure variable selection or variable combination methods fail. Another major advantage of our approach is that MASS can accurately project the data into very low-dimensional non-linear, as well as linear, spaces. MASS uses a stochastic search algorithm to select a handful of optimal projection directions from a large number of random directions in each iteration. We provide some theoretical justification for MASS and demonstrate its strengths on an extensive range of simulation studies and real-world data sets by comparing it to many classical and modern classification methods.
Big data generated from the Internet offer great potential for predictive analysis. Here we focus on using online users' Internet search data to forecast unemployment initial claims weeks into the future, which provides timely insights into the direction of the economy. To this end, we present a novel method, PRISM (Penalized Regression with Inferred Seasonality Module), which uses publicly available online search data from Google. PRISM is a semi-parametric method, motivated by a general state-space formulation, that employs nonparametric seasonal decomposition and penalized regression. For forecasting unemployment initial claims, PRISM outperforms all previously available methods, including during the 2008-2009 financial crisis and in near-future forecasting during the COVID-19 pandemic, two periods in which unemployment initial claims rose rapidly. The timely and accurate unemployment forecasts by PRISM could help government agencies and financial institutions assess the economic trend and make well-informed decisions, especially in the face of economic turbulence.
Reliable mortality estimates at the subnational level are essential in the study of health inequalities within a country. One of the difficulties in producing such estimates is the presence of small populations, where the stochastic variation in death counts is relatively high, so the underlying mortality levels are unclear. We present a Bayesian hierarchical model to estimate mortality at the subnational level. The model builds on characteristic age patterns in mortality curves, which are constructed using principal components from a set of reference mortality curves. Information on mortality rates is pooled across geographic space and smoothed over time. Testing of the model shows reasonable estimates and uncertainty levels when the model is applied both to simulated data that mimic US counties and to real data for French departments. The estimates produced by the model have direct applications to the study of subregional health patterns and disparities.
The need to forecast COVID-19 related variables continues to be pressing as the epidemic unfolds. Different efforts have been made, with compartmental models in epidemiology, statistical models such as AutoRegressive Integrated Moving Average (ARIMA) and Exponential Smoothing (ETS), and computational intelligence models. These efforts have proved useful in some instances by allowing decision makers to distinguish different scenarios during the emergency, but their accuracy has been disappointing, their forecasts ignore uncertainties, and less attention is given to local areas. In this study, we propose a simple Multiple Linear Regression model, optimised to use call data to forecast the number of daily confirmed cases. Moreover, we produce a probabilistic forecast that allows decision makers to better deal with risk. Our proposed approach outperforms ARIMA, ETS and a regression model without call data, as evaluated by three point forecast error metrics, one prediction interval measure and two probabilistic forecast accuracy measures. The simplicity, interpretability and reliability of the model, obtained in a careful forecasting exercise, are a meaningful contribution to decision makers at the local level who acutely need to organise resources in already strained health services. We hope that this model will serve as a building block for other forecasting efforts that, on the one hand, help front-line personnel and decision makers at the local level and, on the other, facilitate communication with other modelling efforts being made at the national level to improve the way we tackle this pandemic and similar future challenges.
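As a rough illustration of the kind of model described above, the sketch below fits a multiple linear regression of daily cases on lagged call volume and lagged cases by ordinary least squares, then attaches a simple Gaussian interval to a point forecast. The choice of regressors, the lag structure, the interval construction (which ignores parameter uncertainty), and all numbers are assumptions for illustration, not the authors' specification or data.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least squares fit with an intercept; returns (coefficients, residual std)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(bi * xi for bi, xi in zip(beta, r)) for r, yi in zip(X, y)]
    dof = max(len(y) - p, 1)
    sigma = math.sqrt(sum(e * e for e in resid) / dof)
    return beta, sigma

def predict_interval(beta, sigma, row, z=1.96):
    """Point forecast with an approximate 95% Gaussian interval (assumed form)."""
    point = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
    return point - z * sigma, point, point + z * sigma

# Made-up training rows: (calls on the previous day, cases on the previous day).
rows = [(10, 5), (12, 7), (15, 6), (20, 9), (18, 12), (25, 11)]
# Made-up noiseless target so the fitted coefficients are easy to check.
y = [3.0 + 0.5 * calls + 0.8 * cases for calls, cases in rows]
beta, sigma = fit_ols(rows, y)
lower, point, upper = predict_interval(beta, sigma, (22, 10))
```

With real, noisy data the residual standard deviation is positive and the interval widens accordingly; a full probabilistic forecast would also account for uncertainty in the estimated coefficients.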
We develop a novel hybrid epidemiological model and a specific methodology for its calibration to distinguish and assess the impact of mobility restrictions (given by Apple's mobility trends data) from other complementary non-pharmaceutical interventions (NPIs) used to control the spread of COVID-19. Using the calibrated model, we estimate that mobility restrictions contribute 47% (US States) and 47% (worldwide) of the overall suppression of the disease transmission rate, using data up to 13/08/2020. The forecast capacity of our model was evaluated using four-week-ahead predictions. Using data up to 30/06/2020 for calibration, the mean absolute percentage error (MAPE) of the prediction of cumulative deceased individuals was 5.0% for the United States (51 states) and 6.7% worldwide (49 countries). This MAPE was reduced to 3.5% for the US and 3.8% worldwide using data up to 13/08/2020. We find that the MAPE was higher for the total confirmed cases, at 11.5% worldwide and 10.2% for the US States, using data up to 13/08/2020. Our calibrated model achieves an average R-squared value for cumulative confirmed and deceased cases of 0.992 using data up to 30/06/2020 and 0.98 using data up to 13/08/2020.
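The two accuracy measures quoted above have standard definitions, sketched here for reference. The function names and the sample numbers are illustrative assumptions and are unrelated to the paper's data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up cumulative counts and four-week-ahead predictions.
observed = [100.0, 200.0]
forecast_values = [110.0, 190.0]
error_pct = mape(observed, forecast_values)
fit_quality = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Because MAPE divides by the observed value, it is only meaningful when the observed series stays away from zero, which holds for cumulative counts once an outbreak is under way.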