No Arabic abstract
For epidemics control and prevention, timely insights of potential hot spots are invaluable. Alternative to traditional epidemic surveillance, which often lags behind real time by weeks, big data from the Internet provide important information of the current epidemic trends. Here we present a methodology, ARGOX (Augmented Regression with GOogle data CROSS space), for accurate real-time tracking of state-level influenza epidemics in the United States. ARGOX combines Internet search data at the national, regional and state levels with traditional influenza surveillance data from the Centers for Disease Control and Prevention, and accounts for both the spatial correlation structure of state-level influenza activities and the evolution of peoples Internet search pattern. ARGOX achieves on average 28% error reduction over the best alternative for real-time state-level influenza estimation for 2014 to 2020. ARGOX is robust and reliable and can be potentially applied to track county- and city-level influenza activity and other infectious diseases.
Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data as input from publicly available Google Trends and Google Correlate websites. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in peoples online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.
Big data generated from the Internet offer great potential for predictive analysis. Here we focus on using online users Internet search data to forecast unemployment initial claims weeks into the future, which provides timely insights into the direction of the economy. To this end, we present a novel method PRISM (Penalized Regression with Inferred Seasonality Module), which uses publicly available online search data from Google. PRISM is a semi-parametric method, motivated by a general state-space formulation, and employs nonparametric seasonal decomposition and penalized regression. For forecasting unemployment initial claims, PRISM outperforms all previously available methods, including forecasting during the 2008-2009 financial crisis period and near-future forecasting during the COVID-19 pandemic period, when unemployment initial claims both rose rapidly. The timely and accurate unemployment forecasts by PRISM could aid government agencies and financial institutions to assess the economic trend and make well-informed decisions, especially in the face of economic turbulence.
Influenza remains a significant burden on health systems. Effective responses rely on the timely understanding of the magnitude and the evolution of an outbreak. For monitoring purposes, data on severe cases of influenza in England are reported weekly to Public Health England. These data are both readily available and have the potential to provide valuable information to estimate and predict the key transmission features of seasonal and pandemic influenza. We propose an epidemic model that links the underlying unobserved influenza transmission process to data on severe influenza cases. Within a Bayesian framework, we infer retrospectively the parameters of the epidemic model for each seasonal outbreak from 2012 to 2015, including: the effective reproduction number; the initial susceptibility; the probability of admission to intensive care given infection; and the effect of school closure on transmission. The model is also implemented in real time to assess whether early forecasting of the number of admission to intensive care is possible. Our model of admissions data allows reconstruction of the underlying transmission dynamics revealing: increased transmission during the season 2013/14 and a noticeable effect of Christmas school holiday on disease spread during season 2012/13 and 2014/15. When information on the initial immunity of the population is available, forecasts of the number of admissions to intensive care can be substantially improved. Readily available severe case data can be effectively used to estimate epidemiological characteristics and to predict the evolution of an epidemic, crucially allowing real-time monitoring of the transmission and severity of the outbreak.
Based on measurements of the Internet topology data, we found out that there are two mechanisms which are necessary for the correct modeling of the Internet topology at the Autonomous Systems (AS) level: the Interactive Growth of new nodes and new internal links, and a nonlinear preferential attachment, where the preference probability is described by a positive-feedback mechanism. Based on the above mechanisms, we introduce the Positive-Feedback Preference (PFP) model which accurately reproduces many topological properties of the AS-level Internet, including: degree distribution, rich-club connectivity, the maximum degree, shortest path length, short cycles, disassortative mixing and betweenness centrality. The PFP model is a phenomenological model which provides a novel insight into the evolutionary dynamics of real complex networks.
Nitrogen dioxide (NO$_2$) is a primary constituent of traffic-related air pollution and has well established harmful environmental and human-health impacts. Knowledge of the spatiotemporal distribution of NO$_2$ is critical for exposure and risk assessment. A common approach for assessing air pollution exposure is linear regression involving spatially referenced covariates, known as land-use regression (LUR). We develop a scalable approach for simultaneous variable selection and estimation of LUR models with spatiotemporally correlated errors, by combining a general-Vecchia Gaussian process approximation with a penalty on the LUR coefficients. In comparisons to existing methods using simulated data, our approach resulted in higher model-selection specificity and sensitivity and in better prediction in terms of calibration and sharpness, for a wide range of relevant settings. In our spatiotemporal analysis of daily, US-wide, ground-level NO$_2$ data, our approach was more accurate, and produced a sparser and more interpretable model. Our daily predictions elucidate spatiotemporal patterns of NO$_2$ concentrations across the United States, including significant variations between cities and intra-urban variation. Thus, our predictions will be useful for epidemiological and risk-assessment studies seeking daily, national-scale predictions, and they can be used in acute-outcome health-risk assessments.