Combining social media and survey data to nowcast migrant stocks in the United States


Added by Monica Alexander

Publication date 2020

Fields Mathematical Statistics

Research language English

Authors Monica Alexander - Kivan Polimis - Emilio Zagheni

Applications


Measuring and forecasting migration patterns, and how they change over time, has important implications for understanding broader population trends, for designing policy effectively and for allocating resources. However, data on migration and mobility are often lacking, and those that do exist are not available in a timely manner. Social media data offer new opportunities to provide more up-to-date demographic estimates and to complement more traditional data sources. Facebook, for example, can be thought of as a large digital census that is regularly updated. However, its users are not representative of the underlying population. This paper proposes a statistical framework to combine social media data with traditional survey data to produce timely "nowcasts" of migrant stocks by state in the United States. The model incorporates bias adjustment of the Facebook data, and a pooled principal component time series approach, to account for correlations across age, time and space. We illustrate the results for migrants from Mexico, India and Germany, and show that the model outperforms alternatives that rely solely on either social media or survey data.
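To make the idea of bias adjustment concrete, the sketch below rescales a current Facebook count by the average log ratio of survey to Facebook counts observed in earlier, overlapping periods. This is a deliberately naive stand-in and not the model from the paper, which also pools information across age, time and space; the function name and all numbers are hypothetical.

```python
import math

def bias_adjusted_nowcast(survey_hist, facebook_hist, facebook_now):
    """Scale the latest Facebook count by the historical survey/Facebook ratio.

    Hypothetical helper: the bias is taken as the mean log ratio over
    periods where both sources are available.
    """
    if len(survey_hist) != len(facebook_hist) or not survey_hist:
        raise ValueError("need matching, non-empty histories")
    log_ratios = [math.log(s / f) for s, f in zip(survey_hist, facebook_hist)]
    mean_log_bias = sum(log_ratios) / len(log_ratios)
    return facebook_now * math.exp(mean_log_bias)

# Made-up counts for one state and origin country.
survey_hist = [100_000, 110_000]   # survey-based migrant stock
facebook_hist = [50_000, 55_000]   # Facebook-based count, same periods
nowcast = bias_adjusted_nowcast(survey_hist, facebook_hist, facebook_now=60_000)
print(round(nowcast))
```

Working on the log scale keeps the adjustment multiplicative, so the adjusted estimate can never become negative.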


Learning to Address Health Inequality in the United States with a Bayesian Decision Network

66 - Tavpritesh Sethi , Anant Mittal , Shubham Maheshwari 2018

Life expectancy is a complex outcome driven by genetic, socio-demographic, environmental and geographic factors. Increasing socio-economic and health disparities in the United States are widening the longevity gap, making it a cause for concern. Earlier studies have probed individual factors, but an integrated picture that reveals quantifiable actions has been missing. There is a growing concern about a further widening of healthcare inequality caused by Artificial Intelligence (AI) due to differential access to AI-driven services. Hence, it is imperative to explore and exploit the potential of AI for illuminating biases and enabling transparent policy decisions for positive social and health impact. In this work, we reveal actionable interventions for decreasing the longevity gap in the United States by analyzing a county-level data resource containing healthcare, socio-economic, behavioral, education and demographic features. We learn an ensemble-averaged structure, draw inferences using the joint probability distribution and extend it to a Bayesian Decision Network for identifying policy actions. We draw quantitative estimates for the impact of diversity, preventive-care quality and stable families within the unified framework of our decision network. Finally, we make this analysis and dashboard available as an interactive web application so that users and policy-makers can validate our reported findings and explore the impact of factors beyond those reported in this work.

Applications Machine Learning Machine Learning

The effect of stay-at-home orders on COVID-19 cases and fatalities in the United States

103 - James H. Fowler , Seth J. Hill , Remy Levin 2020

Governments issue stay-at-home orders to reduce the spread of contagious diseases, but the magnitude of such orders' effectiveness is uncertain. In the United States these orders were not coordinated at the national level during the coronavirus disease 2019 (COVID-19) pandemic, which creates an opportunity to use spatial and temporal variation to measure the policies' effect with greater accuracy. Here, we combine data on the timing of stay-at-home orders with daily confirmed COVID-19 cases and fatalities at the county level in the United States. We estimate the effect of stay-at-home orders using a difference-in-differences design that accounts for unmeasured local variation in factors like health systems and demographics and for unmeasured temporal variation in factors like national mitigation actions and access to tests. Compared to counties that did not implement stay-at-home orders, the results show that the orders are associated with a 30.2 percent (11.0 to 45.2) reduction in weekly cases after one week, a 40.0 percent (23.4 to 53.0) reduction after two weeks, and a 48.6 percent (31.1 to 61.7) reduction after three weeks. Stay-at-home orders are also associated with a 59.8 percent (18.3 to 80.2) reduction in weekly fatalities after three weeks. These results suggest that stay-at-home orders reduced confirmed cases by 390,000 (170,000 to 680,000) and fatalities by 41,000 (27,000 to 59,000) within the first three weeks in localities where they were implemented.
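The logic of a difference-in-differences estimate can be sketched in its simplest two-group, two-period form: the change in the treated group minus the change in the comparison group. The abstract describes a richer design with local and temporal controls, so this is only an illustration of the core idea; the function name and the log case counts are made up.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # The control group's change stands in for what would have happened
    # to the treated group without the policy (parallel-trends assumption).
    return treated_change - control_change

# Hypothetical log weekly case counts for a handful of counties.
treated_pre, treated_post = [4.0, 4.2, 3.8], [4.5, 4.7, 4.3]
control_pre, control_post = [3.0, 3.2, 2.8], [4.0, 4.2, 3.8]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"estimated effect on log cases: {effect:.2f}")
```

A negative value here means cases grew more slowly in the treated counties than in the comparison counties, which is the kind of contrast the reported percentage reductions summarize.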

Applications General Economics Economics

Spatiotemporal effects of the causal factors on COVID-19 incidences in the contiguous United States

83 - Arabinda Maiti , Qi Zhang , Srikanta Sannigrahi 2020

Since December 2019, the world has witnessed the enormous impact of an unprecedented global pandemic, coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). So far, 38,619,674 confirmed cases and 1,093,522 confirmed deaths due to COVID-19 have been reported. In the United States (US), 7,833,851 cases and 215,199 deaths have been recorded. Several timely studies have discussed the local and global effects of the confounding factors on COVID-19 casualties in the US. However, most of these studies paid little attention to the time-varying associations between and among these factors, which are crucial for understanding the outbreak of the present pandemic. Therefore, this study adopts various relevant approaches, including local and global spatial regression models and machine learning, to explore the causal effects of the confounding factors on COVID-19 counts in the contiguous US. In total, five spatial regression models, the spatial lag model (SLM), ordinary least squares (OLS), the spatial error model (SEM), geographically weighted regression (GWR) and multiscale geographically weighted regression (MGWR), are fitted at the county scale to take into account the scale effects on modelling. For COVID-19 cases, ethnicity, crime, and income factors are found to be the strongest covariates and explain the largest share of model variance. For COVID-19 deaths, both (domestic and international) migration and income factors play a crucial role in explaining spatial differences of COVID-19 death counts across counties. The local coefficient of determination (R²) values derived from the GWR and MGWR models are found to be very high over the Wisconsin-Indiana-Michigan (Great Lakes) region, as well as several parts of Texas, California, Mississippi and Arkansas.

Applications

A probabilistic gridded product for daily precipitation extremes over the United States

98 - Mark D. Risser , Christopher J. Paciorek , Michael F. Wehner and Travis A. O'Brien 2018

Gridded data products, for example interpolated daily measurements of precipitation from weather stations, are commonly used as a convenient substitute for direct observations because these products provide a spatially and temporally continuous and complete source of data. However, when the goal is to characterize climatological features of extreme precipitation over a spatial domain (e.g., a map of return values) at the native spatial scales of these phenomena, then gridded products may lead to incorrect conclusions because daily precipitation is a fractal field and hence any smoothing technique will dampen local extremes. To address this issue, we create a new probabilistic gridded product specifically designed to characterize the climatological properties of extreme precipitation by applying spatial statistical analyses to daily measurements of precipitation from the GHCN over CONUS. The essence of our method is to first estimate the climatology of extreme precipitation based on station data and then use a data-driven statistical approach to interpolate these estimates to a fine grid. We argue that our method yields an improved characterization of the climatology within a grid cell because the probabilistic behavior of extreme precipitation is much better behaved (i.e., smoother) than daily weather. Furthermore, the spatial smoothing innate to our approach significantly increases the signal-to-noise ratio in the estimated extreme statistics relative to an analysis without smoothing. Finally, by deriving a data-driven approach for translating extreme statistics to a spatially complete grid, the methodology outlined in this paper resolves the issue of how to properly compare station data with output from earth system models. We conclude the paper by comparing our probabilistic gridded product with a standard extreme value analysis of the Livneh gridded daily precipitation product.
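A return value, as mentioned above, is the precipitation level expected to be exceeded on average once every T years. The sketch below computes it from the standard quantile formula of the generalized extreme value (GEV) distribution. The abstract does not say which extreme value model is fitted, so the GEV choice and the parameter values here are assumptions for illustration only.

```python
import math

def gev_return_level(mu, sigma, xi, period):
    """T-year return level of a GEV(mu, sigma, xi) fitted to annual maxima.

    mu: location, sigma: scale (> 0), xi: shape, period: T in years (> 1).
    """
    if sigma <= 0 or period <= 1:
        raise ValueError("sigma must be positive and period must exceed 1")
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as the shape parameter goes to zero.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

# Hypothetical station parameters, precipitation in mm/day.
for years in (10, 50, 100):
    level = gev_return_level(mu=40.0, sigma=10.0, xi=0.1, period=years)
    print(f"{years}-year return value: {level:.1f} mm")
```

Estimating such parameters per station and then interpolating them, rather than interpolating daily rainfall first, is the ordering the abstract argues for.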

Applications Atmospheric and Oceanic Physics

A database of travel-related behaviors and attitudes before, during, and after COVID-19 in the United States

171 - Rishabh Singh Chauhan , Matthew Wigginton Conway , Denise Capasso da Silva 2021

The COVID-19 pandemic has impacted billions of people around the world. To capture some of these impacts in the United States, we are conducting a nationwide longitudinal survey collecting information about travel-related behaviors and attitudes before, during, and after the COVID-19 pandemic. The survey questions cover a wide range of topics including commuting, daily travel, air travel, working from home, online learning, shopping, and risk perception, along with attitudinal, socioeconomic, and demographic information. Version 1.0 of the survey contains 8,723 responses that are publicly available. The survey is deployed over multiple waves to the same respondents to monitor how behaviors and attitudes evolve over time. This article details the methodology adopted for the collection, cleaning, and processing of the data. In addition, the data are weighted to be representative of national and regional demographics. This survey dataset can aid researchers, policymakers, businesses, and government agencies in understanding both the extent of behavioral shifts and the likelihood that these changes will persist after COVID-19.
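One common way to weight a survey toward known population demographics is post-stratification, where each respondent's weight is the population share of their group divided by that group's share of the sample. The abstract does not state which weighting method the authors used, so the sketch below is an assumption, with hypothetical group labels and shares.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent: population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

# Hypothetical sample that over-represents one region.
sample_groups = ["west", "west", "west", "south"]
population_shares = {"west": 0.5, "south": 0.5}

weights = poststratification_weights(sample_groups, population_shares)
print([round(w, 3) for w in weights])
```

Respondents from over-represented groups receive weights below 1 and those from under-represented groups receive weights above 1, so the weights sum to the sample size.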

Applications