Unraveling the Dynamic Importance of County-level Features in Trajectory of COVID-19


English abstract

The objective of this study was to investigate the importance of multiple county-level features in the trajectory of COVID-19. We examined feature importance across 2,787 counties in the United States using a data-driven machine learning model. We trained random forest models using 23 features representing six key factors that influence pandemic spread: social demographics of counties, population activities, mobility within counties, movement across counties, disease attributes, and social network structure. We also categorized counties into multiple groups according to their population densities, and we divided the trajectory of COVID-19 into three stages: the outbreak stage, the social distancing stage, and the reopening stage. The study aims to answer two research questions: (1) to what extent does the importance of heterogeneous features evolve across stages, and (2) to what extent does the importance of heterogeneous features vary across counties with different characteristics? We fitted a set of random forest models to determine weekly feature importance. The results showed that: (1) social demographic features, such as gross domestic product, population density, and minority status, remained high-importance features throughout the stages of COVID-19 across the 2,787 studied counties; (2) within-county mobility features had the highest importance in county clusters with higher population densities; (3) the feature reflecting social network structure (the Facebook social connectedness index) had higher importance in the models for counties with higher population densities. The results show that data-driven machine learning models can provide important insights to inform policymakers about feature importance for counties with various population densities and at different stages of a pandemic life cycle.
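As a rough illustration of the weekly feature-importance idea described above, the following minimal sketch fits one random forest per week and records the importance of each feature. It is not the authors' implementation: the data are synthetic, the helper name `weekly_feature_importance` is hypothetical, only three of the 23 features are represented, and the use of scikit-learn's impurity-based `feature_importances_` is an assumption, since the abstract does not state which importance measure was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def weekly_feature_importance(X_by_week, y_by_week, feature_names, seed=0):
    """Fit one random forest per week; return {week: {feature: importance}}.

    Hypothetical helper for illustration only. Importance here is
    scikit-learn's impurity-based measure (an assumption).
    """
    result = {}
    for week in sorted(X_by_week):
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X_by_week[week], y_by_week[week])
        result[week] = dict(zip(feature_names, model.feature_importances_))
    return result


# Synthetic stand-in data: 200 "counties", 3 features, 3 weeks.
# The feature names echo the abstract; the values are random, not real.
rng = np.random.default_rng(0)
feature_names = ["population_density", "gdp", "within_county_mobility"]
X_by_week, y_by_week = {}, {}
for week in range(3):
    X = rng.normal(size=(200, 3))
    # By construction, the target in week k depends mainly on feature k,
    # so the dominant feature shifts from week to week.
    y = 3.0 * X[:, week] + 0.1 * rng.normal(size=200)
    X_by_week[week] = X
    y_by_week[week] = y

importance = weekly_feature_importance(X_by_week, y_by_week, feature_names)
for week, scores in importance.items():
    top = max(scores, key=scores.get)
    print(f"week {week}: most important feature = {top}")
```

Repeating the same fit separately for each population-density group would give the per-group comparison the abstract refers to; how the groups and stages were actually delimited is not specified here, so that step is left out of the sketch.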
