Life expectancy is a complex outcome driven by genetic, socio-demographic, environmental and geographic factors. Increasing socio-economic and health disparities in the United States are widening the longevity gap, making it a cause for concern. Earlier studies have probed individual factors, but an integrated picture that reveals quantifiable actions has been missing. There is also growing concern that Artificial Intelligence (AI) may further widen healthcare inequality through differential access to AI-driven services. It is therefore imperative to explore and exploit the potential of AI for illuminating biases and enabling transparent policy decisions with positive social and health impact. In this work, we reveal actionable interventions for decreasing the longevity gap in the United States by analyzing a county-level data resource containing healthcare, socio-economic, behavioral, education and demographic features. We learn an ensemble-averaged structure, draw inferences using the joint probability distribution and extend it to a Bayesian decision network for identifying policy actions. We derive quantitative estimates of the impact of diversity, preventive-care quality and stable families within the unified framework of our decision network. Finally, we make this analysis and dashboard available as an interactive web application, enabling users and policy-makers to validate our reported findings and to explore the impact of factors beyond those reported in this work.
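As a concrete illustration of the decision-network step, the following is a minimal sketch of expected-utility scoring over a toy two-state network. The variable names (preventive_care, life_expectancy), probabilities and utilities are hypothetical placeholders, not estimates from the paper.

```python
# Minimal sketch of expected-utility scoring in a toy decision network.
# P(life_expectancy = high | action) -- hypothetical conditional probabilities.
cpt_high_le = {"invest_preventive_care": 0.70, "status_quo": 0.55}

# Utility of each (action, outcome) pair -- hypothetical values.
utility = {
    ("invest_preventive_care", "high"): 90, ("invest_preventive_care", "low"): 20,
    ("status_quo", "high"): 100, ("status_quo", "low"): 30,
}

def expected_utility(action):
    """Expected utility = sum over outcomes of P(outcome | action) * U(action, outcome)."""
    p_high = cpt_high_le[action]
    return p_high * utility[(action, "high")] + (1 - p_high) * utility[(action, "low")]

for action in cpt_high_le:
    print(f"EU({action}) = {expected_utility(action):.1f}")
print("optimal action:", max(cpt_high_le, key=expected_utility))
```

In a full decision network, the conditional probabilities would come from inference over the learned joint distribution rather than a fixed table.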
Urban scaling analysis, the study of how aggregated urban features vary with the population of an urban area, provides a promising framework for discovering commonalities across cities and uncovering dynamics shared by cities across time and space. Here, we use the urban scaling framework to study an important but under-explored feature in this community: income inequality. We propose a new method to study the scaling of income distributions by analyzing how total income scales within population percentiles. We show that income in the least wealthy decile (10%) scales close to linearly with city population, while income in the most wealthy decile scales with a significantly superlinear exponent. In contrast to the superlinear scaling of total income with city population, this decile scaling illustrates that the benefits of larger cities are increasingly unequally distributed. For the poorest income deciles, cities have no positive effect over the null expectation of a linear increase. We repeat our analysis after adjusting income by housing cost and find similar results. We then further analyze the shapes of income distributions. First, we find that the mean, variance, skewness, and kurtosis of income distributions all increase with city size. Second, the Kullback-Leibler divergence between a city's income distribution and that of the largest city decreases with city population, suggesting that the overall shape of the income distribution shifts with city population. As most urban scaling theories consider densifying interactions within cities as the fundamental process leading to the superlinear increase of many features, our results suggest this effect is only seen in the upper deciles of cities. Our finding encourages future work to consider heterogeneous models of interactions to form a more coherent understanding of urban scaling.
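To make the decile-scaling estimate concrete, here is a minimal sketch of fitting a scaling exponent by ordinary least squares in log-log space. The synthetic populations, incomes and the exponent beta = 1.15 are illustrative assumptions, not results from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: city populations and total income within one decile,
# generated from a power law with multiplicative noise.
pop = rng.uniform(1e4, 1e7, size=200)
income_decile = 2.0 * pop**1.15 * rng.lognormal(0.0, 0.1, size=200)

# Scaling relation Y = a * N^beta  =>  log Y = log a + beta * log N,
# so beta is the slope of a least-squares fit in log-log space.
beta, log_a = np.polyfit(np.log(pop), np.log(income_decile), 1)
print(f"estimated scaling exponent beta = {beta:.3f}")  # ~1.15; >1 means superlinear
```

Repeating this fit for each income decile gives the per-decile exponents that the analysis compares against the linear (beta = 1) null expectation.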
The ability to make informed decisions regarding the operation and maintenance of structures provides a major incentive for the implementation of structural health monitoring (SHM) systems. Probabilistic risk assessment (PRA) is an established methodology that allows engineers to make risk-informed decisions regarding the design and operation of safety-critical and high-value assets in industries such as nuclear and aerospace. The current paper aims to formulate a risk-based decision framework for structural health monitoring that combines elements of PRA with the existing SHM paradigm. As an apt tool for reasoning and decision-making under uncertainty, probabilistic graphical models serve as the foundation of the framework. The framework involves modelling failure modes of structures as Bayesian network representations of fault trees and then assigning costs or utilities to the failure events. The fault trees allow information to pass from probabilistic classifiers to influence-diagram representations of decision processes, whilst also providing nodes within the graphical model that may be queried to obtain marginal probability distributions over local damage states within a structure. Optimal courses of action for structures are selected by determining the strategies that maximise expected utility. The risk-based framework is demonstrated on a realistic truss-like structure and supported by experimental data. Finally, the risk-based approach is discussed and further challenges pertaining to decision-making processes in the context of SHM are identified.
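As a sketch of the expected-utility decision step, the following toy example computes the system failure probability from a two-component fault tree (a single OR gate) and selects the maintenance action with the lowest expected cost. The component names, probabilities and costs are hypothetical placeholders, not the paper's model.

```python
# Hypothetical two-component fault tree: the system fails if either member
# fails (OR gate). All probabilities and costs are illustrative.
p_fail = {"member_1": 0.05, "member_2": 0.02}
cost_failure, cost_repair = 1e6, 1e4

def p_system_failure(p1, p2):
    """OR gate: the system fails unless both members survive."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

def expected_cost(action):
    """Expected cost = P(failure) * failure cost + any repair cost incurred."""
    p1 = 0.001 if action == "repair_member_1" else p_fail["member_1"]
    repair = cost_repair if action == "repair_member_1" else 0.0
    return p_system_failure(p1, p_fail["member_2"]) * cost_failure + repair

for action in ("do_nothing", "repair_member_1"):
    print(f"{action}: expected cost = {expected_cost(action):,.0f}")
```

In the full framework, the member failure probabilities would be marginals queried from the Bayesian network, informed by the probabilistic classifiers, rather than fixed inputs.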
Measuring and forecasting migration patterns, and how they change over time, has important implications for understanding broader population trends, for designing policy effectively and for allocating resources. However, data on migration and mobility are often lacking, and those that do exist are not available in a timely manner. Social media data offer new opportunities to provide more up-to-date demographic estimates and to complement more traditional data sources. Facebook, for example, can be thought of as a large digital census that is regularly updated. However, its users are not representative of the underlying population. This paper proposes a statistical framework to combine social media data with traditional survey data to produce timely 'nowcasts' of migrant stocks by state in the United States. The model incorporates a bias adjustment of the Facebook data and a pooled principal component time series approach to account for correlations across age, time and space. We illustrate the results for migrants from Mexico, India and Germany, and show that the model outperforms alternatives that rely solely on either social media or survey data.
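A minimal sketch of the bias-adjustment idea, assuming a simple multiplicative bias estimated at the time points where both sources overlap; the series, values and adjustment rule are illustrative stand-ins for the paper's fuller statistical model.

```python
import numpy as np

# Hypothetical monthly Facebook-derived migrant counts (thousands) for one
# state, plus a sparser survey benchmark available only at some months.
fb_counts = np.array([120.0, 125.0, 130.0, 128.0, 135.0, 140.0])
survey = {0: 100.0, 5: 118.0}  # survey estimates at months 0 and 5

# Bias adjustment: estimate the multiplicative bias where both sources
# exist, then rescale the full Facebook series to produce a nowcast.
bias = np.mean([survey[m] / fb_counts[m] for m in survey])
nowcast = bias * fb_counts
print(f"estimated bias factor = {bias:.3f}")
print("bias-adjusted nowcast:", np.round(nowcast, 1))
```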
Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge but can also lead to poor performance when misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecification. We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well-specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon. Our bound does not require the prior to have any parametric form. For priors with bounded support, our bound is independent of the cardinality or structure of the action space, and we show that it is tight up to universal constants in the worst case. Building on our sensitivity analysis, we establish generic PAC guarantees for algorithms in the recently studied Bayesian meta-learning setting and derive corollaries for various families of priors. Our results generalize along two axes: (1) they apply to a broader family of Bayesian decision-making algorithms, including a Monte-Carlo implementation of the knowledge gradient algorithm (KG), and (2) they apply to Bayesian POMDPs, the most general Bayesian decision-making setting, encompassing contextual bandits as a special case. Through numerical simulations, we illustrate how prior misspecification and the deployment of one-step look-ahead (as in KG) can impact the convergence of meta-learning in multi-armed and contextual bandits with structured and correlated priors.
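A minimal simulation of this prior sensitivity in the simplest setting: Beta-Bernoulli Thompson sampling on a two-armed bandit, run once with a well-specified (uniform) prior and once with a prior concentrated on the wrong arm. The arm means, priors and horizon are illustrative choices, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.4, 0.6])  # Bernoulli arm means (illustrative)
H = 1000                           # learning horizon

def thompson(prior_a, prior_b):
    """Beta-Bernoulli Thompson sampling; returns cumulative reward over H steps."""
    a, b = prior_a.copy(), prior_b.copy()
    total = 0.0
    for _ in range(H):
        arm = int(np.argmax(rng.beta(a, b)))       # sample from posteriors, act greedily
        r = float(rng.random() < true_means[arm])  # draw a Bernoulli reward
        a[arm] += r                                # conjugate posterior update
        b[arm] += 1.0 - r
        total += r
    return total

well = thompson(np.array([1.0, 1.0]), np.array([1.0, 1.0]))   # uniform prior
mis = thompson(np.array([10.0, 1.0]), np.array([1.0, 10.0]))  # prior favouring the worse arm
print(f"reward (well-specified prior): {well:.0f}")
print(f"reward (misspecified prior):   {mis:.0f}")
```

Averaging such runs over many seeds gives an empirical analogue of the reward gap that the theoretical bound controls.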
Gridded data products, for example interpolated daily measurements of precipitation from weather stations, are commonly used as a convenient substitute for direct observations because these products provide a spatially and temporally continuous and complete source of data. However, when the goal is to characterize climatological features of extreme precipitation over a spatial domain (e.g., a map of return values) at the native spatial scales of these phenomena, gridded products may lead to incorrect conclusions because daily precipitation is a fractal field, and hence any smoothing technique will dampen local extremes. To address this issue, we create a new probabilistic gridded product specifically designed to characterize the climatological properties of extreme precipitation by applying spatial statistical analyses to daily measurements of precipitation from the Global Historical Climatology Network (GHCN) over the contiguous United States (CONUS). The essence of our method is to first estimate the climatology of extreme precipitation based on station data and then use a data-driven statistical approach to interpolate these estimates to a fine grid. We argue that our method yields an improved characterization of the climatology within a grid cell because the statistical behavior of extreme precipitation is much better behaved (i.e., smoother) than daily weather itself. Furthermore, the spatial smoothing innate to our approach significantly increases the signal-to-noise ratio in the estimated extreme statistics relative to an analysis without smoothing. Finally, by deriving a data-driven approach for translating extreme statistics to a spatially complete grid, the methodology outlined in this paper resolves the issue of how to properly compare station data with output from earth system models. We conclude the paper by comparing our probabilistic gridded product with a standard extreme value analysis of the Livneh gridded daily precipitation product.
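As a sketch of the station-level extreme-value step, the following fits a generalized extreme value (GEV) distribution to synthetic annual maxima and computes a 20-year return level as the (1 - 1/T) quantile. The data and parameters are illustrative stand-ins for GHCN records; the paper's spatial interpolation methodology is considerably more involved.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(2)

# Hypothetical annual-maximum daily precipitation at one station (mm),
# drawn from a GEV with illustrative shape, location and scale parameters.
annual_max = genextreme.rvs(c=-0.1, loc=40.0, scale=10.0, size=60, random_state=rng)

# Fit a GEV to the block maxima; the T-year return level is the value
# exceeded with probability 1/T in any year, i.e. the (1 - 1/T) quantile.
c, loc, scale = genextreme.fit(annual_max)
T = 20
return_level = genextreme.ppf(1 - 1 / T, c, loc=loc, scale=scale)
print(f"estimated {T}-year return level: {return_level:.1f} mm")
```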