Because of its important role in shaping health policy, population health monitoring (PHM) is considered a fundamental building block of public health services. However, traditional public health data collection approaches, such as clinic-visit-based data integration or health surveys, can be very costly and time-consuming. To address this challenge, this paper proposes a cost-effective approach called Compressive Population Health (CPH), in which a subset of regions within a given area is selected for data collection in the traditional way, while inherent spatial correlations between neighboring regions are leveraged to infer data for the rest of the area. By alternating the selected regions longitudinally, this approach can validate and correct previously assessed spatial correlations. To verify whether CPH is feasible, we conduct an in-depth study based on spatiotemporal morbidity rates of chronic diseases in more than 500 regions around London over a period of more than ten years. We introduce our CPH approach and present three extensive analytical studies. The first confirms that significant spatiotemporal correlations do exist. In the second study, by deploying multiple state-of-the-art data recovery algorithms, we verify that these spatiotemporal correlations can be leveraged to perform accurate data inference using only a small number of samples. Finally, we compare different methods of region selection for traditional data collection and show how such methods can further reduce the overall cost while maintaining high PHM quality.
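The core inference step of CPH can be illustrated with a minimal sketch: observe morbidity rates only in a sampled subset of regions and estimate the rest from spatial neighbours. The data, region names, and neighbour map below are hypothetical, and the neighbour-averaging rule is a simplification of the data recovery algorithms the paper actually evaluates.

```python
# Minimal sketch of CPH-style spatial inference: sampled regions are
# measured directly; unsampled regions are estimated from the observed
# values of their spatial neighbours. All data here are toy values.

def infer_unsampled(observed, neighbours):
    """Estimate each unsampled region as the mean of its observed neighbours."""
    estimates = dict(observed)
    for region, nbrs in neighbours.items():
        if region in observed:
            continue
        vals = [observed[n] for n in nbrs if n in observed]
        if vals:
            estimates[region] = sum(vals) / len(vals)
    return estimates

# Toy example: morbidity rates per 1,000 residents for sampled regions.
observed = {"A": 12.0, "C": 10.0, "D": 11.0}
neighbours = {"B": ["A", "C"], "E": ["C", "D"]}
print(infer_unsampled(observed, neighbours))
# "B" is estimated as (12.0 + 10.0) / 2 = 11.0
```

In practice the paper's recovery algorithms also exploit temporal correlations, so repeated sampling of alternating regions lets earlier estimates be validated and corrected.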
Recent advances in linguistic steganalysis have successively applied CNNs, RNNs, GNNs and other deep learning models for detecting secret information in generative texts. These methods tend to seek stronger feature extractors to achieve better steganalysis performance. However, we have found through experiments that a significant difference actually exists between automatically generated steganographic texts and carrier texts in terms of the conditional probability distribution of individual words. This statistical difference can be naturally captured by the language model used for generating steganographic texts, which motivates us to give the classifier prior knowledge of the language model to enhance its steganalysis ability. To this end, we present two methods for efficient linguistic steganalysis in this paper. One is to pre-train a language model based on RNN, and the other is to pre-train a sequence autoencoder. Experimental results show that the two methods yield different degrees of performance improvement over a randomly initialized RNN classifier, and that convergence is significantly accelerated. Moreover, our methods achieve the best detection results.
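The statistical intuition behind giving the classifier language-model knowledge can be sketched with a toy example: score each word's conditional probability under a reference language model, and texts whose word choices track the model score systematically differently from texts that do not. Here a tiny add-one-smoothed bigram model stands in for the RNN language model; the corpus and sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy illustration of the intuition behind LM-aware steganalysis: score
# per-word conditional probabilities under a reference language model.
# A smoothed bigram model stands in for the RNN LM used in the paper.

def train_bigram(corpus):
    unigrams, bigrams = Counter(), defaultdict(Counter)
    for sent in corpus:
        toks = ["<s>"] + sent.split()
        for prev, cur in zip(toks, toks[1:]):
            unigrams[prev] += 1
            bigrams[prev][cur] += 1
    vocab = {t for s in corpus for t in s.split()} | {"<s>"}
    return unigrams, bigrams, len(vocab)

def mean_logprob(sent, model):
    """Mean log P(word | previous word), with add-one smoothing."""
    unigrams, bigrams, v = model
    toks = ["<s>"] + sent.split()
    lps = [math.log((bigrams[p][c] + 1) / (unigrams[p] + v))
           for p, c in zip(toks, toks[1:])]
    return sum(lps) / len(lps)

corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram(corpus)
# In-distribution word order scores higher than a scrambled one; a gap
# of this kind is what a classifier with LM knowledge can exploit.
print(mean_logprob("the cat sat", model) > mean_logprob("sat the cat", model))
```

The paper's actual methods transfer this knowledge by pre-training the classifier's RNN as a language model or as a sequence autoencoder rather than by computing explicit scores.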
Obtaining the ability to make informed decisions regarding the operation and maintenance of structures provides a major incentive for the implementation of structural health monitoring (SHM) systems. Probabilistic risk assessment (PRA) is an established methodology that allows engineers to make risk-informed decisions regarding the design and operation of safety-critical and high-value assets in industries such as nuclear and aerospace. The current paper aims to formulate a risk-based decision framework for structural health monitoring that combines elements of PRA with the existing SHM paradigm. As an apt tool for reasoning and decision-making under uncertainty, probabilistic graphical models serve as the foundation of the framework. The framework involves modelling failure modes of structures as Bayesian network representations of fault trees and then assigning costs or utilities to the failure events. The fault trees allow information to pass from probabilistic classifiers to influence diagram representations of decision processes, whilst also providing nodes within the graphical model that may be queried to obtain marginal probability distributions over local damage states within a structure. Optimal courses of action for structures are selected by determining the strategies that maximise expected utility. The risk-based framework is demonstrated on a realistic truss-like structure and supported by experimental data. Finally, the risk-based approach is discussed and further challenges pertaining to decision-making processes in the context of SHM are identified.
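The decision step described above can be made concrete with a toy calculation: propagate basic-event probabilities through a fault-tree gate to get the probability of the top failure event, then choose the action that maximises expected utility. The probabilities, costs, and the assumption that repair removes the risk are all hypothetical illustrations, not values from the paper.

```python
# Toy sketch of the risk-based decision step: an OR-gate fault tree
# gives P(top failure event), and actions are compared by expected
# utility. All probabilities and utilities here are hypothetical.

def or_gate(p_events):
    """P(top event) for independent basic events under an OR gate."""
    p_ok = 1.0
    for p in p_events:
        p_ok *= (1.0 - p)
    return 1.0 - p_ok

p_fail = or_gate([0.05, 0.02])        # two independent failure modes
C_FAILURE, C_REPAIR = -1000.0, -30.0  # utilities (negative costs)

actions = {
    "do_nothing": p_fail * C_FAILURE,  # expected cost of inaction
    "repair": C_REPAIR,                # assume repair removes the risk
}
best = max(actions, key=actions.get)
print(best, actions)
```

In the paper's framework the failure probabilities would come from SHM classifier outputs propagated through the Bayesian network, and the comparison would be carried out within an influence diagram rather than by hand.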
DNS is a vital component of almost every networked application. It was originally designed as an unencrypted protocol, making user security a concern. DNS-over-HTTPS (DoH) is the latest proposal to make name resolution more secure. In this paper we study the current DNS-over-HTTPS ecosystem, especially the cost of the additional security. We start by surveying the current DoH landscape, assessing standard compliance and supported features of public DoH servers. We then compare different transports for secure DNS to highlight the improvements DoH makes over its predecessor, DNS-over-TLS (DoT). These improvements explain in part the significantly larger uptake of DoH in comparison to DoT. Finally, we quantify the overhead incurred by the additional layers of the DoH transport and their impact on web page load times. We find that these overheads have only limited impact on page load times, suggesting that it is possible to obtain the improved security of DoH with only marginal performance impact.
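The source of the transport overhead can be sketched with back-of-envelope arithmetic: count the round trips needed before a DNS answer arrives on each transport. The RTT counts below are simplified assumptions (TLS 1.3 full handshake, no connection reuse, no TCP Fast Open), not measurements from the paper.

```python
# Back-of-envelope sketch of resolution latency by transport, counting
# client-to-resolver round trips before the first DNS answer arrives.
# Assumes a TLS 1.3 full handshake and no connection reuse; real
# deployments amortise connection setup across many queries.

RTT_MS = 20  # assumed client-to-resolver round-trip time

transports = {
    "Do53 (UDP)": 1,    # query + response
    "DoT": 1 + 1 + 1,   # TCP handshake + TLS 1.3 handshake + query
    "DoH": 1 + 1 + 1,   # same transport stack, carried over HTTPS
}

for name, rtts in transports.items():
    print(f"{name}: {rtts} RTTs = {rtts * RTT_MS} ms")

# With a persistent connection, DoT/DoH queries after the first also
# cost ~1 RTT, which is consistent with the small measured impact on
# page load times.
```

This also illustrates why per-connection setup cost matters less than it first appears: browsers reuse the encrypted connection across many resolutions.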
Leveraging health administrative data (HAD) datasets for predicting the risk of chronic diseases, including diabetes, has recently gained a lot of attention in the machine learning community. In this paper, we use the largest health records datasets of patients in Ontario, Canada. Provided by the Institute of Clinical Evaluative Sciences (ICES), this database is diverse in age, gender, and ethnicity. The datasets include demographics, lab measurements, drug benefits, healthcare system interactions, and ambulatory and hospitalization records. We perform one of the first large-scale machine learning studies with this data, addressing the task of predicting diabetes 1 to 10 years ahead, which requires no additional screening of individuals. In the best setup, we reach a test AUC of 80.3 with a single model trained on an observation window of 5 years with a one-year buffer, using all datasets. A subset of the top 15 features alone (out of a total of 963) provides a test AUC of 79.1. In this paper, we provide extensive machine learning model performance and feature contribution analysis, which enables us to narrow down to the features most useful for diabetes forecasting. Examples include chronic conditions such as asthma and hypertension, lab results, diagnostic codes in insurance claims, age, and geographical information.
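The temporal framing described above (a 5-year observation window, a one-year buffer, and a multi-year prediction horizon) can be sketched as a simple labelling rule. The function, field names, and dates below are hypothetical illustrations of that setup, not the study's actual cohort-construction code.

```python
from datetime import date

# Sketch of the windowed prediction setup: features are drawn from an
# observation window, a buffer separates it from the prediction period,
# and the label records whether a diagnosis occurs within the horizon.
# All names and dates here are hypothetical.

OBS_YEARS, BUFFER_YEARS, HORIZON_YEARS = 5, 1, 10

def label_patient(index_date, diagnosis_date):
    """1 if diagnosis falls in (index + buffer, index + buffer + horizon]."""
    start = index_date.replace(year=index_date.year + BUFFER_YEARS)
    end = index_date.replace(
        year=index_date.year + BUFFER_YEARS + HORIZON_YEARS)
    if diagnosis_date is None:
        return 0
    return int(start < diagnosis_date <= end)

print(label_patient(date(2005, 1, 1), date(2009, 6, 1)))  # in horizon -> 1
print(label_patient(date(2005, 1, 1), date(2005, 6, 1)))  # in buffer  -> 0
```

The buffer matters because diagnoses made immediately after the observation window may reflect conditions already detectable in the features, which would inflate apparent predictive performance.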
Evidence for Action (E4A), a signature program of the Robert Wood Johnson Foundation, funds investigator-initiated research on the impacts of social programs and policies on population health and health inequities. Across thousands of letters of intent and full proposals E4A has received since 2015, one of the most common methodological challenges faced by applicants is selecting realistic effect sizes to inform power and sample size calculations. E4A prioritizes health studies that are both (1) adequately powered to detect effect sizes that may reasonably be expected for the given intervention and (2) likely to achieve intervention effect sizes that, if demonstrated, correspond to actionable evidence for population health stakeholders. However, little guidance exists to inform the selection of effect sizes for population health research proposals. We draw on examples of five rigorously evaluated population health interventions. These examples illustrate considerations for selecting realistic and actionable effect sizes as inputs to power and sample size calculations for research proposals to study population health interventions. We show that plausible effect sizes for population health interventions may be smaller than commonly cited guidelines suggest. Effect sizes achieved with population health interventions depend on the characteristics of the intervention, the target population, and the outcomes studied. Population health impact depends on the proportion of the population receiving the intervention. When adequately powered, even studies of interventions with small effect sizes can offer valuable evidence to inform population health if such interventions can be implemented broadly. Demonstrating the effectiveness of such interventions, however, requires large sample sizes.
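The link between small effect sizes and large required samples can be made concrete with the standard two-sample normal-approximation formula; this is textbook arithmetic included for illustration, not a calculation from the paper.

```python
from math import ceil
from statistics import NormalDist

# Standard two-sample sample-size approximation: per-group n needed to
# detect a standardized effect size d (Cohen's d) with a two-sided test
# at significance level alpha and the given power.

def n_per_group(d, alpha=0.05, power=0.8):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_b = NormalDist().inv_cdf(power)          # power term
    return ceil(2 * ((z_a + z_b) / d) ** 2)

# Halving the effect size roughly quadruples the required sample size,
# which is why modest population health effects demand large studies.
for d in (0.5, 0.2, 0.05):
    print(f"d = {d}: n = {n_per_group(d)} per group")
```

Under these defaults, a "medium" effect of d = 0.5 needs roughly 63 participants per group, while d = 0.05 needs several thousand, matching the abstract's point that demonstrating small but broadly implementable effects requires large sample sizes.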