Recovery from the Flint Water Crisis has been hindered by uncertainty in both the water testing process and the causes of contamination. In this work, we develop an ensemble of predictive models to assess the risk of lead contamination in individual homes and neighborhoods. To train these models, we utilize a wide range of data sources, including voluntary residential water tests, historical records, and city infrastructure data. Additionally, we use our models to identify the most prominent factors that contribute to a high risk of lead contamination. In this analysis, we find that lead service lines are not the only factor that is predictive of the risk of lead contamination of water. These results could be used to guide the long-term recovery efforts in Flint, minimize the immediate damages, and improve resource-allocation decisions for similar water infrastructure crises.
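To make the modeling pipeline concrete, the sketch below illustrates one plausible instance of such a risk model: a gradient-boosted classifier trained on per-home records with feature importances used to rank contributing factors. The column names, label threshold, file name, and model choice are hypothetical assumptions for illustration, not the authors' exact pipeline.

```python
# Illustrative sketch (not the authors' exact pipeline): train a gradient-boosted
# classifier on per-home records and inspect which features drive predicted risk.
# All column names below are hypothetical placeholders for the kinds of parcel and
# infrastructure attributes described in the abstract.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

FEATURES = ["year_built", "has_lead_service_line", "home_value",
            "pipe_diameter", "distance_to_main"]          # hypothetical columns

df = pd.read_csv("flint_homes.csv")                       # hypothetical merged dataset
X = df[FEATURES]
y = (df["lead_ppb"] > 15).astype(int)                     # illustrative label: EPA action level
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))

# Rank factors by importance; a full analysis would aggregate over an ensemble
# of models and validate against neighborhood-level ground truth.
for name, imp in sorted(zip(FEATURES, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```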
Wildfire is one of the most destructive disasters that frequently strike the west coast of the United States. Many efforts have been made to understand the causes of the increases in wildfire intensity and frequency in recent years. In this work, we propose static and dynamic prediction models to analyze and assess the areas with high wildfire risk in California by utilizing a multitude of environmental data, including population density, Normalized Difference Vegetation Index (NDVI), Palmer Drought Severity Index (PDSI), tree mortality area, tree mortality number, and altitude. Moreover, we focus on a better understanding of the impacts of different factors so as to inform preventive actions. To validate our models and findings, we divide the land of California into 4,242 grid cells of 0.1 degrees $\times$ 0.1 degrees in latitude and longitude, and compute the risk of each cell based on spatial and temporal conditions. To verify the generalizability of our models, we further expand the scope of wildfire risk assessment from California to Washington without any fine-tuning. By performing counterfactual analysis, we uncover the effects of several possible interventions on reducing the number of high-risk wildfires. Taken together, our study has the potential to estimate, monitor, and reduce the risks of wildfires across diverse areas, provided that such environmental data are available.
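The gridding step can be illustrated with a minimal sketch: bin point observations into 0.1-degree cells, aggregate environmental covariates per cell, and fit any standard classifier. The input file, column names, and the random-forest choice are assumptions for illustration only.

```python
# A minimal sketch of the gridding step described in the abstract: bin point
# observations into 0.1-degree x 0.1-degree cells and attach per-cell features.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

CELL = 0.1  # grid resolution in degrees

obs = pd.read_csv("california_observations.csv")          # hypothetical input
obs["cell_lat"] = np.floor(obs["lat"] / CELL) * CELL
obs["cell_lon"] = np.floor(obs["lon"] / CELL) * CELL

# Aggregate environmental covariates per cell and month (column names assumed).
grid = (obs.groupby(["cell_lat", "cell_lon", "month"])
           .agg(ndvi=("ndvi", "mean"),
                pdsi=("pdsi", "mean"),
                pop_density=("pop_density", "mean"),
                tree_mortality=("tree_mortality", "sum"),
                altitude=("altitude", "mean"),
                fire=("fire_occurred", "max"))
           .reset_index())

# Any standard classifier can then map per-cell features to a risk score;
# applying the same fitted model to Washington cells tests generalization.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(grid[["ndvi", "pdsi", "pop_density", "tree_mortality", "altitude"]], grid["fire"])
```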
Although maintenance has been recognized as an important and effective means of risk management in power systems, maintenance scheduling becomes intractable when cascading blackout risk is considered, owing to the extremely high computational complexity. In this paper, based on inference from blackout simulation data, we propose a methodology to efficiently identify the most influential component(s) for mitigating cascading blackout risk in a large power system. To this end, we first establish an analytic relationship between maintenance strategies and blackout risk estimation by inferring from the data of cascading outage simulations. Then we formulate the component maintenance decision-making problem as a nonlinear 0-1 program. Afterwards, we quantify the credibility of the blackout risk estimation, leading to an adaptive method to determine the least required number of simulations, which serves as a crucial parameter of the optimization model. Finally, we devise two heuristic algorithms to find approximate optimal solutions to the model with very high efficiency. Numerical experiments demonstrate the efficacy and high efficiency of our methodology.
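For intuition, a generic greedy heuristic can stand in for the decision step: given simulation-based estimates of how much maintaining each component reduces expected blackout risk, select components under a maintenance budget by risk reduction per unit cost. This is a hedged sketch, not the paper's specific algorithms; the risk-reduction estimates and costs are assumed inputs.

```python
# Generic greedy stand-in for budgeted component selection; the actual paper
# devises two dedicated heuristics and an adaptive simulation-count method.
def greedy_maintenance(risk_reduction, cost, budget):
    """risk_reduction, cost: dicts keyed by component id; budget: total cost allowed."""
    chosen, spent = [], 0.0
    # Rank components by estimated risk reduction per unit maintenance cost.
    ranked = sorted(risk_reduction, key=lambda c: risk_reduction[c] / cost[c], reverse=True)
    for comp in ranked:
        if spent + cost[comp] <= budget:
            chosen.append(comp)
            spent += cost[comp]
    return chosen

# Toy numbers, purely illustrative.
reduction = {"line_12": 0.08, "line_7": 0.05, "xfmr_3": 0.04}
cost = {"line_12": 2.0, "line_7": 1.0, "xfmr_3": 1.5}
print(greedy_maintenance(reduction, cost, budget=2.5))
```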
Proximal gamma-ray spectroscopy, supported by adequate calibration and correction for growing biomass, is an effective field-scale technique for continuous monitoring of topsoil water content dynamics and can potentially be employed as a decision-support tool for automatic irrigation scheduling. This study demonstrates that the approach offers one of the best space-time trade-offs, bridging the gap between point-scale and satellite fields of view. The inverse proportionality between soil moisture and the gamma signal is theoretically derived, taking into account a non-constant correction due to the presence of growing vegetation beneath the detector position. The gamma signal attenuation due to biomass is modelled with a Monte Carlo-based approach in terms of an equivalent water layer whose thickness varies in time as the crop evolves during its life cycle. The reliability and effectiveness of this approach are demonstrated through a seven-month continuous acquisition of terrestrial gamma radiation in a 0.4 ha tomato (Solanum lycopersicum) test field. We demonstrate that a permanent gamma station installed at an agricultural field can reliably probe the water content of the topsoil only if systematic effects due to biomass shielding are properly accounted for. Biomass-corrected experimental values of soil water content inferred from radiometric measurements are compared with gravimetric data acquired under different soil moisture levels, resulting in an average relative discrepancy of about 3% under bare-soil conditions and 4% during the vegetated period. The temporal evolution of the corrected soil water content exhibits a dynamic range consistent with the soil hydraulic properties in terms of wilting point, field capacity, and saturation.
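The inverse proportionality and the biomass correction can be written schematically as below; the symbols $k$, $\mu_w$, and $d_{\mathrm{BWE}}(t)$ are illustrative placeholders rather than the study's exact parameterization.

```latex
% Schematic form only; symbols are placeholders, not the study's exact notation.
% S: measured gamma count rate;  S_0: signal from dry, bare soil;
% \theta: soil water content;  k: calibration constant;
% d_{\mathrm{BWE}}(t): time-varying biomass-equivalent water layer;  \mu_w: attenuation coefficient.
S(\theta, t) \;\approx\; \frac{S_0}{1 + k\,\theta}\, e^{-\mu_w\, d_{\mathrm{BWE}}(t)}
\qquad\Longrightarrow\qquad
\theta \;\approx\; \frac{1}{k}\left(\frac{S_0}{S\, e^{\mu_w d_{\mathrm{BWE}}(t)}} - 1\right)
```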
Extraneous variables are variables that are irrelevant for a certain task, but heavily affect the distribution of the available data. In this work, we show that the presence of such variables can degrade the performance of deep-learning models. We study three datasets where there is a strong influence of known extraneous variables: classification of upper-body movements in stroke patients, annotation of surgical activities, and recognition of corrupted images. Models trained with batch normalization learn features that are highly dependent on the extraneous variables. In batch normalization, the statistics used to normalize the features are learned from the training set and fixed at test time, which produces a mismatch in the presence of varying extraneous variables. We demonstrate that estimating the feature statistics adaptively during inference, as in instance normalization, addresses this issue, producing normalized features that are more robust to changes in the extraneous variables. This results in a significant gain in performance for different network architectures and choices of feature statistics.
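A minimal PyTorch sketch of the idea is to swap fixed-statistics batch-normalization layers for instance normalization, so that feature statistics are estimated per sample at inference. This mirrors the abstract's proposal in spirit only; the framework, training recipe, and architectures used in the paper are assumptions here.

```python
# Replace BatchNorm2d layers (statistics fixed from the training set) with
# InstanceNorm2d layers (statistics recomputed per sample at inference).
import torch.nn as nn

def batchnorm_to_instancenorm(module: nn.Module) -> nn.Module:
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            # affine=True keeps learnable scale/shift; running averages are not used,
            # so normalization adapts to each input's extraneous-variable shift.
            setattr(module, name, nn.InstanceNorm2d(child.num_features, affine=True))
        else:
            batchnorm_to_instancenorm(child)
    return module

# Usage (illustrative): model = batchnorm_to_instancenorm(torchvision.models.resnet18())
```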
Federated learning is a novel framework that enables resource-constrained edge devices to jointly learn a model, addressing the problems of data protection and data silos. However, standard federated learning is vulnerable to Byzantine attacks, which can cause the global model to be manipulated by attackers or fail to converge. On non-IID data, existing methods are not effective at defending against Byzantine attacks. In this paper, we propose a Byzantine-robust framework for federated learning via credibility assessment on non-IID data (BRCA). Credibility assessment is designed to detect Byzantine attacks by combining an adaptive anomaly detection model with data verification. Specifically, an adaptive mechanism is incorporated into the anomaly detection model for its training and prediction. Simultaneously, a unified update algorithm is given to guarantee that the global model has a consistent direction. On non-IID data, our experiments demonstrate that BRCA is more robust to Byzantine attacks than conventional methods.
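A simplified stand-in for credibility-based aggregation is sketched below: each client update is scored by its distance to the coordinate-wise median (a crude proxy for the adaptive anomaly detection plus data verification described above), low-credibility updates are dropped, and the rest are averaged. The scoring rule and threshold are illustrative assumptions, not the BRCA algorithm itself.

```python
# Credibility-weighted aggregation sketch: drop updates whose distance to the
# coordinate-wise median is anomalously large, then average the remainder.
import numpy as np

def credibility_aggregate(client_updates, z_thresh=1.0):
    """client_updates: list of 1-D numpy arrays (flattened model deltas)."""
    U = np.stack(client_updates)                   # shape: (n_clients, n_params)
    median = np.median(U, axis=0)
    dist = np.linalg.norm(U - median, axis=1)      # anomaly score per client
    z = (dist - dist.mean()) / (dist.std() + 1e-12)
    credible = z < z_thresh                        # illustrative threshold
    return U[credible].mean(axis=0), credible

# Example: two honest clients and one scaled (Byzantine) update.
updates = [np.ones(4), np.ones(4) * 1.1, np.ones(4) * 50.0]
aggregated, mask = credibility_aggregate(updates)
print(aggregated, mask)
```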