Controlled abstention neural networks for identifying skillful predictions for regression problems

Published by Elizabeth Barnes

Publication date: 2021

Research field: Informatics Engineering, Physics

Paper language: English

Authors: Elizabeth A. Barnes, Randal J. Barnes

Machine Learning, Atmospheric and Oceanic Physics

Abstract

The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed "forecasts of opportunity". When these opportunities are not present, scientists need prediction systems that are capable of saying "I don't know." We introduce a novel loss function, termed "abstention loss", that allows neural networks to identify forecasts of opportunity for regression problems. The abstention loss works by incorporating uncertainty in the network's prediction to identify the more confident samples and abstain (say "I don't know") on the less confident samples. The abstention loss is designed either to determine the optimal abstention fraction or to abstain on a user-defined fraction via a PID controller. Unlike many methods for attaching uncertainty to neural network predictions post-training, the abstention loss is applied during training to preferentially learn from the more confident samples. The abstention loss is built upon a standard computer science method. While the standard approach is itself a simple yet powerful tool for incorporating uncertainty in regression problems, we demonstrate that the abstention loss outperforms this more standard method for the synthetic climate use cases explored here. The implementation of the proposed loss function is straightforward in most network architectures designed for regression, as it only requires modification of the output layer and loss function.
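The idea of using a predicted uncertainty to decide when to abstain can be illustrated with a small sketch. The abstract does not name the "standard computer science method" or give the form of the abstention loss, so everything here is an assumption: the network is imagined to output a mean and a standard deviation per sample, the per-sample loss is a Gaussian negative log-likelihood, and abstention simply drops the samples with the largest predicted standard deviation. The helper names `gaussian_nll` and `abstain_mask` are hypothetical, and the data are synthetic.

```python
import numpy as np

def gaussian_nll(y_true, mu, sigma):
    """Per-sample negative log-likelihood of y_true under N(mu, sigma^2).

    Assumed stand-in for an uncertainty-aware regression loss; it is not
    the abstention loss from the paper.
    """
    return 0.5 * np.log(2.0 * np.pi * sigma**2) + 0.5 * ((y_true - mu) / sigma) ** 2

def abstain_mask(sigma, abstain_fraction):
    """Boolean mask that is True for the least confident samples (largest sigma)."""
    threshold = np.quantile(sigma, 1.0 - abstain_fraction)
    return sigma > threshold

# Synthetic data: each sample has its own noise level (hypothetical setup).
rng = np.random.default_rng(0)
n = 1000
sigma_true = rng.uniform(0.1, 2.0, size=n)
y = rng.normal(0.0, sigma_true)

# Pretend the network predicts the true mean (0) and the true noise level.
mu_pred = np.zeros(n)
sigma_pred = sigma_true

mask = abstain_mask(sigma_pred, abstain_fraction=0.3)
mae_kept = np.mean(np.abs(y[~mask] - mu_pred[~mask]))
mae_abstained = np.mean(np.abs(y[mask] - mu_pred[mask]))
print(f"abstained on {mask.mean():.0%} of samples")
print(f"MAE on kept samples:      {mae_kept:.3f}")
print(f"MAE on abstained samples: {mae_abstained:.3f}")
```

In this toy setting the kept samples have a lower mean absolute error than the abstained ones, which is the behaviour a forecast-of-opportunity method aims for. Unlike the loss described in the abstract, this sketch abstains only after the fact and does not change what is learned during training.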

93 - Elizabeth A. Barnes, Randal J. Barnes 2021

The earth system is exceedingly complex and often chaotic in nature, making prediction incredibly challenging: we cannot expect to make perfect predictions all of the time. Instead, we look for specific states of the system that lead to more predictable behavior than others, often termed "forecasts of opportunity". When these opportunities are not present, scientists need prediction systems that are capable of saying "I don't know." We introduce a novel loss function, termed the "NotWrong loss", that allows neural networks to identify forecasts of opportunity for classification problems. The NotWrong loss introduces an abstention class that allows the network to identify the more confident samples and abstain (say "I don't know") on the less confident samples. The abstention loss is designed to abstain on a user-defined fraction of the samples via a PID controller. Unlike many machine learning methods used to reject samples post-training, the NotWrong loss is applied during training to preferentially learn from the more confident samples. We show that the NotWrong loss outperforms other existing loss functions for multiple climate use cases. The implementation of the proposed loss function is straightforward in most network architectures designed for classification, as it only requires the addition of an abstention class to the output layer and modification of the loss function.
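How a PID controller can hold the abstention fraction at a user-defined value can be sketched with a toy loop. The abstract does not say which quantity the controller adjusts inside the loss, so this example makes an assumption: it tunes a plain threshold on fixed, randomly generated confidence scores until the fraction of samples below the threshold reaches the setpoint. The class `PIDController`, the gains, and the scores are all hypothetical, and the derivative gain is set to zero to keep the demo simple.

```python
import numpy as np

class PIDController:
    """Textbook discrete PID controller (illustrative, not the paper's code)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement):
        """Return the control signal for the current measurement."""
        error = self.setpoint - measurement
        self.integral += error
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical confidence scores in [0, 1]; low score means low confidence.
rng = np.random.default_rng(0)
scores = rng.random(2000)

# Goal: abstain on 30% of the samples.
controller = PIDController(kp=0.5, ki=0.05, kd=0.0, setpoint=0.3)
threshold = 0.0  # abstain when score < threshold
for _ in range(200):  # each iteration stands in for one training epoch
    fraction = float(np.mean(scores < threshold))
    threshold += controller.update(fraction)

final_fraction = float(np.mean(scores < threshold))
print(f"threshold = {threshold:.3f}, abstention fraction = {final_fraction:.3f}")
```

The controller raises the threshold while too few samples are abstained and lowers it when too many are, so the abstention fraction settles near the requested 30%. In an actual training loop the controlled quantity would presumably be a term in the loss rather than a post hoc threshold.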

Atmospheric and Oceanic Physics, Machine Learning

Identifying Opportunities for Skillful Weather Prediction with Interpretable Neural Networks

98 - Elizabeth A. Barnes, Kirsten Mayer, Benjamin Toms 2020

The atmosphere is chaotic. This fundamental property of the climate system makes forecasting weather incredibly challenging: it's impossible to expect weather models to ever provide perfect predictions of the Earth system beyond timescales of approximately 2 weeks. Instead, atmospheric scientists look for specific states of the climate system that lead to more predictable behaviour than others. Here, we demonstrate how neural networks can be used not only to leverage these states to make skillful predictions, but also to identify the climatic conditions that lead to enhanced predictability. Furthermore, we employ a neural network interpretability method called "layer-wise relevance propagation" to create heatmaps of the regions in the input most relevant for a network's output. For Earth scientists, these relevant regions for the neural network's prediction are by far the most important product of our study: they provide scientific insight into the physical mechanisms that lead to enhanced weather predictability. While we demonstrate our approach for the atmospheric science domain, this methodology is applicable to a large range of geoscientific problems.

Atmospheric and Oceanic Physics

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems

137 - Tianle Cai, Ruiqi Gao, Jikai Hou 2019

First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithm for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the prohibitive computational cost of calculating the second-order information. In this paper, we propose a novel Gram-Gauss-Newton (GGN) algorithm to train deep neural networks for regression problems with square loss. Our method draws inspiration from the connection between neural network optimization and kernel regression with the neural tangent kernel (NTK). Unlike typical second-order methods, which have a heavy computational cost in each iteration, GGN has only minor overhead compared to first-order methods such as SGD. We also give theoretical results to show that for sufficiently wide neural networks, the convergence rate of GGN is quadratic. Furthermore, we provide a convergence guarantee for the mini-batch GGN algorithm, which is, to our knowledge, the first convergence result for the mini-batch version of a second-order method on overparameterized neural networks. Preliminary experiments on regression tasks demonstrate that for training standard networks, our GGN algorithm converges much faster and achieves better performance than SGD.
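A Gauss-Newton step written in terms of a Gram matrix can be sketched on a tiny overparameterized regression problem. The abstract does not give the update rule, so the form used here is an assumption: with Jacobian J (samples by parameters) and residual r, the step is w ← w − Jᵀ(JJᵀ)⁻¹r, which needs only the small n × n Gram matrix JJᵀ when there are more parameters than samples. The toy model `tanh(X @ w)`, the function names, and the data are hypothetical, and no mini-batching or damping is included.

```python
import numpy as np

def model(w, X):
    """Toy nonlinear regression model: one tanh unit per sample (hypothetical)."""
    return np.tanh(X @ w)

def jacobian(w, X):
    """Jacobian of model outputs with respect to w, shape (n_samples, n_params)."""
    return (1.0 - np.tanh(X @ w) ** 2)[:, None] * X

def ggn_step(w, X, y):
    """One Gauss-Newton step using the n x n Gram matrix G = J J^T (assumed form)."""
    r = model(w, X) - y          # residuals of the square loss
    J = jacobian(w, X)
    G = J @ J.T                  # Gram matrix, small when n_samples << n_params
    return w - J.T @ np.linalg.solve(G, r)

# Overparameterized setting: 5 samples, 20 parameters.
rng = np.random.default_rng(0)
n, p = 5, 20
X = rng.normal(size=(n, p))
y = rng.uniform(-0.5, 0.5, size=n)   # targets inside the range of tanh

w = np.zeros(p)
losses = []
for _ in range(6):
    losses.append(0.5 * np.sum((model(w, X) - y) ** 2))
    w = ggn_step(w, X, y)
final_loss = 0.5 * np.sum((model(w, X) - y) ** 2)
print("loss per iteration:", [f"{l:.2e}" for l in losses])
print(f"final loss: {final_loss:.2e}")
```

Solving against the 5 × 5 Gram matrix avoids forming or inverting the 20 × 20 matrix JᵀJ, which is singular here because there are fewer samples than parameters. On this toy problem the loss falls to roughly machine precision within a few iterations.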

Machine Learning, Optimization and Control

Interpretable Multi-Task Deep Neural Networks for Dynamic Predictions of Postoperative Complications

94 - Benjamin Shickel, Tyler J. Loftus, Shounak Datta 2020

Accurate prediction of postoperative complications can inform shared decisions between patients and surgeons regarding the appropriateness of surgery, preoperative risk-reduction strategies, and postoperative resource use. Traditional predictive analytic tools are hindered by suboptimal performance and usability. We hypothesized that novel deep learning techniques would outperform logistic regression models in predicting postoperative complications. In a single-center longitudinal cohort of 43,943 adult patients undergoing 52,529 major inpatient surgeries, deep learning yielded greater discrimination than logistic regression for all nine complications. Predictive performance was strongest when leveraging the full spectrum of preoperative and intraoperative physiologic time-series electronic health record data. A single multi-task deep learning model yielded greater performance than separate models trained on individual complications. Integrated gradients interpretability mechanisms demonstrated the substantial importance of missing data. Interpretable, multi-task deep neural networks made accurate, patient-level predictions that harbor the potential to augment surgical decision-making.

Machine Learning

Efficient Gaussian Neural Processes for Regression

126 - Stratis Markou, James Requeima, Wessel Bruinsma 2021

Conditional Neural Processes (CNP; Garnelo et al., 2018) are an attractive family of meta-learning models which produce well-calibrated predictions, enable fast inference at test time, and are trainable via a simple maximum likelihood procedure. A limitation of CNPs is their inability to model dependencies in the outputs. This significantly hurts predictive performance and renders it impossible to draw coherent function samples, which limits the applicability of CNPs in downstream applications and decision making. Neural Processes (NPs; Garnelo et al., 2018) attempt to alleviate this issue by using latent variables, relying on these to model output dependencies, but introduce difficulties stemming from approximate inference. One recent alternative (Bruinsma et al., 2021), which we refer to as the FullConvGNP, models dependencies in the predictions while still being trainable via exact maximum likelihood. Unfortunately, the FullConvGNP relies on expensive 2D-dimensional convolutions, which limit its applicability to only one-dimensional data. In this work, we present an alternative way to model output dependencies which also lends itself to maximum likelihood training but, unlike the FullConvGNP, can be scaled to two- and three-dimensional data. The proposed models exhibit good performance in synthetic experiments.

Machine Learning