BEDS-Bench: Behavior of EHR-models under Distributional Shift--A Benchmark

191 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Anand Avati

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Anand Avati - Martin Seneviratne - Emily Xue

التعلم الآلي أجهزة الكمبيوتر والمجتمع

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Machine learning has recently demonstrated impressive progress in predictive accuracy across a wide array of tasks. Most ML approaches focus on generalization performance on unseen data that are similar to the training data (In-Distribution, or IND). However, real world applications and deployments of ML rarely enjoy the comfort of encountering examples that are always IND. In such situations, most ML models commonly display erratic behavior on Out-of-Distribution (OOD) examples, such as assigning high confidence to wrong predictions, or vice-versa. Implications of such unusual model behavior are further exacerbated in the healthcare setting, where patient health can potentially be put at risk. It is crucial to study the behavior and robustness properties of models under distributional shift, understand common failure modes, and take mitigation steps before the model is deployed. Having a benchmark that shines light upon these aspects of a model is a first and necessary step in addressing the issue. Recent work and interest in increasing model robustness in OOD settings have focused more on image modality, while the Electronic Health Record (EHR) modality is still largely under-explored. We aim to bridge this gap by releasing BEDS-Bench, a benchmark for quantifying the behavior of ML models over EHR data under OOD settings. We use two open access, de-identified EHR datasets to construct several OOD data settings to run tests on, and measure relevant metrics that characterize crucial aspects of a models OOD behavior. We evaluate several learning algorithms under BEDS-Bench and find that all of them show poor generalization performance under distributional shift in general. Our results highlight the need and the potential to improve robustness of EHR models under distributional shift, and BEDS-Bench provides one way to measure progress towards that goal.

قيم البحث

255 - Harvineet Singh , Rina Singh , Vishwali Mhasawade 2019

We study the problem of learning fair prediction models for unseen test sets distributed differently from the train set. Stability against changes in data distribution is an important mandate for responsible deployment of models. The domain adaptatio n literature addresses this concern, albeit with the notion of stability limited to that of prediction accuracy. We identify sufficient conditions under which stable models, both in terms of prediction accuracy and fairness, can be learned. Using the causal graph describing the data and the anticipated shifts, we specify an approach based on feature selection that exploits conditional independencies in the data to estimate accuracy and fairness metrics for the test set. We show that for specific fairness definitions, the resulting model satisfies a form of worst-case optimality. In context of a healthcare task, we illustrate the advantages of the approach in making more equitable decisions.

التعلم الآلي أجهزة الكمبيوتر والمجتمع التعلم الالي

Behavior of confined granular beds under cyclic thermal loading

164 - Pavel S. Iliev , Elena Giacomazzi , Falk K. Wittel 2019

We investigate the mechanical behavior of a confined granular packing of irregular polyhedral particles under repeated heating and cooling cycles by means of numerical simulations with the Non-Smooth Contact Dynamics method. Assuming a homogeneous te mperature distribution as well as constant temperature rate, we study the effect of the container shape, and coefficients of thermal expansions on the pressure buildup at the confining walls and the density evolution. We observe that small changes in the opening angle of the confinement can lead to a drastic peak pressure reduction. Furthermore, the displacement fields over several thermal cycles are obtained and we discover the formation of convection cells inside the granular material having the shape of a torus. The root mean square of the vorticity is then calculated from the displacement fields and a quadratic dependency on the ratio of thermal expansion coefficients is established.

مادة مكثفة ناعمة

Modelling EHR timeseries by restricting feature interaction

50 - Kun Zhang , Yuan Xue , Gerardo Flores 2019

Time series data are prevalent in electronic health records, mostly in the form of physiological parameters such as vital signs and lab tests. The patterns of these values may be significant indicators of patients clinical states and there might be p atterns that are unknown to clinicians but are highly predictive of some outcomes. Many of these values are also missing which makes it difficult to apply existing methods like decision trees. We propose a recurrent neural network model that reduces overfitting to noisy observations by limiting interactions between features. We analyze its performance on mortality, ICD-9 and AKI prediction from observational values on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. Our models result in an improvement of 1.1% [p<0.01] in AU-ROC for mortality prediction under the MetaVision subset and 1.0% and 2.2% [p<0.01] respectively for mortality and AKI under the full MIMIC-III dataset compared to existing state-of-the-art interpolation, embedding and decay-based recurrent models.

التعلم الآلي أجهزة الكمبيوتر والمجتمع التعلم الالي

Evaluating Predictive Uncertainty under Distributional Shift on Dialogue Dataset

86 - Nyoungwoo Lee , ChaeHun Park , Ho-Jin Choi 2021

In open-domain dialogues, predictive uncertainties are mainly evaluated in a domain shift setting to cope with out-of-distribution inputs. However, in real-world conversations, there could be more extensive distributional shifted inputs than the out- of-distribution. To evaluate this, we first propose two methods, Unknown Word (UW) and Insufficient Context (IC), enabling gradual distributional shifts by corruption on the dialogue dataset. We then investigate the effect of distributional shifts on accuracy and calibration. Our experiments show that the performance of existing uncertainty estimation methods consistently degrades with intensifying the shift. The results suggest that the proposed methods could be useful for evaluating the calibration of dialogue systems under distributional shifts.

الحساب واللغة

Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance

209 - Elliott Slaughter , Wei Wu , Yuankun Fu 2019

We present Task Bench, a parameterized benchmark designed to explore the performance of parallel and distributed programming systems under a variety of application scenarios. Task Bench lowers the barrier to benchmarking multiple programming systems by making the implementation for a given system orthogonal to the benchmarks themselves: every benchmark constructed with Task Bench runs on every Task Bench implementation. Furthermore, Task Benchs parameterization enables a wide variety of benchmark scenarios that distill the key characteristics of larger applications. We conduct a comprehensive study with implementations of Task Bench in 15 programming systems on up to 256 Haswell nodes of the Cori supercomputer. We introduce a novel metric, minimum effective task granularity to study the baseline runtime overhead of each system. We show that when running at scale, 100 {mu}s is the smallest granularity that even the most efficient systems can reliably support with current technologies. We also study each systems scalability, ability to hide communication and mitigate load imbalance.

النظم الموزعة والتوازية والحوسبة العنقودية