
Dataset Bias in the Natural Sciences: A Case Study in Chemical Reaction Prediction and Synthesis Design

Published by Ryan-Rhys Griffiths
Publication date: 2021
Paper language: English





Datasets in the Natural Sciences are often curated with the goal of aiding scientific understanding and hence may not always be in a form that facilitates the application of machine learning. In this paper, we identify three trends within the fields of chemical reaction prediction and synthesis design that require a change in direction. First, the manner in which reaction datasets are split into reactants and reagents encourages testing models in an unrealistically generous manner. Second, we highlight the prevalence of mislabelled data and suggest that the focus should be on outlier removal rather than on data fitting alone. Lastly, we discuss the problem of reagent prediction, which is needed in addition to reactant prediction to solve the full synthesis design problem, highlighting the mismatch between what machine learning solves and what a lab chemist would need. Our critiques are also relevant to the burgeoning field of using machine learning to accelerate progress in the experimental Natural Sciences, where datasets are often split in a biased way and are highly noisy, and where contextual variables that are not evident from the data strongly influence the outcome of experiments.
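A minimal Python sketch of the first two points, assuming Daylight-style reaction SMILES (`reactants>reagents>products`, molecules within a field separated by `.`), the format used by common patent-derived reaction datasets. All function names are hypothetical, and this is not the authors' code. It contrasts a "separated" model input, which hands the model the dataset's reactant/reagent labels, with a "mixed" input that pools every precursor. The working assumption is that those labels are derived from knowledge of the recorded product (for example, via atom mapping), which a chemist planning a new reaction would not have. The element check is a crude illustrative outlier filter, not the outlier-removal procedure of the paper, which the abstract does not specify.

```python
import random
import re

# Illustrative sketch, not the paper's code. All function names are hypothetical.
# Assumed input: Daylight-style reaction SMILES "reactants>reagents>products",
# with molecules inside each field separated by ".".


def split_reaction(rxn: str) -> tuple[list[str], list[str], list[str]]:
    """Split a reaction SMILES into (reactants, reagents, products) lists."""
    fields = rxn.split(">")
    if len(fields) != 3:
        raise ValueError(f"expected 3 '>'-separated fields, got {len(fields)}")
    # Caveat: '.' also separates the ions of a salt, so this sketch treats
    # them as separate entries. Empty fields (e.g. in "A>>B") give [].
    reactants, reagents, products = ([m for m in f.split(".") if m] for f in fields)
    return reactants, reagents, products


def separated_input(rxn: str) -> str:
    """Model input that keeps the dataset's reactant/reagent labels."""
    reactants, reagents, _ = split_reaction(rxn)
    return ".".join(reactants) + ">" + ".".join(reagents)


def mixed_input(rxn: str, seed: int = 0) -> str:
    """Model input that pools all precursors in a seeded random order,
    so the reactant/reagent labels are hidden from the model."""
    reactants, reagents, _ = split_reaction(rxn)
    precursors = reactants + reagents
    random.Random(seed).shuffle(precursors)
    return ".".join(precursors)


# Crude tokeniser: whole bracket atoms, then two-letter and one-letter
# organic-subset atoms (lowercase = aromatic).
_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
_BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def elements(smiles: str) -> set[str]:
    """Heavy-atom element symbols in a SMILES string (hydrogen ignored)."""
    found = set()
    for tok in _TOKEN.findall(smiles):
        if tok.startswith("["):
            m = _BRACKET_ELEMENT.match(tok)
            sym = m.group(1) if m else ""  # e.g. "[*]" has no element
        else:
            sym = tok
        sym = sym.capitalize()  # aromatic "c" -> "C", "se" -> "Se"
        if sym and sym != "H":
            found.add(sym)
    return found


def unexplained_elements(rxn: str) -> set[str]:
    """Product elements supplied by no listed precursor. A non-empty result
    means the record cannot be atom-balanced as written (a candidate outlier)."""
    reactants, reagents, products = split_reaction(rxn)
    available = set().union(*(elements(s) for s in reactants + reagents))
    produced = set().union(*(elements(s) for s in products))
    return produced - available


if __name__ == "__main__":
    # Illustrative Fischer esterification: acetic acid + ethanol, acid catalyst.
    rxn = "CC(=O)O.OCC>[H+]>CC(=O)OCC"
    print(separated_input(rxn))  # CC(=O)O.OCC>[H+]
    print(mixed_input(rxn))      # the same three molecules, shuffled
    # Corrupted record: the product contains Cl that no precursor supplies.
    print(unexplained_elements("CC(=O)O.OCC>>CC(=O)Cl"))  # {'Cl'}
```

Pooling the precursors is the stricter evaluation, since the model must itself work out which molecules contribute atoms to the product. The element check only catches gross mislabels; it would not detect, for example, a product recorded with the wrong regiochemistry.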




Read also

We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes on people's lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators (such as first names and pronouns) in different semantic representations of online biographies. Additionally, we quantify the bias that remains when these indicators are scrubbed, and describe proxy behavior that occurs in the absence of explicit gender indicators. As we demonstrate, differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances.
Edouard Ribes, 2017
This paper illustrates the similarities between the problems of customer churn and employee turnover. An example of an employee turnover prediction model leveraging classical machine learning techniques is developed. Model outputs are then discussed to design and test employee retention policies. This type of retention discussion is, to our knowledge, innovative and constitutes the main value of this paper.
A wide variety of real-life complex networks are prohibitively large for modeling, analysis and control. Understanding the structure and dynamics of such networks entails creating a smaller representative network that preserves its relevant topological and dynamical properties. While modern machine learning methods have enabled identification of governing laws for complex dynamical systems, their inability to produce white-box models with sufficient physical interpretation renders such methods undesirable to domain experts. In this paper, we introduce a hybrid black-box, white-box approach for the sparse identification of the governing laws for complex, highly coupled dynamical systems, with particular emphasis on finding the influential reactions in chemical reaction networks for combustion applications, using a data-driven sparse-learning technique. The proposed approach identifies a set of influential reactions using species concentrations and reaction rates, with minimal computational cost and without requiring additional data or simulations. The new approach is applied to analyze the combustion chemistry of H2 and C3H8 in a constant-volume homogeneous reactor. The influential reactions determined by the sparse-learning method are consistent with the current kinetics knowledge of chemical mechanisms. Additionally, we show that a reduced version of the parent mechanism can be generated as a combination of the significantly reduced sets of influential reactions identified at different times and conditions, and that for both H2 and C3H8 fuel, the reduced mechanisms perform closely to the parent mechanisms as a function of the ignition delay time over a wide range of conditions. Our results demonstrate the potential of the sparse-learning approach as an effective and efficient tool for dynamical system analysis and reduction.
The uniqueness of this approach as applied to combustion systems lies in the ability to identify influential reactions in specified conditions and times during the evolution of the combustion process. This ability is of great interest to understand chemical reaction systems.
As modern deep networks become more complex, and get closer to human-like capabilities in certain domains, the question arises of how the representations and decision rules they learn compare to the ones in humans. In this work, we study representations of sentences in one such artificial system for natural language processing. We first present a diagnostic test dataset to examine the degree of abstract composable structure represented. Analyzing performance on these diagnostic tests indicates a lack of systematicity in the representations and decision rules, and reveals a set of heuristic strategies. We then investigate the effect of the training distribution on learning these heuristic strategies, and study changes in these representations with various augmentations to the training set. Our results reveal parallels to the analogous representations in people. We find that these systems can learn abstract rules and generalize them to new contexts under certain circumstances, similar to human zero-shot reasoning. However, we also note some shortcomings in this generalization behavior, similar to human judgment errors like belief bias. Studying these parallels suggests new ways to understand psychological phenomena in humans and informs strategies for building artificial intelligence with human-like language understanding.
In this study, we analyze how changes in the geometry of a potential energy surface in terms of depth and flatness can affect the reaction dynamics. We formulate depth and flatness in the context of one and two degree-of-freedom (DOF) Hamiltonian normal form for the saddle-node bifurcation and quantify their influence on chemical reaction dynamics. In a recent work, Garcia-Garrido, Naik, and Wiggins illustrated how changing the well-depth of a potential energy surface (PES) can lead to a saddle-node bifurcation. They have shown how the geometry of cylindrical manifolds associated with the rank-1 saddle changes en route to the saddle-node bifurcation. Using the formulation presented here, we show how changes in the parameters of the potential energy control the depth and flatness and show their role in the quantitative measures of a chemical reaction. We quantify this role of the depth and flatness by calculating the ratio of the bottleneck-width and well-width, reaction probability (also known as transition fraction or population fraction), gap time (or first passage time) distribution, and directional flux through the dividing surface (DS) for small to high values of total energy. The results obtained for these quantitative measures are in agreement with the qualitative understanding of the reaction dynamics.
