
Machine Learning Approaches to Learn HyChem Models

Added by Weiqi Ji
Publication date: 2021
Fields: Physics
Language: English





The HyChem approach has recently been proposed for modeling high-temperature combustion of real, multi-component fuels. The approach combines lumped reaction steps for fuel thermal and oxidative pyrolysis with detailed chemistry for the oxidation of the resulting pyrolysis products. However, the approach usually shows substantial discrepancies with experimental data within the Negative Temperature Coefficient (NTC) regime, because the low-temperature chemistry is more fuel-specific than the high-temperature chemistry. This paper proposes a machine learning approach to learn HyChem models that cover both the high-temperature and low-temperature regimes. Specifically, we develop a HyChem model using experimental datasets of ignition delay times covering a wide range of temperatures and equivalence ratios. The chemical kinetic model is treated as a neural network model, and stochastic gradient descent (SGD), a technique developed for deep learning, is then employed for the training. We demonstrate the approach by learning a HyChem model for F-24, a Jet-A-derived fuel, and compare the results with previous work employing genetic algorithms. The results show that the SGD approach achieves model performance comparable to genetic algorithms while reducing the computational cost by a factor of 1000. In addition, with regularization, the SGD approach changes the kinetic parameters from their original values much less than the genetic algorithm does and is thus more likely to retain their mechanistic meaning. Finally, our approach is built upon open-source packages and can be applied to the development and optimization of chemical kinetic models for internal combustion engine simulations.
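The core training idea, treating kinetic parameters as trainable variables and fitting them to ignition-delay data with a gradient-based optimizer under regularization toward their nominal values, can be sketched as follows. This is a minimal illustration, not the paper's implementation: a one-line global Arrhenius correlation stands in for the full HyChem mechanism, the data are synthetic, and the optimizer, learning rate, and regularization weight are arbitrary choices.

```python
# Minimal sketch (not the paper's implementation): Arrhenius-type parameters are
# treated as trainable tensors and fitted to ignition-delay data with a
# gradient-based optimizer plus L2 regularization toward the nominal values.
# A toy global Arrhenius correlation stands in for the full HyChem/ODE model.
import torch

R = 8.314  # J/(mol K)

# Nominal (prior) parameters: [log A, Ea / 1e5 J/mol], kept O(1) for optimization.
theta0 = torch.tensor([18.0, 1.5])
theta = theta0.clone().requires_grad_(True)   # trainable copy

def log_tau(T, theta):
    """Toy differentiable ignition-delay correlation: log(tau) = -log A + Ea/(R T)."""
    return -theta[0] + theta[1] * 1e5 / (R * T)

# Synthetic "experimental" ignition delays (placeholders for real shock-tube data).
T_exp = torch.linspace(900.0, 1400.0, 20)
log_tau_exp = log_tau(T_exp, torch.tensor([18.5, 1.6]))

opt = torch.optim.Adam([theta], lr=1e-2)      # an SGD-family optimizer
lam = 1e-3                                    # regularization weight

for step in range(2000):
    opt.zero_grad()
    data_loss = ((log_tau(T_exp, theta) - log_tau_exp) ** 2).mean()
    reg_loss = lam * ((theta - theta0) ** 2).sum()   # keep parameters near nominal
    (data_loss + reg_loss).backward()
    opt.step()

print("optimized [log A, Ea/1e5]:", theta.detach().numpy())
```

The quadratic penalty on the deviation from the nominal parameters plays the role of the regularization mentioned in the abstract; in the actual application the forward model would be a differentiable simulation of the HyChem mechanism rather than a one-line correlation.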

Related research

We introduce and evaluate a set of feature vector representations of crystal structures for machine learning (ML) models of formation energies of solids. ML models of atomization energies of organic molecules have been successful using a Coulomb matrix representation of the molecule. We consider three ways to generalize such representations to periodic systems: (i) a matrix where each element is related to the Ewald sum of the electrostatic interaction between two different atoms in the unit cell repeated over the lattice; (ii) an extended Coulomb-like matrix that takes into account a number of neighboring unit cells; and (iii) an Ansatz that mimics the periodicity and the basic features of the elements in the Ewald sum matrix by using a sine function of the crystal coordinates of the atoms. The representations are compared for a Laplacian kernel with Manhattan norm, trained to reproduce formation energies using a data set of 3938 crystal structures obtained from the Materials Project. For training sets consisting of 3000 crystals, the generalization error in predicting formation energies of new structures corresponds to (i) 0.49, (ii) 0.64, and (iii) 0.37 eV/atom for the respective representations.
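As a concrete illustration of the model class described above, the sketch below fits a kernel ridge regression with a Laplacian kernel (which is based on the Manhattan/L1 distance) to placeholder feature vectors. The random descriptors and targets stand in for an actual crystal representation (e.g., a flattened sine or Ewald sum matrix) and formation energies; the hyperparameters are arbitrary.

```python
# Hedged sketch: kernel ridge regression with a Laplacian kernel, the model
# class described above. The random feature matrix is a placeholder for a real
# crystal-structure representation; values and shapes are illustrative only.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 64))          # placeholder descriptors (e.g., flattened sine matrix)
y = rng.random(500)                # placeholder formation energies (eV/atom)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = KernelRidge(kernel="laplacian", alpha=1e-3, gamma=1e-2)
model.fit(X_tr, y_tr)
mae = np.abs(model.predict(X_te) - y_te).mean()
print(f"MAE on held-out structures: {mae:.3f} eV/atom")
```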
Chen Sun, Ye Tian, Liang Gao (2019)
Calibration models have been developed for the determination of trace elements, silver for instance, in soil using laser-induced breakdown spectroscopy (LIBS). The major concern is the matrix effect. Although it affects the accuracy of LIBS measurements in general, the effect appears accentuated for soil because of the large variation of chemical and physical properties among different soils. The purpose is to reduce its influence in such a way that an accurate and soil-independent calibration model can be constructed. At the same time, the developed model should efficiently reduce the experimental fluctuations affecting measurement precision. A univariate model first reveals an obvious influence of the matrix effect and important experimental fluctuation. A multivariate model was then developed. A key point is the introduction of a generalized spectrum in which variables representing the soil type are explicitly included. Machine learning has been used to develop the model. After a necessary pretreatment in which a feature selection process reduces the dimension of the raw spectrum according to the number of available spectra, the data are fed into a back-propagation neural network (BPNN) to train and validate the model. The resulting soil-independent calibration model yields an average relative error of calibration (REC) and an average relative error of prediction (REP) within the range of 5-6%.
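A hedged sketch of the pipeline described above (feature selection to reduce the spectral dimension, a "generalized spectrum" that appends explicit soil-type variables, and a back-propagation neural network regressor) might look like the following. All arrays are synthetic placeholders rather than LIBS data, and scikit-learn components stand in for the authors' tools.

```python
# Hedged sketch of the described pipeline: select a subset of spectral channels,
# append one-hot soil-type variables (the "generalized spectrum"), and fit a
# back-propagation neural network regressor. Synthetic placeholder data only.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
spectra = rng.random((300, 2000))            # placeholder LIBS spectra
soil_type = rng.integers(0, 5, size=300)     # placeholder soil-type labels
conc = rng.random(300)                       # placeholder Ag concentrations

# Reduce the spectral dimension so it is commensurate with the number of spectra.
selector = SelectKBest(f_regression, k=100)
spectra_sel = selector.fit_transform(spectra, conc)

# Generalized spectrum: selected channels plus explicit soil-type variables.
soil_onehot = np.eye(5)[soil_type]
X = np.hstack([spectra_sel, soil_onehot])

model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
model.fit(X, conc)
```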
Human Activity Recognition (HAR) is the classification of human movement, captured using one or more sensors either as wearables or embedded in the environment (e.g., depth cameras, pressure mats). State-of-the-art methods of HAR rely on having access to a considerable amount of labelled data to train deep architectures with many trainable parameters. This becomes prohibitive when tasked with creating models that are sensitive to personal nuances in human movement, which are explicitly present when performing exercises. In addition, it is not possible to collect training data to cover all possible subjects in the target population. Accordingly, learning personalised models with few data remains an interesting challenge for HAR research. We present a meta-learning methodology for learning to learn personalised HAR models, with the expectation that the end-user need only provide a few labelled data points but can benefit from the rapid adaptation of a generic meta-model. We introduce two algorithms, Personalised MAML and Personalised Relation Networks, inspired by existing meta-learning algorithms but optimised for learning HAR models that are adaptable to any person in health and well-being applications. A comparative study shows significant performance improvements against state-of-the-art deep learning algorithms and few-shot meta-learning algorithms in multiple HAR domains.
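The adaptation idea behind this kind of personalisation can be illustrated with a first-order MAML-style loop: a generic meta-model takes a few gradient steps on a small per-person support set, and the meta-parameters are then updated from the post-adaptation query loss. The sketch below is a generic illustration with random placeholder data and a tiny classifier; it is not the authors' Personalised MAML or Personalised Relation Networks.

```python
# Hedged first-order MAML-style sketch: adapt a meta-model to each "person"
# with a few gradient steps on a support set, then update the meta-parameters
# from the post-adaptation query loss. Data and model sizes are placeholders.
import copy
import torch
import torch.nn as nn

def make_task(n=20, dim=16, n_classes=4):
    """Placeholder 'person': random features/labels for support and query sets."""
    X = torch.randn(2 * n, dim)
    y = torch.randint(0, n_classes, (2 * n,))
    return (X[:n], y[:n]), (X[n:], y[n:])

meta_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
meta_opt = torch.optim.Adam(meta_model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for meta_step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):                                  # tasks (persons) per meta-batch
        (Xs, ys), (Xq, yq) = make_task()
        learner = copy.deepcopy(meta_model)             # fast weights for this person
        inner_opt = torch.optim.SGD(learner.parameters(), lr=0.05)
        for _ in range(5):                              # few-shot inner adaptation
            inner_opt.zero_grad()
            loss_fn(learner(Xs), ys).backward()
            inner_opt.step()
        inner_opt.zero_grad()                           # clear inner-loop gradients
        loss_fn(learner(Xq), yq).backward()             # gradients w.r.t. fast weights
        # First-order MAML: accumulate the adapted-model gradients onto the meta-model.
        for p_meta, p_task in zip(meta_model.parameters(), learner.parameters()):
            g = p_task.grad.clone()
            p_meta.grad = g if p_meta.grad is None else p_meta.grad + g
    meta_opt.step()
```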
Recent studies report that many machine reading comprehension (MRC) models can perform close to or even better than humans on benchmark datasets. However, existing works indicate that many MRC models may learn shortcuts to outwit these benchmarks, while their performance is unsatisfactory in real-world applications. In this work, we attempt to explore why these models learn shortcuts instead of the expected comprehension skills. Based on the observation that a large portion of questions in current datasets have shortcut solutions, we argue that a larger proportion of shortcut questions in the training data makes models rely excessively on shortcut tricks. To investigate this hypothesis, we carefully design two synthetic datasets with annotations that indicate whether a question can be answered using a shortcut solution. We further propose two new methods to quantitatively analyze the learning difficulty of shortcut and challenging questions, and to reveal the inherent learning mechanism behind the performance difference between the two kinds of questions. A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions, and that a high proportion of shortcut questions in the training set hinders models from exploring sophisticated reasoning skills in the later stages of training.
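One simple way to operationalize the comparison described above is to annotate each question as shortcut or challenging and track, over training epochs, when each group is first answered reliably. The snippet below is only an illustrative sketch of that bookkeeping with random placeholder predictions; it is not the authors' analysis methods.

```python
# Hedged sketch: given per-epoch correctness of a model's answers and a
# shortcut/challenging annotation per question, report the first epoch at
# which each group exceeds an accuracy threshold. All inputs are random
# placeholders standing in for real MRC training logs.
import numpy as np

rng = np.random.default_rng(3)
n_epochs, n_questions = 10, 1000
is_shortcut = rng.random(n_questions) < 0.7              # annotation per question
# correct[e, q] = True if question q is answered correctly at epoch e (placeholder).
correct = rng.random((n_epochs, n_questions)) < np.linspace(0.3, 0.9, n_epochs)[:, None]

def first_epoch_above(mask, threshold=0.6):
    acc_per_epoch = correct[:, mask].mean(axis=1)
    hits = np.flatnonzero(acc_per_epoch >= threshold)
    return int(hits[0]) if hits.size else None

print("shortcut questions learned by epoch:", first_epoch_above(is_shortcut))
print("challenging questions learned by epoch:", first_epoch_above(~is_shortcut))
```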
Strategies for machine-learning (ML)-accelerated discovery that are general across materials composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets like open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (ca. 1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the periodic table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to the graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the effective nuclear charge alongside the nuclear charge heuristic that otherwise overestimates the dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data are limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the periodic table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the periodic table, a property we expect to be broadly useful for other materials domains.
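The transfer-learning strategy described above can be sketched generically: train on abundant data from one row of the periodic table and seed the training pool with a handful of labelled points from the new row before refitting. The code below uses random placeholder descriptors and a kernel ridge model purely for illustration; it is not the authors' eRAC representation or model.

```python
# Hedged sketch of the seeding-based transfer strategy: a model trained on a
# large source set (one row of the periodic table) is refit with a few points
# from the target row added. Descriptors and targets are random placeholders.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)
X_src, y_src = rng.random((1000, 150)), rng.random(1000)   # e.g., 3d complexes
X_tgt, y_tgt = rng.random((200, 150)), rng.random(200)     # e.g., 4d complexes

n_seed = 20                                                 # few labelled target points
X_train = np.vstack([X_src, X_tgt[:n_seed]])
y_train = np.concatenate([y_src, y_tgt[:n_seed]])

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1e-3)
model.fit(X_train, y_train)
mae = np.abs(model.predict(X_tgt[n_seed:]) - y_tgt[n_seed:]).mean()
print(f"MAE on remaining target-row complexes: {mae:.3f}")
```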