
Crystal Structure Representations for Machine Learning Models of Formation Energies

Published by O. Anatole von Lilienfeld
Publication date: 2015
Research field: Physics
Paper language: English





We introduce and evaluate a set of feature vector representations of crystal structures for machine learning (ML) models of formation energies of solids. ML models of atomization energies of organic molecules have been successful using a Coulomb matrix representation of the molecule. We consider three ways to generalize such representations to periodic systems: (i) a matrix where each element is related to the Ewald sum of the electrostatic interaction between two different atoms in the unit cell repeated over the lattice; (ii) an extended Coulomb-like matrix that takes into account a number of neighboring unit cells; and (iii) an Ansatz that mimics the periodicity and the basic features of the elements in the Ewald sum matrix by using a sine function of the crystal coordinates of the atoms. The representations are compared for a Laplacian kernel with Manhattan norm, trained to reproduce formation energies using a data set of 3938 crystal structures obtained from the Materials Project. For training sets consisting of 3000 crystals, the generalization error in predicting formation energies of new structures corresponds to (i) 0.49, (ii) 0.64, and (iii) 0.37 eV/atom for the respective representations.
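The abstract names a Laplacian kernel with Manhattan norm as the model in which the representations are compared. The following is a minimal sketch of that kernel, assuming it is used inside kernel ridge regression (the abstract does not name the regression method). The feature vectors and targets are synthetic placeholders, not Materials Project data, and the names `laplacian_kernel`, `fit_krr`, `predict_krr` and the values of `sigma` and `lam` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def laplacian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||_1 / sigma), i.e. Manhattan (L1) norm."""
    l1 = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    return np.exp(-l1 / sigma)

def fit_krr(X_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    K = laplacian_kernel(X_train, X_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)

def predict_krr(X_train, alpha, X_new, sigma):
    """Predict targets for X_new as kernel-weighted sums over training points."""
    return laplacian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic stand-in for flattened crystal representations (assumption):
# 40 "structures", each described by a 6-component feature vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = np.abs(X).sum(axis=1)  # arbitrary placeholder target, not a real energy

sigma, lam = 10.0, 1e-8    # hypothetical hyperparameters
alpha = fit_krr(X[:30], y[:30], sigma, lam)
pred = predict_krr(X[:30], alpha, X[30:], sigma)
mae = np.mean(np.abs(pred - y[30:]))  # mean absolute error on held-out points
```

In practice `sigma` and `lam` would be chosen by cross-validation, and each row of `X` would be one of the three representations described above, flattened to a vector.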


Read also

Strategies for machine-learning(ML)-accelerated discovery that are general across materials composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets like open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (ca. 1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the periodic table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the effective nuclear charge alongside the nuclear charge heuristic that otherwise overestimates dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data is limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the periodic table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the periodic table, a property we expect to be broadly useful for other materials domains.
The applications of machine learning techniques to chemistry and materials science become more numerous by the day. The main challenge is to devise representations of atomic systems that are at the same time complete and concise, so as to reduce the number of reference calculations that are needed to predict the properties of different types of materials reliably. This has led to a proliferation of alternative ways to convert an atomic structure into an input for a machine-learning model. We introduce an abstract definition of chemical environments that is based on a smoothed atomic density, using a bra-ket notation to emphasize basis set independence and to highlight the connections with some popular choices of representations for describing atomic systems. The correlations between the spatial distribution of atoms and their chemical identities are computed as inner products between these feature kets, which can be given an explicit representation in terms of the expansion of the atom density on orthogonal basis functions, that is equivalent to the smooth overlap of atomic positions (SOAP) power spectrum, but also in real space, corresponding to $n$-body correlations of the atom density. This formalism lays the foundations for a more systematic tuning of the behavior of the representations, by introducing operators that represent the correlations between structure, composition, and the target properties. It provides a unifying picture of recent developments in the field and indicates a way forward towards more effective and computationally affordable machine-learning schemes for molecules and materials.
Continuum solvation models enable efficient first principles calculations of chemical reactions in solution, but require extensive parametrization and fitting for each solvent and class of solute systems. Here, we examine the assumptions of continuum solvation models in detail and replace empirical terms with physical models in order to construct a minimally-empirical solvation model. Specifically, we derive solvent radii from the nonlocal dielectric response of the solvent from ab initio calculations, construct a closed-form and parameter-free weighted-density approximation for the free energy of the cavity formation, and employ a pair-potential approximation for the dispersion energy. We show that the resulting model with a single solvent-independent parameter: the electron density threshold ($n_c$), and a single solvent-dependent parameter: the dispersion scale factor ($s_6$), reproduces solvation energies of organic molecules in water, chloroform and carbon tetrachloride with RMS errors of 1.1, 0.6 and 0.5 kcal/mol respectively. We additionally show that fitting the solvent-dependent $s_6$ parameter to the solvation energy of a single non-polar molecule does not substantially increase these errors. Parametrization of this model for other solvents, therefore, requires minimal effort and is possible without extensive databases of experimental solvation free energies.
Elpasolite is the predominant quaternary crystal structure (AlNaK$_2$F$_6$ prototype) reported in the Inorganic Crystal Structure Database. We have developed a machine learning model to calculate density functional theory quality formation energies of all $\sim$2 M pristine ABC$_2$D$_6$ elpasolite crystals which can be made up from main-group elements (up to bismuth). Our model's accuracy can be improved systematically, reaching 0.1 eV/atom for a training set consisting of 10 k crystals. Important bonding trends are revealed: fluoride is best suited to fit the coordination of the D site, which lowers the formation energy, whereas the opposite is found for carbon. The bonding contribution of elements A and B is very small on average. Low formation energies result from A and B being late elements from group (II), C being a late (I) element, and D being fluoride. Out of 2 M crystals, 90 unique structures are predicted to be on the convex hull, among which is NFAl$_2$Ca$_6$, with peculiar stoichiometry and a negative atomic oxidation state for Al.
The HyChem approach has recently been proposed for modeling high-temperature combustion of real, multi-component fuels. The approach combines lumped reaction steps for fuel thermal and oxidative pyrolysis with detailed chemistry for the oxidation of the resulting pyrolysis products. However, the approach usually shows substantial discrepancies with experimental data within the Negative Temperature Coefficient (NTC) regime, as the low-temperature chemistry is more fuel-specific than high-temperature chemistry. This paper proposes a machine learning approach to learn HyChem models that can cover both high-temperature and low-temperature regimes. Specifically, we develop a HyChem model using experimental datasets of ignition delay times covering a wide range of temperatures and equivalence ratios. The chemical kinetic model is treated as a neural network model, and we then employ stochastic gradient descent (SGD), a technique that was developed for deep learning, for the training. We demonstrate the approach in learning the HyChem model for F-24, which is a Jet-A derived fuel, and compare the results with previous work employing genetic algorithms. The results show that the SGD approach can achieve model performance comparable to that of genetic algorithms, but the computational cost is reduced by 1000 times. In addition, with regularization in SGD, the SGD approach changes the kinetic parameters from their original values much less than the genetic algorithm does and is thus more likely to retain mechanistic meanings. Finally, our approach is built upon open-source packages and can be applied to the development and optimization of chemical kinetic models for internal combustion engine simulations.