
Metadynamics sampling in atomic environment space for collecting training data for machine learning potentials

Published by Jisu Jung
Publication date: 2020
Research field: Physics
Language: English





The universal mathematical form of machine-learning potentials (MLPs) shifts the core of interatomic-potential development to collecting proper training data. Ideally, the training set should encompass diverse local atomic environments, but the conventional approach is prone to sampling similar configurations repeatedly, mainly because of Boltzmann statistics. Practitioners therefore handpick a large pool of distinct configurations manually, which lengthens the development period significantly. Herein, we suggest a novel sampling method optimized for gathering diverse yet relevant configurations semi-automatically. This is achieved by applying metadynamics with the descriptor for the local atomic environment as the collective variable. As a result, the simulation is automatically steered toward unvisited regions of local environment space, so that each atom experiences diverse chemical environments without redundancy. We apply the proposed metadynamics sampling to H:Pt(111), GeTe, and Si systems. Across these examples, a small number of metadynamics trajectories provides the reference structures necessary for training high-fidelity MLPs. By proposing a semi-automatic sampling method tuned for MLPs, the present work paves the way for applying MLPs to a wider range of challenging problems.
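
The biasing idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes a single particle in a one-dimensional double-well potential, uses the particle position as a stand-in for the atomic-environment descriptor, and picks overdamped Langevin dynamics and all parameter values purely for illustration. The function name `run_metadynamics` is hypothetical.

```python
import numpy as np

def run_metadynamics(n_steps=20000, dt=0.01, kT=0.1,
                     height=0.05, width=0.2, stride=100, seed=0):
    """Toy 1D metadynamics on U(x) = (x^2 - 1)^2 (illustrative assumption).

    The collective variable is simply x; in the paper's setting it would be
    the local-environment descriptor of each atom instead.
    """
    rng = np.random.default_rng(seed)
    centers = []                 # positions where bias Gaussians were deposited
    x = -1.0                     # start in the left well
    traj = np.empty(n_steps)
    for step in range(n_steps):
        # Force from the physical double-well potential: -dU/dx
        force = -4.0 * x * (x * x - 1.0)
        if centers:
            c = np.asarray(centers)
            gauss = height * np.exp(-(x - c) ** 2 / (2.0 * width ** 2))
            # Force from the bias V(x) = sum_i h * exp(-(x - c_i)^2 / (2 w^2))
            force += np.sum(gauss * (x - c)) / width ** 2
        # Overdamped Langevin step (unit friction, an assumption of this sketch)
        x += force * dt + np.sqrt(2.0 * kT * dt) * rng.normal()
        # Periodically deposit a Gaussian at the current CV value, which
        # discourages the walker from revisiting the same region
        if (step + 1) % stride == 0:
            centers.append(x)
        traj[step] = x
    return traj, np.asarray(centers)

if __name__ == "__main__":
    traj, centers = run_metadynamics()
    print("visited range:", traj.min(), traj.max())
    print("Gaussians deposited:", len(centers))
```

Because the accumulated Gaussians fill the well the walker currently occupies, it is pushed over the barrier into regions it has not yet visited, which is the same mechanism the abstract relies on to diversify local atomic environments.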




Read also

Machine learning models, trained on data from ab initio quantum simulations, are yielding molecular dynamics potentials with unprecedented accuracy. One limiting factor is the quantity of available training data, which can be expensive to obtain. A quantum simulation often provides all atomic forces, in addition to the total energy of the system. These forces provide much more information than the energy alone. It may appear that training a model to this large quantity of force data would introduce significant computational costs. Actually, training to all available force data should only be a few times more expensive than training to energies alone. Here, we present a new algorithm for efficient force training, and benchmark its accuracy by training to forces from real-world datasets for organic chemistry and bulk aluminum.
Machine learning of atomic-scale properties is revolutionizing molecular modelling, making it possible to evaluate inter-atomic potentials with first-principles accuracy, at a fraction of the cost. The accuracy, speed and reliability of machine-learning potentials, however, depend strongly on the way atomic configurations are represented, i.e. the choice of descriptors used as input for the machine learning method. The raw Cartesian coordinates are typically transformed into fingerprints, or symmetry functions, that are designed to encode, in addition to the structure, important properties of the potential-energy surface like its invariances with respect to rotation, translation and permutation of like atoms. Here we discuss automatic protocols to select a number of fingerprints out of a large pool of candidates, based on the correlations that are intrinsic to the training data. This procedure can greatly simplify the construction of neural network potentials that strike the best balance between accuracy and computational efficiency, and has the potential to accelerate by orders of magnitude the evaluation of Gaussian Approximation Potentials based on the Smooth Overlap of Atomic Positions kernel. We present applications to the construction of neural network potentials for water and for an Al-Mg-Si alloy, and to the prediction of the formation energies of small organic molecules using Gaussian process regression.
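
One plausible way to pick fingerprints from a large pool using only correlations in the training data is a greedy farthest-point selection over the feature columns. The sketch below is an assumption made for illustration, not necessarily the protocol of the work summarized above; the function name `select_features_fps` and the choice of Euclidean distance between standardized columns are hypothetical.

```python
import numpy as np

def select_features_fps(X, n_select, start=0):
    """Greedy farthest-point selection over the columns of X.

    X has shape (n_samples, n_features). Columns are standardized first so
    that the distance between two columns reflects how weakly they are
    correlated rather than their scale.
    """
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    cols = Z.T                                   # one row per candidate feature
    selected = [start]
    # Distance from every feature to the nearest already selected feature
    dist = np.linalg.norm(cols - cols[start], axis=1)
    while len(selected) < n_select:
        nxt = int(np.argmax(dist))               # least redundant candidate
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(cols - cols[nxt], axis=1))
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=200)
    b = rng.normal(size=200)
    # Column 1 is a rescaled copy of column 0, so it carries no new information
    X = np.column_stack([a, 2.0 * a, b])
    print(select_features_fps(X, n_select=2))
```

In this toy input the rescaled duplicate is skipped in favour of the independent column, which is the behaviour one wants when pruning a redundant fingerprint set.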
Faithfully representing chemical environments is essential for describing materials and molecules with machine learning approaches. Here, we present a systematic classification of these representations and then investigate: (i) the sensitivity to perturbations and (ii) the effective dimensionality of a variety of atomic environment representations, and over a range of material datasets. Representations investigated include Atom Centred Symmetry Functions, Chebyshev Polynomial Symmetry Functions (CHSF), Smooth Overlap of Atomic Positions, Many-body Tensor Representation and Atomic Cluster Expansion. In area (i), we show that none of the atomic environment representations are linearly stable under tangential perturbations, and that for CHSF there are instabilities for particular choices of perturbation, which we show can be removed with a slight redefinition of the representation. In area (ii), we find that most representations can be compressed significantly without loss of precision, and further that selecting optimal subsets of a representation method improves the accuracy of regression models built for a given dataset.
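
The notion of effective dimensionality mentioned above can be made concrete with a simple principal-component count. This is a minimal sketch under stated assumptions: the function name `effective_dimension`, the 99% variance threshold, and the use of PCA itself are illustrative choices, not the measure used in the summarized work.

```python
import numpy as np

def effective_dimension(X, variance_kept=0.99):
    """Number of principal components needed to keep a fraction of variance.

    X has shape (n_samples, n_features), e.g. one descriptor vector per
    atomic environment.
    """
    Xc = X - X.mean(axis=0)                       # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)       # singular values, descending
    var = s ** 2
    total = var.sum()
    if total == 0.0:
        return 0                                  # all samples identical
    cum = np.cumsum(var) / total
    k = int(np.searchsorted(cum, variance_kept)) + 1
    return min(k, len(s))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 3))            # 3 true degrees of freedom
    mixing = rng.normal(size=(3, 20))             # embedded in 20 features
    print(effective_dimension(latent @ mixing))
```

A 20-component descriptor built from three underlying degrees of freedom needs at most three components here, which illustrates how a representation can be compressed without losing information.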
Eun Seo Jo, Timnit Gebru (2019)
A growing body of work shows that many problems in fairness, accountability, transparency, and ethics in machine learning systems are rooted in decisions surrounding the data collection and annotation process. In spite of its fundamental nature, however, data collection remains an overlooked part of the machine learning (ML) pipeline. In this paper, we argue that a new specialization should be formed within ML that is focused on methodologies for data collection and annotation: efforts that require institutional frameworks and procedures. Specifically for sociocultural data, parallels can be drawn from archives and libraries. Archives are the longest-standing communal effort to gather human information, and archive scholars have already developed the language and procedures to address and discuss many challenges pertaining to data collection, such as consent, power, inclusivity, transparency, and ethics & privacy. We discuss these five key approaches in document collection practices in archives that can inform data collection in sociocultural ML. By showing data collection practices from another field, we encourage ML research to be more cognizant and systematic in data collection and draw from interdisciplinary expertise.
Metadynamics is an enhanced sampling method of great popularity, based on the on-the-fly construction of a bias potential that is a function of a selected number of collective variables. We propose here a change in perspective that shifts the focus from the bias to the probability distribution reconstruction, while keeping some of the key characteristics of metadynamics, such as the flexible on-the-fly adjustments to the free energy estimate. The result is an enhanced sampling method that presents a drastic improvement in convergence speed, especially when dealing with suboptimal and/or multidimensional sets of collective variables. The method is especially robust and easy to use: it requires only a few simple parameters to be set, and it has a straightforward reweighting scheme to recover the statistics of the unbiased ensemble. Furthermore, it gives more control over the desired exploration of the phase space, since the deposited bias is not allowed to grow indefinitely and does not push the simulation to uninteresting high free energy regions. We demonstrate the performance of the method in a number of representative examples.