
Machine Learning Prediction of Accurate Atomization Energies of Organic Molecules from Low-Fidelity Quantum Chemical Calculations

Published by: Logan Ward
Publication date: 2019
Research field: Physics
Language: English





Recent studies illustrate how machine learning (ML) can be used to bypass a core challenge of molecular modeling: the tradeoff between accuracy and computational cost. Here, we assess multiple ML approaches for predicting the atomization energy of organic molecules. Our resulting models learn the difference between low-fidelity (B3LYP) and high-accuracy (G4MP2) atomization energies, and predict the G4MP2 atomization energy to 0.005 eV (mean absolute error) for molecules with fewer than 9 heavy atoms and 0.012 eV for a small set of molecules with between 10 and 14 heavy atoms. Our two best models, which have different accuracy/speed tradeoffs, enable the efficient prediction of G4MP2-level energies for large molecules and are available through a simple web interface.
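The idea of learning the difference between a cheap and an expensive calculation (often called delta learning) can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the models described in the abstract: it uses a plain closed-form ridge regression in NumPy, the features are hypothetical atom counts, and every energy value is synthetic. The function names `fit_delta_model` and `predict_high` are made up for this sketch.

```python
import numpy as np

def _with_bias(X):
    """Append a constant column so the linear model can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_delta_model(X, e_low, e_high, alpha=1e-8):
    """Fit ridge weights that map features X to the correction e_high - e_low."""
    Xb = _with_bias(X)
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ (e_high - e_low))

def predict_high(weights, X, e_low):
    """High-fidelity estimate = low-fidelity energy + learned correction."""
    return e_low + _with_bias(X) @ weights

# Synthetic data (assumption): three hypothetical features per molecule,
# e.g. counts of C, H and O atoms. None of these numbers come from the paper.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(200, 3)).astype(float)
e_low = -X @ np.array([7.0, 4.0, 5.0]) + rng.normal(0.0, 0.5, 200)
true_delta = X @ np.array([0.05, -0.02, 0.03]) + 0.1
e_high = e_low + true_delta

# Train on the first 150 molecules, evaluate on the remaining 50.
weights = fit_delta_model(X[:150], e_low[:150], e_high[:150])
pred = predict_high(weights, X[150:], e_low[150:])
mae = float(np.mean(np.abs(pred - e_high[150:])))
print(f"held-out MAE on synthetic data: {mae:.2e}")
```

Because the model only has to capture the (usually smooth and small) correction rather than the full energy, the low-fidelity calculation carries most of the physics. In this toy setup the correction is exactly linear in the features, so the held-out error is close to zero; real corrections are not that simple.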




Read also

Computational study of molecules and materials from first principles is a cornerstone of physics, chemistry, and materials science, but limited by the cost of accurate and precise simulations. In settings involving many simulations, machine learning can reduce these costs, often by orders of magnitude, by interpolating between reference simulations. This requires representations that describe any molecule or material and support interpolation. We comprehensively review and discuss current representations and relations between them, using a unified mathematical framework based on many-body functions, group averaging, and tensor products. For selected state-of-the-art representations, we compare energy predictions for organic molecules, binary alloys, and Al-Ga-In sesquioxides in numerical experiments controlled for data distribution, regression method, and hyper-parameter optimization.
Machine learning of atomic-scale properties is revolutionizing molecular modelling, making it possible to evaluate inter-atomic potentials with first-principles accuracy, at a fraction of the costs. The accuracy, speed and reliability of machine-learning potentials, however, depend strongly on the way atomic configurations are represented, i.e. the choice of descriptors used as input for the machine learning method. The raw Cartesian coordinates are typically transformed into fingerprints, or symmetry functions, that are designed to encode, in addition to the structure, important properties of the potential-energy surface like its invariances with respect to rotation, translation and permutation of like atoms. Here we discuss automatic protocols to select a number of fingerprints out of a large pool of candidates, based on the correlations that are intrinsic to the training data. This procedure can greatly simplify the construction of neural network potentials that strike the best balance between accuracy and computational efficiency, and has the potential to accelerate by orders of magnitude the evaluation of Gaussian Approximation Potentials based on the Smooth Overlap of Atomic Positions kernel. We present applications to the construction of neural network potentials for water and for an Al-Mg-Si alloy, and to the prediction of the formation energies of small organic molecules using Gaussian process regression.
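To make the idea of selecting fingerprints from correlations in the training data concrete, here is a minimal sketch. It is not one of the protocols from the abstract above; it swaps in a much simpler greedy filter that keeps a feature only if its absolute Pearson correlation with every already-kept feature stays below a threshold. The function name `select_uncorrelated`, the threshold `max_corr`, and the data are all assumptions made for illustration.

```python
import numpy as np

def select_uncorrelated(F, max_corr=0.95):
    """Greedily keep columns of F whose |Pearson r| with all kept columns
    is at most max_corr. Assumes no column is constant (a constant column
    would produce NaN correlations)."""
    corr = np.abs(np.corrcoef(F, rowvar=False))
    kept = []
    for j in range(F.shape[1]):
        if all(corr[j, k] <= max_corr for k in kept):
            kept.append(j)
    return kept

# Synthetic fingerprint matrix (assumption): 500 configurations, 5 candidates.
# Columns 3 and 4 are near-duplicates of columns 0 and 1.
rng = np.random.default_rng(1)
base = rng.normal(size=(500, 3))
F = np.hstack([base, base[:, :2] + 0.01 * rng.normal(size=(500, 2))])

selected = select_uncorrelated(F)
print("kept fingerprint columns:", selected)
```

A filter like this only removes redundancy; it does not rank features by how much they help the fit, so it is best read as a first pruning pass over a large candidate pool.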
Chi Chen, Zhi Deng, Richard Tran (2017)
In this work, we present a highly accurate spectral neighbor analysis potential (SNAP) model for molybdenum (Mo) developed through the rigorous application of machine learning techniques on large materials data sets. Despite Mo's importance as a structural metal, existing force fields for Mo based on the embedded atom and modified embedded atom methods still do not provide satisfactory accuracy on many properties. We will show that by fitting to the energies, forces and stress tensors of a large density functional theory (DFT)-computed dataset on a diverse set of Mo structures, a Mo SNAP model can be developed that achieves close to DFT accuracy in the prediction of a broad range of properties, including energies, forces, stresses, elastic constants, melting point, phonon spectra, surface energies, grain boundary energies, etc. We will outline a systematic model development process, which includes a rigorous approach to structural selection based on principal component analysis, as well as a differential evolution algorithm for optimizing the hyperparameters in the model fitting so that both the model error and the property prediction error can be simultaneously lowered. We expect that this newly developed Mo SNAP model will find broad applications in large-scale, long-time scale simulations.
Based on first-principles calculations, we perform an initial statistical assessment of the reliability of theoretical positron lifetimes of bulk materials. We find that the original generalized gradient approximation (GGA) form of the enhancement factor and correlation potentials overestimates the effect of the gradient factor. Furthermore, excellent agreement between model and data is found in this work, with the difference being at the noise level of the data. In addition, we suggest a new GGA form of the correlation scheme which gives the best performance. This work demonstrates that a new level of reliability is achieved for the theoretical prediction of positron lifetimes of bulk materials, and that the accuracy of the best theoretical scheme can be independent of the type of material.
A top-level designed forecasting system for predicting computational times of density-functional theory (DFT)/time-dependent density-functional theory (TDDFT) calculations is presented. The computational time is assumed to be an intrinsic property of the molecule. Based on this assumption, the forecasting system is established using the reinforced concrete, which combines cheminformatics, several machine-learning (ML) models, and the framework of the many-world interpretation (MWI) in multiverse ansatz. Herein, cheminformatics is used to recognize the topological structure of molecules, the ML/AI models are used to build the relationships between topology and computational cost, and the MWI framework is used to hold various combinations of DFT functionals and basis sets in DFT/TDDFT calculations. Calculated results for molecules from the DrugBank dataset show that 1) it can give quantitative predictions of computational costs, with typical mean relative errors of less than 0.2 for DFT/TDDFT calculations with derivations of 25% using the exactly pre-trained ML models, and 2) it can also be applied to various combinations of DFT functional and basis set without exactly pre-trained ML models, with only slightly larger prediction errors.