The electronic charge density plays a central role in determining the behavior of matter at the atomic scale, but its computational evaluation requires demanding electronic-structure calculations. We introduce an atom-centered, symmetry-adapted framework to machine-learn the valence charge density based on a small number of reference calculations. The model is highly transferable, meaning it can be trained on electronic-structure data of small molecules and used to predict the charge density of larger compounds with low, linear-scaling cost. Applications are shown for various hydrocarbon molecules of increasing complexity and flexibility, and demonstrate the accuracy of the model when predicting the density on octane and octatetraene after training exclusively on butane and butadiene. This transferable, data-driven model can be used to interpret experiments, initialize electronic structure calculations, and compute electrostatic interactions in molecules and condensed-phase systems.
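As a rough illustration of the atom-centered, linear-scaling idea (not the authors' actual symmetry-adapted model), one can imagine ridge-regressing per-atom density-expansion coefficients from local-environment features; since each atom contributes independently, the prediction cost scales linearly with system size. All names and data below are hypothetical placeholders:

```python
import numpy as np

def predict_density_coeffs(train_feats, train_coeffs, test_feats, reg=1e-8):
    """Ridge regression from per-atom environment features to expansion
    coefficients of atom-centered basis functions (scalar channel only).
    Minimal sketch of transferable, atom-centered density learning."""
    A = train_feats.T @ train_feats + reg * np.eye(train_feats.shape[1])
    W = np.linalg.solve(A, train_feats.T @ train_coeffs)
    return test_feats @ W

rng = np.random.default_rng(0)
X_small = rng.normal(size=(50, 8))           # atoms in small training molecules
c_small = X_small @ rng.normal(size=(8, 4))  # reference expansion coefficients
X_large = rng.normal(size=(200, 8))          # atoms in a larger compound
c_pred = predict_density_coeffs(X_small, c_small, X_large)
# Cost is linear in the number of atoms: one independent prediction per atom.
print(c_pred.shape)  # (200, 4)
```

Summing the predicted atom-centered contributions then reconstructs the total valence density, which is what makes training on small molecules and predicting on larger ones possible.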
Strategies for machine-learning (ML)-accelerated discovery that are general across materials composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets like open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (ca. 1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the periodic table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension of the graph-based revised autocorrelation (RAC) representation (i.e., eRACs) that incorporates the effective nuclear charge alongside the nuclear-charge heuristic, which otherwise overestimates the dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data are limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the periodic table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the periodic table, a property we expect to be broadly useful for other materials domains.
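A toy sketch of why a nuclear-charge heuristic separates isovalent 3d/4d complexes while an effective-nuclear-charge variant keeps them close, using a product autocorrelation over the molecular graph. The Z_eff values and two-atom fragments below are illustrative placeholders, not the Slater-rule parameterization of the actual eRACs:

```python
import collections
import numpy as np

# Illustrative valence effective nuclear charges (placeholder values).
Z     = {"Fe": 26.0, "Ru": 44.0, "N": 7.0}
Z_EFF = {"Fe": 6.25, "Ru": 6.45, "N": 3.90}

def rac_product(symbols, bonds, prop, depth):
    """Product autocorrelation: sum of prop[i]*prop[j] over all atom
    pairs separated by exactly `depth` bonds in the molecular graph."""
    n = len(symbols)
    adj = collections.defaultdict(list)
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    dist = np.full((n, n), -1, dtype=int)   # graph distances via BFS
    for s in range(n):
        dist[s, s] = 0
        queue = collections.deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s, v] < 0:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return sum(prop[symbols[i]] * prop[symbols[j]]
               for i in range(n) for j in range(n) if dist[i, j] == depth)

# Isovalent 3d/4d pair: a bare metal-nitrogen fragment for each metal.
fe = (["Fe", "N"], [(0, 1)])
ru = (["Ru", "N"], [(0, 1)])
gap_Z    = abs(rac_product(*fe, Z, 1)     - rac_product(*ru, Z, 1))
gap_Zeff = abs(rac_product(*fe, Z_EFF, 1) - rac_product(*ru, Z_EFF, 1))
print(gap_Zeff < gap_Z)  # True: Z_eff keeps isovalent complexes close
```

Features that place isovalent complexes close together are exactly what lets a model trained on one row of the periodic table be seeded with only a few points from another.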
Machine learning is a powerful tool for designing accurate, highly non-local exchange-correlation functionals for density functional theory. So far, most of these machine-learned functionals have been trained for systems with an integer number of particles. As such, they are unable to reproduce some crucial and fundamental aspects, such as the explicit dependence of the functionals on the particle number or the infamous derivative discontinuity at integer particle numbers. Here we propose a solution to these problems by training a neural network as the universal functional of density-functional theory that (i) depends explicitly on the number of particles, with piecewise linearity between the integers, and (ii) reproduces the derivative discontinuity of the exchange-correlation energy. This is achieved by using an ensemble formalism, a training set containing fractional densities, and an explicitly discontinuous formulation.
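The piecewise-linearity and derivative-discontinuity conditions invoked here are the standard ensemble results for fractional particle number. For $M \le N \le M+1$ with $N = M + \omega$, the exact ensemble energy interpolates linearly between the integer-particle energies,

$$E(M+\omega) = (1-\omega)\,E(M) + \omega\,E(M+1), \qquad 0 \le \omega \le 1,$$

so the slope of $E(N)$ jumps at each integer $M$:

$$\left.\frac{\partial E}{\partial N}\right|_{M^{+}} - \left.\frac{\partial E}{\partial N}\right|_{M^{-}} = \bigl[E(M+1)-E(M)\bigr] - \bigl[E(M)-E(M-1)\bigr] = I - A,$$

where $I$ is the ionization potential and $A$ the electron affinity. The part of this jump not captured by the Kohn-Sham single-particle gap is the derivative discontinuity of the exchange-correlation energy that the trained network must reproduce.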
Applications of machine-learning techniques to chemistry and materials science grow more numerous by the day. The main challenge is to devise representations of atomic systems that are at once complete and concise, so as to reduce the number of reference calculations needed to reliably predict the properties of different types of materials. This has led to a proliferation of alternative ways to convert an atomic structure into the input for a machine-learning model. We introduce an abstract definition of chemical environments that is based on a smoothed atomic density, using a bra-ket notation to emphasize basis-set independence and to highlight the connections with several popular representations of atomic systems. The correlations between the spatial distribution of atoms and their chemical identities are computed as inner products between these feature kets, which can be given an explicit representation in terms of the expansion of the atom density in orthogonal basis functions, equivalent to the smooth overlap of atomic positions (SOAP) power spectrum, but also in real space, corresponding to $n$-body correlations of the atom density. This formalism lays the foundation for a more systematic tuning of the behavior of the representations, by introducing operators that represent the correlations between structure, composition, and the target properties. It provides a unifying picture of recent developments in the field and points the way toward more effective and computationally affordable machine-learning schemes for molecules and materials.
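The inner-product structure can be made concrete for a single angular channel: for $l=1$ the density-expansion coefficients transform under rotation by an orthogonal matrix acting on the $m$ index, so the power spectrum $p_{nn'} = \sum_m c_{nlm}\,c_{n'lm}$ is rotationally invariant. A minimal numerical check with synthetic coefficients:

```python
import numpy as np

def power_spectrum(c):
    """p_{n n'} = sum_m c_{n m} c_{n' m}: the rotationally invariant
    inner product of density-expansion coefficients for one l channel."""
    return c @ c.T

rng = np.random.default_rng(1)
c = rng.normal(size=(3, 3))  # n = 3 radial channels; l = 1, so m = -1, 0, 1
# A rotation acts on the m index of an l = 1 channel as a 3x3 orthogonal matrix.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
p, p_rot = power_spectrum(c), power_spectrum(c @ Q)
print(np.allclose(p, p_rot))  # True: the power spectrum is invariant
```

The invariance follows directly from $Q Q^{\mathsf T} = \mathbb{1}$; the full SOAP power spectrum simply applies this contraction in every $(n, n', l)$ block.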
A new empirical potential for efficient, large scale molecular dynamics simulation of water is presented. The HIPPO (Hydrogen-like Intermolecular Polarizable POtential) force field is based upon the model electron density of a hydrogen-like atom. This framework is used to derive and parameterize individual terms describing charge penetration damped permanent electrostatics, damped polarization, charge transfer, anisotropic Pauli repulsion, and damped dispersion interactions. Initial parameter values were fit to Symmetry Adapted Perturbation Theory (SAPT) energy components for ten water dimer configurations, as well as the radial and angular dependence of the canonical dimer. The SAPT-based parameters were then systematically refined to extend the treatment to water bulk phases. The final HIPPO water model provides a balanced representation of a wide variety of properties of gas phase clusters, liquid water and ice polymorphs, across a range of temperatures and pressures. This water potential yields a rationalization of water structure, dynamics and thermodynamics explicitly correlated with an ab initio energy decomposition, while providing a level of accuracy comparable or superior to previous polarizable atomic multipole force fields. The HIPPO water model serves as a cornerstone around which similarly detailed physics-based models can be developed for additional molecular species.
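The charge-penetration idea can be illustrated with the classic damping factor for the electrostatic potential of a hydrogen-like 1s density, $f(r) = 1 - e^{-\alpha r}(1 + \alpha r/2)$, applied to a charge-charge interaction. This is only a sketch of the physical ingredient; the full HIPPO model damps higher multipoles, polarization, and dispersion with its own fitted parameters:

```python
import numpy as np

def damped_coulomb(q1, q2, r, alpha):
    """Charge-charge Coulomb energy (atomic units) damped by the
    penetration factor of a hydrogen-like 1s density,
    f(r) = 1 - exp(-a r) (1 + a r / 2).  Illustrative form only."""
    f = 1.0 - np.exp(-alpha * r) * (1.0 + 0.5 * alpha * r)
    return q1 * q2 * f / r

r = np.linspace(0.5, 10.0, 50)
e_damped = damped_coulomb(-1.0, 1.0, r, alpha=2.0)
e_point = -1.0 / r
# Damping weakens the attraction at short range (where the charge clouds
# overlap) and becomes negligible at long range.
print(e_damped[0] > e_point[0], abs(e_damped[-1] - e_point[-1]) < 1e-6)
```

Because $0 < f(r) < 1$ everywhere, the damped interaction is always weaker in magnitude than the point-charge limit, which is the qualitative signature of charge penetration.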
We address the degree to which machine learning can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, for the MP2, CCSD, and CCSD(T) levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 millihartree using a model trained on only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported $\Delta$-ML method, MOB-ML is shown to reach chemical accuracy with threefold fewer training geometries. Finally, a transferability test in which models trained on seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than $\Delta$-ML (140 versus 5000 training calculations).
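MOB-ML regresses pair correlation energies on molecular-orbital features (e.g., Fock and overlap matrix elements) with Gaussian-process regression. The sketch below shows only that regression machinery on synthetic stand-in features; the feature and target definitions are placeholders, not the actual MOB features:

```python
import numpy as np

def train_gpr(X, y, length=1.0, noise=1e-6):
    """Gaussian-process regression with an RBF kernel, the kind of
    regressor MOB-ML uses to map orbital-pair features to pair
    correlation energies.  Returns a prediction function."""
    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length**2)
    K = kernel(X, X) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return lambda Xq: kernel(Xq, X) @ alpha

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(80, 3))  # stand-in orbital-pair features
y = np.sin(X).sum(axis=1)             # stand-in pair correlation energies
predict = train_gpr(X, y)
Xq = rng.uniform(-1, 1, size=(20, 3))
err = np.max(np.abs(predict(Xq) - np.sin(Xq).sum(axis=1)))
print("max held-out error:", float(err))
```

Because the learned map is from local orbital-pair features rather than whole molecules, a model trained on small systems can be evaluated pair-by-pair on larger ones, which is the basis of the transferability test in the abstract.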