Physically motivated and mathematically robust atom-centred representations of molecular structures are key to the success of modern atomistic machine learning (ML) methods. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules as well as to explore and visualize the chemical compound and configuration space. Recently, it has become clear that many of the most effective representations share a fundamental formal connection: they can all be expressed as a discretization of N-body correlation functions of the local atom density, suggesting the opportunity of standardizing and, more importantly, optimizing the calculation of such representations. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping for new developments of rotationally equivariant atomistic representations. As an example, we discuss SOAP features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis set. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to further reduce the total computational cost by up to a factor of 4 or 5 without affecting the model's symmetry properties and without significantly impacting its accuracy.
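To make the last two steps concrete, the following is a minimal sketch of kernel ridge regression on a reduced feature set. It does not use the librascal API: the function names (`fps_columns`, `normalize`, `poly_kernel`, `krr_fit`, `krr_predict`), the random stand-in features, the polynomial dot-product kernel with exponent `zeta`, and the use of farthest point sampling as the data reduction step are all assumptions made for illustration.

```python
import numpy as np


def fps_columns(X, n_select, start=0):
    """Farthest point sampling over the columns (features) of X."""
    cols = X.T  # each row is one feature, viewed as a point in sample space
    selected = [start]
    # squared distance of every feature to its closest selected feature
    d2 = np.sum((cols - cols[start]) ** 2, axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(d2))
        selected.append(nxt)
        d2 = np.minimum(d2, np.sum((cols - cols[nxt]) ** 2, axis=1))
    return np.array(selected)


def normalize(X):
    """Scale every feature vector (row) to unit Euclidean norm."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def poly_kernel(A, B, zeta=2):
    """Dot-product kernel raised to the power zeta."""
    return (A @ B.T) ** zeta


def krr_fit(X, y, zeta=2, reg=1e-6):
    """Solve (K + reg * I) w = y for the regression weights w."""
    K = poly_kernel(X, X, zeta)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)


def krr_predict(X_train, weights, X_new, zeta=2):
    """Predict targets for X_new from the fitted weights."""
    return poly_kernel(X_new, X_train, zeta) @ weights


# Hypothetical data: 40 environments, 60 features each (not real SOAP vectors).
rng = np.random.default_rng(0)
X_full = rng.normal(size=(40, 60))
y = np.sin(X_full[:, 0]) + 0.5 * X_full[:, 1]

keep = fps_columns(X_full, n_select=15)  # keep a quarter of the features
X_red = normalize(X_full[:, keep])
weights = krr_fit(X_red, y)
y_fit = krr_predict(X_red, weights, X_red)
```

Selecting columns only discards components of the feature vector, so any invariance carried by the individual features is preserved; the kernel evaluation then scales with the reduced feature count rather than the full one. The 60-to-15 reduction here is arbitrary and says nothing about the accuracy trade-off reported above.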
The applications of machine learning techniques to chemistry and materials science become more numerous by the day. The main challenge is to devise representations of atomic systems that are at the same time complete and concise, so as to reduce the number of reference calculations that are needed to predict the properties of different types of materials reliably. This has led to a proliferation of alternative ways to convert an atomic structure into an input for a machine-learning model. We introduce an abstract definition of chemical environments that is based on a smoothed atomic density, using a bra-ket notation to emphasize basis set independence and to highlight the connections with some popular choices of representations for describing atomic systems. The correlations between the spatial distribution of atoms and their chemical identities are computed as inner products between these feature kets. These can be given an explicit representation in terms of the expansion of the atom density on orthogonal basis functions, which is equivalent to the smooth overlap of atomic positions (SOAP) power spectrum, but also in real space, corresponding to $n$-body correlations of the atom density. This formalism lays the foundations for a more systematic tuning of the behavior of the representations, by introducing operators that represent the correlations between structure, composition, and the target properties. It provides a unifying picture of recent developments in the field and indicates a way forward towards more effective and computationally affordable machine-learning schemes for molecules and materials.
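For orientation, the density expansion and the power spectrum mentioned above can be written in a commonly used form. The symbols below ($g$ for the smoothing function, $R_n$ for an orthonormal radial basis, $Y_{lm}$ for spherical harmonics) are notational assumptions, the chemical species index is omitted, and normalization factors differ between conventions.

```latex
% Smoothed atom density around the central atom i
\rho_i(\mathbf{r}) = \sum_{j} g(\mathbf{r} - \mathbf{r}_{ij})

% Expansion coefficients on an orthonormal radial/angular basis
\langle n l m | \rho_i \rangle
  = \int \mathrm{d}\mathbf{r}\, R_n(r)\, Y_{lm}^{*}(\hat{\mathbf{r}})\, \rho_i(\mathbf{r})

% Rotationally invariant power spectrum (up to a normalization factor)
p^{i}_{n n' l} \propto \sum_{m=-l}^{l}
  \langle n l m | \rho_i \rangle^{*}\, \langle n' l m | \rho_i \rangle
```

Summing over $m$ removes the dependence on the orientation of the environment, which is why the power spectrum can serve as a rotationally invariant feature vector.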
We present an efficient implementation of the Bethe-Salpeter equation (BSE) method for obtaining core-level spectra including x-ray absorption (XAS), x-ray emission (XES), and both resonant and non-resonant inelastic x-ray scattering spectra (N/RIXS). Calculations are based on density functional theory (DFT) electronic structures generated either by ABINIT or Quantum ESPRESSO, both plane-wave, pseudopotential codes. This electronic structure is improved through the inclusion of a GW self energy. The projector augmented wave technique is used to evaluate transition matrix elements between core-level and band states. Final two-particle scattering states are obtained with the NIST core-level BSE solver (NBSE). We have previously reported this implementation, which we refer to as OCEAN (Obtaining Core Excitations from Ab initio electronic structure and NBSE) [Phys. Rev. B 83, 115106 (2011)]. Here, we present additional efficiencies that enable us to evaluate spectra for systems ten times larger than previously possible, containing up to a few thousand electrons. These improvements include the implementation of optimal basis functions that reduce the cost of the initial DFT calculations, more complete parallelization of the screening calculation and of the action of the BSE Hamiltonian, and various memory reductions. Scaling is demonstrated on supercells of SrTiO$_3$, and example spectra for the organic light-emitting molecule Tris-(8-hydroxyquinoline)aluminum (Alq$_3$) are presented. The ability to perform large-scale spectral calculations is particularly advantageous for investigating dilute or non-periodic systems such as doped materials, amorphous systems, or complex nano-structures.
Many-body descriptors are widely used to represent atomic environments in the construction of machine learned interatomic potentials and more broadly for fitting, classification and embedding tasks on atomic structures. It was generally believed that 3-body descriptors uniquely specify the environment of an atom, up to a rotation and permutation of like atoms. We produce several counterexamples to this belief, with the consequence that any classifier, regression or embedding model for atom-centred properties that uses 3 (or 4)-body features will incorrectly give identical results for different configurations. Writing global properties (such as total energies) as a sum of many atom-centred contributions mitigates, but does not eliminate, the impact of this fundamental deficiency -- explaining the success of current machine-learning force fields. We anticipate the issues that will arise as the desired accuracy increases, and suggest potential solutions.
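The phrase "up to a rotation and permutation of like atoms" can be illustrated with a small sketch. It builds a simple 3-body descriptor for a single-species environment (the sorted list of distance-distance-angle triplets around the central atom) and checks that it is unchanged by a rotation and by a relabelling of the neighbours. The function name `three_body_descriptor` and the random environment are hypothetical, and the sketch does not reproduce the counterexamples discussed above; it only shows the invariances that such descriptors are built to have.

```python
import itertools

import numpy as np


def three_body_descriptor(neighbors):
    """Sorted (r_small, r_large, cos(angle)) triplets for one environment.

    `neighbors` is an (N, 3) array of neighbour positions relative to the
    central atom; all neighbours are assumed to be of the same species.
    """
    r = np.linalg.norm(neighbors, axis=1)
    rows = []
    for j, k in itertools.combinations(range(len(neighbors)), 2):
        cos_jk = neighbors[j] @ neighbors[k] / (r[j] * r[k])
        r_small, r_large = sorted((r[j], r[k]))
        rows.append((r_small, r_large, cos_jk))
    rows = np.array(rows)
    # lexicographic sort removes any dependence on the neighbour ordering
    order = np.lexsort((rows[:, 2], rows[:, 1], rows[:, 0]))
    return rows[order]


rng = np.random.default_rng(1)
env = rng.normal(size=(5, 3))  # hypothetical environment with 5 neighbours

# Random proper rotation from a QR decomposition (flip a column if det < 0).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1
rotated = env @ Q.T
shuffled = env[rng.permutation(len(env))]

same_after_rotation = np.allclose(
    three_body_descriptor(env), three_body_descriptor(rotated)
)
same_after_permutation = np.allclose(
    three_body_descriptor(env), three_body_descriptor(shuffled)
)
```

The counterexamples concern the converse question: whether two environments that are not related by such operations can nevertheless share the same descriptor.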
The nonlocal correlation energy in the van der Waals density functional (vdW-DF) method [Phys. Rev. Lett. 92, 246401 (2004); Phys. Rev. B 76, 125112 (2007); Phys. Rev. B 89, 035412 (2014)] can be interpreted in terms of a coupling of zero-point energies of characteristic modes of semilocal exchange-correlation (xc) holes. These xc holes reflect the internal functional in the framework of the vdW-DF method [Phys. Rev. B 82, 081101 (2010)]. We explore the internal xc hole components, showing that they share properties with those of the generalized-gradient approximation. We use these results to illustrate the nonlocality in the vdW-DF description and analyze the vdW-DF formulation of nonlocal correlation.
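As a reminder of the quantity being interpreted, the nonlocal correlation energy in vdW-DF is usually written as a double integral over the electron density $n$ with a kernel $\phi$. The notation below is the standard textbook form and is given here as an aid, not as a statement of the specific formulation analyzed in this work.

```latex
E_c^{\mathrm{nl}}[n] = \frac{1}{2} \iint
  n(\mathbf{r})\, \phi(\mathbf{r}, \mathbf{r}')\, n(\mathbf{r}')\,
  \mathrm{d}^3 r\, \mathrm{d}^3 r'
```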
The semilocal meta generalized gradient approximation (MGGA) for the exchange-correlation functional of Kohn-Sham (KS) density functional theory can yield accurate ground-state energies simultaneously for atoms, molecules, surfaces, and solids, due to the inclusion of kinetic energy density as an input. We study for the first time the effect and importance of the dependence of MGGA on the kinetic energy density through the dimensionless inhomogeneity parameter, $\alpha$, that characterizes the extent of orbital overlap. This leads to a simple and wholly new MGGA exchange functional, which interpolates between the single-orbital regime, where $\alpha = 0$, and the slowly varying density regime, where $\alpha \approx 1$, and then extrapolates to $\alpha \to \infty$. When combined with a variant of the Perdew-Burke-Ernzerhof (PBE) GGA correlation, the resulting MGGA performs equally well for atoms, molecules, surfaces, and solids.
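For reference, the inhomogeneity parameter is conventionally defined from the kinetic energy density $\tau$, its single-orbital (von Weizsäcker) limit $\tau^{W}$, and its uniform-gas limit $\tau^{\mathrm{unif}}$. The expressions below use Hartree atomic units for a spin-unpolarized density $n$; the notation is an assumption, since the text above does not spell out the definition.

```latex
\alpha = \frac{\tau - \tau^{W}}{\tau^{\mathrm{unif}}}, \qquad
\tau^{W} = \frac{|\nabla n|^{2}}{8 n}, \qquad
\tau^{\mathrm{unif}} = \frac{3}{10} \left( 3 \pi^{2} \right)^{2/3} n^{5/3}
```

With this definition, a single-orbital density has $\tau = \tau^{W}$ and hence $\alpha = 0$, while a uniform density has $\tau^{W} = 0$ and $\tau = \tau^{\mathrm{unif}}$, giving $\alpha = 1$, consistent with the two regimes named above.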