The requirement for accelerated and quantitatively accurate screening of nuclear magnetic resonance spectra across the chemical compound space of small molecules is two-fold: (1) a robust local machine learning (ML) strategy capturing the effect of the neighbourhood on an atom's near-sighted property -- the chemical shielding; (2) an accurate reference dataset generated with a state-of-the-art first-principles method for training. Herein we report the QM9-NMR dataset, comprising isotropic shieldings of over 0.8 million C atoms in the 134k molecules of the QM9 dataset, in the gas phase and five common solvent phases. Using these data for training, we present benchmark results for the prediction transferability of kernel-ridge regression models with popular local descriptors. Our best model, trained on 100k samples, accurately predicts the isotropic shieldings of 50k hold-out atoms with a mean error of less than $1.9$ ppm. For rapid prediction on new query molecules, the models were trained on geometries from an inexpensive level of theory. Furthermore, by using a $\Delta$-ML strategy, we quench the error below $1.4$ ppm. Finally, we test the transferability on non-trivial benchmark sets comprising molecules with 10 to 17 heavy atoms and drug molecules.
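The kernel-ridge regression plus $\Delta$-ML combination underlying this abstract can be sketched in a few lines. Everything below is illustrative: the random feature vectors stand in for real local atomic descriptors, and the synthetic "baseline" and "target" values stand in for shieldings from a cheap and an expensive level of theory; none of it reproduces the paper's actual models or data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                      # toy descriptors for 60 training atoms
w = rng.normal(size=5)
baseline = X @ w                                  # "cheap-theory" shieldings (synthetic)
target = baseline + 0.1 * np.sin(X).sum(axis=1)   # "expensive-theory" shieldings (synthetic)

def gaussian_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def krr_train(X, y, sigma=2.0, lam=1e-10):
    """Solve (K + lam*I) alpha = y for the regression weights."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, X_query, alpha, sigma=2.0):
    return gaussian_kernel(X_query, X_train, sigma) @ alpha

# Delta-ML: learn only the (small, smooth) baseline -> target correction,
# then add it back onto the cheap baseline for new query atoms.
alpha = krr_train(X, target - baseline)
X_query = rng.normal(size=(20, 5))
prediction = (X_query @ w) + krr_predict(X, X_query, alpha)
```

The design point of $\Delta$-ML is visible in the last three lines: the model never has to learn the full shielding, only the residual between two levels of theory, which is a smaller and smoother target.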
A key challenge in automated chemical compound space explorations is ensuring veracity in minimum-energy geometries---preserving the intended bonding connectivities. We discuss an iterative high-throughput workflow for connectivity-preserving geometry optimizations that exploits the nearness between quantum mechanical models. The methodology is benchmarked on the QM9 dataset, comprising DFT-level properties of 133,885 small molecules, of which 3,054 have questionable geometric stability. We successfully troubleshoot 2,988 molecules and ensure a bijective mapping between the desired Lewis formulae and the final geometries. Our workflow, based on DFT and post-DFT methods, identifies 66 molecules as unstable; 52 contain $-\mathrm{NNO}-$, while the rest are strained due to pyramidal sp$^2$ C. In the curated dataset, we inspect molecules with long CC bonds and identify ultralong contestants ($r>1.70$~\AA{}) supported by topological analysis of the electron density. We hope the proposed strategy will play a role in big-data quantum chemistry initiatives.
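The bijectivity test at the heart of such a workflow can be sketched with a simple covalent-radius criterion: build a bond adjacency matrix before and after optimization and demand that the two agree. The radii, the 1.3 scaling factor, and the methane-like test case below are illustrative assumptions, not the criteria used in the paper.

```python
import numpy as np

# Toy covalent radii in angstrom; assumption: a bond exists when the
# interatomic distance is below 1.3x the sum of covalent radii.
COV_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def adjacency(symbols, coords, scale=1.3):
    """Boolean bond adjacency matrix from a distance-based cutoff."""
    coords = np.asarray(coords, dtype=float)
    n = len(symbols)
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            cutoff = scale * (COV_RADII[symbols[i]] + COV_RADII[symbols[j]])
            if np.linalg.norm(coords[i] - coords[j]) < cutoff:
                A[i, j] = A[j, i] = True
    return A

def connectivity_preserved(symbols, coords_before, coords_after):
    """True when optimization left the bonding graph unchanged."""
    return np.array_equal(adjacency(symbols, coords_before),
                          adjacency(symbols, coords_after))
```

A production workflow would perceive bonds more carefully (bond orders, charged species, metal centres), but the before/after graph comparison is the essential check for a bijective mapping between Lewis formula and geometry.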
This article reviews recent developments in tests of fundamental physics using atoms and molecules, including the subjects of parity violation, searches for permanent electric dipole moments, tests of the CPT theorem and Lorentz symmetry, searches for spatiotemporal variation of fundamental constants, tests of quantum electrodynamics, tests of general relativity and the equivalence principle, searches for dark matter, dark energy and extra forces, and tests of the spin-statistics theorem. Key results are presented in the context of potential new physics and in the broader context of similar investigations in other fields. Ongoing and future experiments of the next decade are discussed.
We address the degree to which machine learning can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, for the MP2, CCSD, and CCSD(T) levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 millihartree using a model that is trained from only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported $\Delta$-ML method, MOB-ML is shown to reach chemical accuracy with three-fold fewer training geometries. Finally, a transferability test in which models trained for seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than $\Delta$-ML (140 versus 5000 training calculations).
The IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar Information Of Nuclei) machine learning system provides an efficient and accurate route to the prediction of NMR parameters from 3-dimensional chemical structures. Here we demonstrate that machine learning predictions, trained on quantum chemically computed values for NMR parameters, are essentially as accurate as quantum chemical calculations but computationally far more efficient (tens of milliseconds versus hours or days per molecule). Training the machine learning systems on quantum chemical, rather than experimental, data circumvents the need for large, structurally diverse, error-free experimental databases and makes IMPRESSION applicable to solving 3-dimensional problems such as molecular conformation and isomerism.
Dynamics of flexible molecules are often determined by an interplay between local chemical bond fluctuations and conformational changes driven by long-range electrostatics and van der Waals interactions. This interplay between interactions yields complex potential-energy surfaces (PES) with multiple minima and transition paths between them. In this work, we assess the performance of state-of-the-art Machine Learning (ML) models, namely sGDML, SchNet, GAP/SOAP, and BPNN, for reproducing such PES while using limited amounts of reference data. As a benchmark, we use the cis-to-trans thermal relaxation in an azobenzene molecule, where at least three different transition mechanisms should be considered. Although the GAP/SOAP, SchNet, and sGDML models can globally achieve chemical accuracy of 1 kcal mol$^{-1}$ with fewer than 1000 training points, predictions greatly depend on the ML method used as well as on the local region of the PES being sampled. Within a given ML method, large differences can be found between predictions for close-to-equilibrium and transition regions, as well as for different transition mechanisms. We identify key challenges that the ML models face in learning long-range interactions, along with the intrinsic limitations of commonly used atom-based descriptors. All in all, our results suggest switching from learning the entire PES within a single model to using multiple local models with optimized descriptors, training sets, and architectures for different parts of a complex PES.
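The closing suggestion -- several local models instead of one global model -- can be illustrated on a one-dimensional toy "PES" with two basins separated by a barrier. The double-well form, the quadratic fits, and the split at the barrier top are purely illustrative stand-ins for the azobenzene landscape and the ML models benchmarked in the paper.

```python
import numpy as np

def pes(x):
    """Toy double-well potential: two minima separated by a barrier near x = 0."""
    return (x ** 2 - 1.0) ** 2 + 0.1 * x

rng = np.random.default_rng(1)
x_train = rng.uniform(-1.5, 1.5, size=40)
y_train = pes(x_train)

def fit_quadratic(x, y):
    # Deliberately inflexible model, mimicking limited ML capacity.
    return np.polynomial.Polynomial.fit(x, y, deg=2)

# One global model over the whole configuration range...
global_model = fit_quadratic(x_train, y_train)

# ...versus two local models split at the barrier top.
left = x_train < 0.0
local_models = (fit_quadratic(x_train[left], y_train[left]),
                fit_quadratic(x_train[~left], y_train[~left]))

def predict_local(x):
    """Route each query to the model responsible for its basin."""
    return local_models[0](x) if x < 0.0 else local_models[1](x)
```

On a held-out grid, the two local quadratics track their respective basins much better than the single global quadratic, which cannot represent both minima at once; the same logic motivates region-specific descriptors, training sets, and architectures for real PES.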