Research papers, master's and doctoral theses published by Vijay S. Pande

Physical machine learning outperforms human learning in Quantum Chemistry

81 - Anton V. Sinitskiy , Vijay S. Pande 2019

Two types of approaches to modeling molecular systems have demonstrated high practical efficiency. Density functional theory (DFT), the most widely used quantum chemical method, is a physical approach predicting energies and electron densities of molecules. Recently, numerous papers on machine learning (ML) of molecular properties have also been published. ML models greatly outperform DFT in terms of computational costs, and may even reach comparable accuracy, but they are missing physicality - a direct link to Quantum Physics - which limits their applicability. Here, we propose an approach that combines the strong sides of DFT and ML, namely, physicality and low computational cost. By generalizing the famous Hohenberg-Kohn theorems, we derive general equations for exact electron densities and energies that can naturally guide applications of ML in Quantum Chemistry. Based on these equations, we build a deep neural network that can compute electron densities and energies of a wide range of organic molecules not only much faster, but also closer to exact physical values than curre

Chemical Physics, Computational Physics

Predicting Gene Expression Between Species with Neural Networks

121 - Peter Eastman , Vijay S. Pande 2019

We train a neural network to predict human gene expression levels based on experimental data for rat cells. The network is trained with paired human/rat samples from the Open TG-GATEs database, where paired samples were treated with the same compound at the same dose. When evaluated on a test set of held-out compounds, the network successfully predicts human expression levels. On the majority of the test compounds, the list of differentially expressed genes determined from predicted expression levels agrees well with the list of differentially expressed genes determined from actual human experimental data.

Genomics

Predicting Toxicity from Gene Expression with Neural Networks

156 - Peter Eastman , Vijay S. Pande 2019

We train a neural network to predict chemical toxicity based on gene expression data. The input to the network is a full expression profile collected either in vitro from cultured cells or in vivo from live animals. The output is a set of fine-grained predictions for the presence of a variety of pathological effects in treated animals. When trained on the Open TG-GATEs database it produces good results, outperforming classical models trained on the same data. This is a promising approach for efficiently screening chemicals for toxic effects, and for more accurately evaluating drug candidates based on preclinical data.

Genomics

Deep Neural Network Computes Electron Densities and Energies of a Large Set of Organic Molecules Faster than Density Functional Theory (DFT)

328 - Anton V. Sinitskiy , Vijay S. Pande 2018

Density functional theory (DFT) is one of the main methods in Quantum Chemistry that offers an attractive trade-off between the cost and accuracy of quantum chemical computations. The electron density plays a key role in DFT. In this work, we explore whether machine learning - more specifically, deep neural networks (DNNs) - can be trained to predict electron densities faster than DFT. First, we choose a practically efficient combination of a DFT functional and a basis set (PBE0/pcS-3) and use it to generate a database of DFT solutions for more than 133,000 organic molecules from a previously published database QM9. Next, we train a DNN to predict electron densities and energies of such molecules. The only input to the DNN is an approximate electron density computed with a cheap quantum chemical method in a small basis set (HF/cc-VDZ). We demonstrate that the DNN successfully learns differences in the electron densities arising both from electron correlation and small basis set artifacts in the HF computations. All qualitative features in density differences, including local minima on lone pairs, local maxima on nuclei, toroidal shapes around C-H and C-C bonds, complex shapes around aromatic and cyclopropane rings and CN group, etc., are captured by the DNN. Accuracy of energy predictions by the DNN is ~1 kcal/mol, on par with other models reported in the literature, while those models do not predict the electron density. Computations with the DNN, including HF computations, take much less time than DFT computations (by a factor of ~20-30 for most QM9 molecules in the current version, and it is clear how it could be further improved).

Chemical Physics, Computational Physics

Binding Pathway of Opiates to $\mu$ Opioid Receptors Revealed by Unsupervised Machine Learning

85 - Amir Barati Farimani , Evan N. Feinberg , Vijay S. Pande 2018

Many important analgesics relieve pain by binding to the $\mu$-Opioid Receptor ($\mu$OR), which makes the $\mu$OR among the most clinically relevant proteins of the G Protein Coupled Receptor (GPCR) family. Despite previous studies on the activation pathways of the GPCRs, the mechanism of opiate binding and the selectivity of $\mu$OR are largely unknown. We performed extensive molecular dynamics (MD) simulation and analysis to find the selective allosteric binding sites of the $\mu$OR and the path opiates take to bind to the orthosteric site. In this study, we predicted that the allosteric site is responsible for the attraction and selection of opiates. Using Markov state models and machine learning, we traced the pathway of opiates in binding to the orthosteric site, the main binding pocket. Our results have important implications in designing novel analgesics.

Biomolecules, Quantitative Methods

Note: Variational Encoding of Protein Dynamics Benefits from Maximizing Latent Autocorrelation

36 - Hannah K. Wayment-Steele , Vijay S. Pande 2018

As deep Variational Auto-Encoder (VAE) frameworks become more widely used for modeling biomolecular simulation data, we emphasize the capability of the VAE architecture to concurrently maximize the timescale of the latent space while inferring a reduced coordinate, which assists in finding slow processes according to the variational approach to conformational dynamics. We additionally provide evidence that the VDE framework (Hernandez et al., 2017), which uses this autocorrelation loss along with a time-lagged reconstruction loss, obtains a variationally optimized latent coordinate in comparison with related loss functions. We thus recommend leveraging the autocorrelation of the latent space while training neural network models of biomolecular simulation data to better represent slow processes.
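As a rough illustration of what an autocorrelation term of this kind might look like, the sketch below computes the normalized (Pearson) autocorrelation of a one-dimensional latent time series at a fixed lag and negates it to obtain a quantity that could be minimized. The function names `latent_autocorrelation` and `autocorrelation_loss` are hypothetical, and the exact form of the loss used in the VDE framework is not given in the abstract, so this is an assumption rather than the authors' implementation.

```python
import numpy as np


def latent_autocorrelation(z, lag):
    """Pearson correlation between z(t) and z(t + lag) for a 1-D series."""
    z = np.asarray(z, dtype=float)
    if not 0 < lag < len(z):
        raise ValueError("lag must be between 1 and len(z) - 1")
    z0 = z[:-lag] - z[:-lag].mean()   # centered values at time t
    zt = z[lag:] - z[lag:].mean()     # centered values at time t + lag
    return float(np.mean(z0 * zt) / (z0.std() * zt.std()))


def autocorrelation_loss(z, lag):
    """Negative autocorrelation: lower values mean a slower latent coordinate."""
    return -latent_autocorrelation(z, lag)


if __name__ == "__main__":
    t = np.arange(1000)
    slow = np.sin(2 * np.pi * t / 500.0)                  # slowly varying signal
    noise = np.random.default_rng(0).standard_normal(1000)  # memoryless signal
    print("slow :", latent_autocorrelation(slow, 10))
    print("noise:", latent_autocorrelation(noise, 10))
```

A slowly varying coordinate keeps a high autocorrelation at the chosen lag, while an uncorrelated one stays near zero, which is why maximizing this quantity favors latent coordinates that track slow processes.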

Chemical Physics, Machine Learning, Biological Physics

Unsupervised learning of dynamical and molecular similarity using variance minimization

84 - Brooke E. Husic , Vijay S. Pande 2017

In this report, we present an unsupervised machine learning method for determining groups of molecular systems according to similarity in their dynamics or structures using Ward's minimum variance objective function. We first apply the minimum variance clustering to a set of simulated tripeptides using the information-theoretic Jensen-Shannon divergence between Markovian transition matrices in order to gain insight into how point mutations affect protein dynamics. Then, we extend the method to partition two chemoinformatic datasets according to structural similarity to motivate a train/validation/test split for supervised learning that avoids overfitting.
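The combination described above can be sketched as follows. The abstract does not say how the Jensen-Shannon divergence is extended from probability vectors to whole transition matrices, so this example assumes an unweighted average of row-wise divergences and uses its square root as the distance. Ward linkage is formally defined for Euclidean distances, so feeding it this distance is a further assumption. All function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                      # 0 * log(0) is taken as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def matrix_js_distance(T1, T2):
    """Assumed distance: sqrt of the mean row-wise JS divergence."""
    d = np.mean([js_divergence(r1, r2) for r1, r2 in zip(T1, T2)])
    return np.sqrt(max(d, 0.0))           # guard against tiny negative rounding


def cluster_transition_matrices(mats, n_clusters):
    """Group transition matrices with Ward's minimum variance linkage."""
    n = len(mats)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = matrix_js_distance(mats[i], mats[j])
    Z = linkage(squareform(D, checks=False), method="ward")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    mats = [
        np.array([[0.90, 0.10], [0.10, 0.90]]),   # slow exchange
        np.array([[0.88, 0.12], [0.12, 0.88]]),   # slow exchange
        np.array([[0.50, 0.50], [0.50, 0.50]]),   # fast exchange
        np.array([[0.55, 0.45], [0.45, 0.55]]),   # fast exchange
    ]
    print(cluster_transition_matrices(mats, n_clusters=2))
```

In this toy input, each matrix stands in for one simulated system, and systems with similar transition probabilities end up with the same cluster label.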

Biological Physics, Biomolecules, Quantitative Methods

MSM lag time cannot be used for variational model selection

167 - Brooke E. Husic , Vijay S. Pande 2017

The variational principle for conformational dynamics has enabled the systematic construction of Markov state models through the optimization of hyperparameters by approximating the transfer operator. In this note we discuss why the lag time of the operator being approximated must be held constant in the variational approach.

Biomolecules, Biological Physics, Chemical Physics

Theoretical restrictions on longest implicit timescales in Markov state models of biomolecular dynamics

129 - Anton V. Sinitskiy , Vijay S. Pande 2017

Markov state models (MSMs) have been widely used to analyze computer simulations of various biomolecular systems. They can capture conformational transitions much slower than an average or maximal length of a single molecular dynamics (MD) trajectory from the set of trajectories used to build the MSM. A rule of thumb claiming that the slowest implicit timescale captured by an MSM should be comparable in order of magnitude to the aggregate duration of all MD trajectories used to build this MSM has been known in the field. However, this rule has never been formally proved. In this work, we present analytical results for the slowest timescale in several types of MSMs, supporting the above rule. We conclude that the slowest implicit timescale equals the product of the aggregate sampling and four factors that quantify: (1) how much statistics on the conformational transitions corresponding to the longest implicit timescale is available, (2) how good the sampling of the destination Markov state is, (3) the gain in statistics from using a sliding window for counting transitions between Markov states, and (4) a bias in the estimate of the implicit timescale arising from finite sampling of the conformational transitions. We demonstrate that in many practically important cases all these four factors are on the order of unity, and we analyze possible scenarios that could lead to their significant deviation from unity. Overall, we provide for the first time analytical results on the slowest timescales captured by MSMs. These results can guide further practical applications of MSMs to biomolecular dynamics and allow for higher computational efficiency of simulations.
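For readers unfamiliar with the quantity under discussion, the sketch below shows the standard way an implied timescale is obtained from an MSM: transitions are counted with a sliding window at lag time τ, the counts are row-normalized into a transition matrix, and each non-stationary eigenvalue λ_i is converted via t_i = -τ / ln|λ_i|. The function names are hypothetical, and the plain row normalization used here is a simplifying assumption (practical MSM estimators often enforce reversibility). The analytical factors derived in the paper are not reproduced.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Sliding-window transition counts from a discrete state trajectory."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1
    return C


def transition_matrix(C):
    """Row-normalize counts (assumes every state was visited at least once)."""
    return C / C.sum(axis=1, keepdims=True)


def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for all eigenvalues except the stationary one.

    Assumes a connected model, so only the leading eigenvalue equals 1.
    """
    eig = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(eig[1:])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T_true = np.array([[0.9, 0.1], [0.1, 0.9]])
    state, dtraj = 0, []
    for _ in range(5000):                 # simulate a two-state Markov chain
        dtraj.append(state)
        state = rng.choice(2, p=T_true[state])
    T_est = transition_matrix(count_matrix(dtraj, n_states=2, lag=1))
    print("estimated slowest timescale:", implied_timescales(T_est, lag=1)[0])
    print("exact slowest timescale    :", implied_timescales(T_true, lag=1)[0])
```

The timescales come out in the same units as `lag`, so a lag expressed in trajectory frames yields timescales in frames.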

Biomolecules, Biological Physics, Computational Physics

Identification of simple reaction coordinates from complex dynamics

309 - Robert T. McGibbon , Brooke E. Husic , Vijay S. Pande 2016

Reaction coordinates are widely used throughout chemical physics to model and understand complex chemical transformations. We introduce a definition of the natural reaction coordinate, suitable for condensed phase and biomolecular systems, as a maximally predictive one-dimensional projection. We then show this criterion is uniquely satisfied by a dominant eigenfunction of an integral operator associated with the ensemble dynamics. We present a new sparse estimator for these eigenfunctions which can search through a large candidate pool of structural order parameters and build simple, interpretable approximations that employ only a small number of these order parameters. Example applications with a small molecule's rotational dynamics and simulations of protein conformational change and folding show that this approach can filter through statistical noise to identify simple reaction coordinates from complex dynamics.

Statistical Mechanics, Biomolecules, Quantitative Methods
