Energy-based View of Retrosynthesis

66 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ruoxi Sun

تاريخ النشر 2020

مجال البحث فيزياء الهندسة المعلوماتية

والبحث باللغة English

تأليف Ruoxi Sun - Hanjun Dai - Li Li

الفيزياء الكيميائية التعلم الآلي الأساليب الكمية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Retrosynthesis -- the process of identifying a set of reactants to synthesize a target molecule -- is of vital importance to material design and drug discovery. Existing machine learning approaches based on language models and graph neural networks have achieved encouraging results. In this paper, we propose a framework that unifies sequence- and graph-based methods as energy-based models (EBMs) with different energy functions. This unified perspective provides critical insights about EBM variants through a comprehensive assessment of performance. Additionally, we present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction by constraining the agreement between the two directions. This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.

قيم البحث

217 - Xiangyan Sun , Ke Liu , Yuquan Lin 2021

We have developed an end-to-end, retrosynthesis system, named ChemiRise, that can propose complete retrosynthesis routes for organic compounds rapidly and reliably. The system was trained on a processed patent database of over 3 million organic react ions. Experimental reactions were atom-mapped, clustered, and extracted into reaction templates. We then trained a graph convolutional neural network-based one-step reaction proposer using template embeddings and developed a guiding algorithm on the directed acyclic graph (DAG) of chemical compounds to find the best candidate to explore. The atom-mapping algorithm and the one-step reaction proposer were benchmarked against previous studies and showed better results. The final product was demonstrated by retrosynthesis routes reviewed and rated by human experts, showing satisfying functionality and a potential productivity boost in real-life use cases.

الفيزياء الكيميائية التعلم الآلي الأساليب الكمية

Graph Energy-based Model for Substructure Preserving Molecular Design

95 - Ryuichiro Hataya , Hideki Nakayama , Kazuki Yoshizoe 2021

It is common practice for chemists to search chemical databases based on substructures of compounds for finding molecules with desired properties. The purpose of de novo molecular generation is to generate instead of search. Existing machine learning based molecular design methods have no or limited ability in generating novel molecules that preserves a target substructure. Our Graph Energy-based Model, or GEM, can fix substructures and generate the rest. The experimental results show that the GEMs trained from chemistry datasets successfully generate novel molecules while preserving the target substructures. This method would provide a new way of incorporating the domain knowledge of chemists in molecular design.

الفيزياء الكيميائية التعلم الآلي

A Bayesian algorithm for retrosynthesis

115 - Zhongliang Guo , Stephen Wu , Mitsuru Ohno 2020

The identification of synthetic routes that end with a desired product has been an inherently time-consuming process that is largely dependent on expert knowledge regarding a limited fraction of the entire reaction space. At present, emerging machine -learning technologies are overturning the process of retrosynthetic planning. The objective of this study is to discover synthetic routes backwardly from a given desired molecule to commercially available compounds. The problem is reduced to a combinatorial optimization task with the solution space subject to the combinatorial complexity of all possible pairs of purchasable reactants. We address this issue within the framework of Bayesian inference and computation. The workflow consists of two steps: a deep neural network is trained that forwardly predicts a product of the given reactants with a high level of accuracy, following which this forward model is inverted into the backward one via Bayes law of conditional probability. Using the backward model, a diverse set of highly probable reaction sequences ending with a given synthetic target is exhaustively explored using a Monte Carlo search algorithm. The Bayesian retrosynthesis algorithm could successfully rediscover 80.3% and 50.0% of known synthetic routes of single-step and two-step reactions within top-10 accuracy, respectively, thereby outperforming state-of-the-art algorithms in terms of the overall accuracy. Remarkably, the Monte Carlo method, which was specifically designed for the presence of diverse multiple routes, often revealed a ranked list of hundreds of reaction routes to the same synthetic target. We investigated the potential applicability of such diverse candidates based on expert knowledge from synthetic organic chemistry.

التعلم الالي التعلم الآلي الفيزياء الكيميائية

MGN-Net: a multi-view graph normalizer for integrating heterogeneous biological network populations

155 - Islem Rekik , Mustafa Burak Gurbuz 2021

With the recent technological advances, biological datasets, often represented by networks (i.e., graphs) of interacting entities, proliferate with unprecedented complexity and heterogeneity. Although modern network science opens new frontiers of ana lyzing connectivity patterns in such datasets, we still lack data-driven methods for extracting an integral connectional fingerprint of a multi-view graph population, let alone disentangling the typical from the atypical variations across the population samples. We present the multi-view graph normalizer network (MGN-Net; https://github.com/basiralab/MGN-Net), a graph neural network based method to normalize and integrate a set of multi-view biological networks into a single connectional template that is centered, representative, and topologically sound. We demonstrate the use of MGN-Net by discovering the connectional fingerprints of healthy and neurologically disordered brain network populations including Alzheimers disease and Autism spectrum disorder patients. Additionally, by comparing the learned templates of healthy and disordered populations, we show that MGN-Net significantly outperforms conventional network integration methods across extensive experiments in terms of producing the most centered templates, recapitulating unique traits of populations, and preserving the complex topology of biological networks. Our evaluations showed that MGN-Net is powerfully generic and easily adaptable in design to different graph-based problems such as identification of relevant connections, normalization and integration.

الشبكات الاجتماعية والمعلومات التعلم الآلي الأساليب الكمية

RetroXpert: Decompose Retrosynthesis Prediction like a Chemist

68 - Chaochao Yan , Qianggang Ding , Peilin Zhao 2020

Retrosynthesis is the process of recursively decomposing target molecules into available building blocks. It plays an important role in solving problems in organic synthesis planning. To automate or assist in the retrosynthesis analysis, various retr osynthesis prediction algorithms have been proposed. However, most of them are cumbersome and lack interpretability about their predictions. In this paper, we devise a novel template-free algorithm for automatic retrosynthetic expansion inspired by how chemists approach retrosynthesis prediction. Our method disassembles retrosynthesis into two steps: i) identify the potential reaction center of the target molecule through a novel graph neural network and generate intermediate synthons, and ii) generate the reactants associated with synthons via a robust reactant generation model. While outperforming the state-of-the-art baselines by a significant margin, our model also provides chemically reasonable interpretation.

الأساليب الكمية التعلم الآلي