Non-Autoregressive Electron Redistribution Modeling for Reaction Prediction

Submitted by Hangrui Bi

Publication date: 2021

Research field: Physics; Information Engineering

Paper language: English

Authors: Hangrui Bi, Hengyi Wang, Chence Shi

Subjects: Chemical Physics; Computational Engineering, Finance, and Science; Machine Learning

Abstract

Reliably predicting the products of chemical reactions is a fundamental challenge in synthetic chemistry. Existing machine learning approaches typically produce a reaction product by sequentially forming its subparts or intermediate molecules. Such autoregressive methods, however, not only require a predefined order for the incremental construction but also preclude parallel decoding for efficient computation. To address these issues, we devise a non-autoregressive learning paradigm that predicts a reaction in one shot. Leveraging the fact that chemical reactions can be described as a redistribution of electrons in molecules, we formulate a reaction as an arbitrary electron flow and predict it with a novel multi-pointer decoding network. Experiments on the USPTO-MIT dataset show that our approach establishes a new state-of-the-art top-1 accuracy and achieves an inference speedup of at least 27 times over state-of-the-art methods. In addition, because the model predicts electron flows, its predictions are easier for chemists to interpret.
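
A toy sketch of the central idea, treating a reaction as an electron redistribution that is decoded in one shot, follows. It is not the authors' model. Assumptions made only for illustration: a molecule is a symmetric integer bond-order matrix over atom indices (hydrogens implicit), electron flow is reduced to bond-order changes, and the multi-pointer output is stood in for by hypothetical per-atom `gain`/`loss` pointer lists; `pointers_to_delta` and `apply_flow` are hypothetical helpers. Because every pointer is summed into one change matrix, no construction order is needed and all pointers could be predicted in parallel.

```python
from typing import Dict, List, Sequence

Matrix = List[List[int]]


def pointers_to_delta(n_atoms: int,
                      gain: Dict[int, Sequence[int]],
                      loss: Dict[int, Sequence[int]]) -> Matrix:
    """Sum per-atom pointer predictions into one symmetric bond-order change matrix.

    Hypothetical convention: gain[i] lists atoms j whose bond to i gains one order,
    loss[i] lists atoms whose bond to i loses one order, and each bond change is
    listed from one side only.
    """
    delta = [[0] * n_atoms for _ in range(n_atoms)]
    for sign, pointers in ((+1, gain), (-1, loss)):
        for i, targets in pointers.items():
            for j in targets:
                delta[i][j] += sign
                delta[j][i] += sign  # keep the matrix symmetric
    return delta


def apply_flow(bonds: Matrix, delta: Matrix) -> Matrix:
    """Apply the whole predicted redistribution in a single step (no decoding order)."""
    product = [[b + d for b, d in zip(brow, drow)]
               for brow, drow in zip(bonds, delta)]
    if any(v < 0 for row in product for v in row):
        raise ValueError("predicted flow breaks a bond that does not exist")
    return product


# Toy SN2 example, Cl(-) + CH3-Br -> CH3-Cl + Br(-), hydrogens implicit.
# Atom indices (illustrative): 0 = C, 1 = Br, 2 = Cl.
reactants = [[0, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
delta = pointers_to_delta(3, gain={2: [0]}, loss={0: [1]})
product = apply_flow(reactants, delta)
print(product)  # [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
```

The carbon's total bond order is the same before and after the flow (one bond broken, one formed). A real system would also have to track lone pairs and formal charges, which this sketch ignores.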

Related research

Bin Li, Jian Tian, Zhongfei Zhang (2020)

Human motion prediction, which aims to predict future human skeletons given past ones, is a typical sequence-to-sequence problem. Extensive effort has therefore gone into exploring different RNN-based encoder-decoder architectures. However, because these models generate each target pose conditioned on the previously generated ones, they are prone to problems such as error accumulation. In this paper, we argue that this issue is mainly caused by the autoregressive manner of decoding. Hence, we propose a novel Non-auToregressive Model (NAT) with a completely non-autoregressive decoding scheme, together with a context encoder and a positional encoding module. More specifically, the context encoder embeds the given poses from temporal and spatial perspectives, the frame decoder predicts each future pose independently, and the positional encoding module injects a positional signal into the model to indicate temporal order. Moreover, we present a multitask training paradigm covering both low-level human skeleton prediction and high-level human action recognition, which yields convincing improvements on the prediction task. Our approach is evaluated on the Human3.6M and CMU-Mocap benchmarks and outperforms state-of-the-art autoregressive methods.
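
The contrast with autoregressive decoding can be sketched in a few lines. This is an illustrative assumption, not NAT's architecture: the context encoder's output is taken as a given vector, positions use the sinusoidal encoding familiar from Transformers, and `toy_frame_decoder` is a hypothetical stand-in for the learned frame decoder. Each future frame depends only on the context and its own position, never on earlier predicted frames.

```python
import math
from typing import Callable, List

Vector = List[float]


def positional_encoding(t: int, dim: int) -> Vector:
    """Transformer-style sinusoidal encoding of time step t (sin on even, cos on odd dims)."""
    enc = []
    for k in range(dim):
        angle = t / (10000 ** (2 * (k // 2) / dim))
        enc.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return enc


def decode_non_autoregressive(context: Vector, horizon: int,
                              frame_decoder: Callable[[Vector, Vector], Vector]) -> List[Vector]:
    """Predict every future frame from (context, position) only.

    No frame is conditioned on a previously generated frame, so the calls are
    independent and could run in parallel; a mistake in one frame is never fed
    into the next.
    """
    return [frame_decoder(context, positional_encoding(t, len(context)))
            for t in range(1, horizon + 1)]


def toy_frame_decoder(context: Vector, pos: Vector) -> Vector:
    """Hypothetical stand-in for a learned decoder: context plus a position-dependent offset."""
    return [c + 0.1 * p for c, p in zip(context, pos)]


context = [0.0, 1.0, 2.0, 3.0]  # assumed output of a context encoder
future = decode_non_autoregressive(context, horizon=5, frame_decoder=toy_frame_decoder)
print(len(future), len(future[0]))  # 5 4
```

Because frame 3 can be computed without frames 1 and 2, decoding cost no longer grows step by step with the horizon when the calls are batched.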

Computer Vision and Pattern Recognition

Take a NAP: Non-Autoregressive Prediction for Pedestrian Trajectories

Hao Xue, Du Q. Huynh, Mark Reynolds (2020)

Pedestrian trajectory prediction is a challenging task because three properties of human movement behavior need to be addressed: the social influence of other pedestrians, the scene constraints, and the multimodal (multi-route) nature of predictions. Although existing methods have explored these key properties, their prediction process is autoregressive, which means they can only predict future locations sequentially. In this paper, we present NAP, a non-autoregressive method for trajectory prediction. Our method comprises specifically designed feature encoders and a latent variable generator to handle the three properties above. It also has a time-agnostic context generator and a time-specific context generator for non-autoregressive prediction. Through extensive experiments comparing NAP against several recent methods, we show that NAP achieves state-of-the-art trajectory prediction performance.

Computer Vision and Pattern Recognition

New universal Lyapunov functions for non-linear reaction networks

A. N. Gorban (2019)

In 1961, Rényi discovered a rich family of non-classical Lyapunov functions for the kinetics of Markov chains or, equivalently, for linear kinetic equations. This family was parameterised by convex functions on the positive semi-axis. After the works of Csiszár and Morimoto, these functions became widely known as $f$-divergences or Csiszár-Morimoto divergences. These Lyapunov functions are universal in the following sense: they depend only on the state of equilibrium, not on the kinetic parameters themselves. Despite many years of research, no such wide family of universal Lyapunov functions has been found for nonlinear reaction networks. For general nonlinear networks with detailed or complex balance, the classical thermodynamic potentials remain the only universal Lyapunov functions. We constructed a rich family of new universal Lyapunov functions for any nonlinear reaction network with detailed or complex balance. These functions are parameterised by compact subsets of the projective space. They are universal in the same sense: they depend only on the state of equilibrium and on the network structure, but not on the kinetic parameters themselves. The main elements and operations in the construction of the new Lyapunov functions are partial equilibria of reactions and convex envelopes of families of functions.
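
The classical linear case described above can be checked numerically. The sketch below (an illustration of the Rényi/Csiszár-Morimoto setting, not Gorban's construction for nonlinear networks) evaluates $H_f(p) = \sum_i \pi_i f(p_i/\pi_i)$ along discrete-time Markov trajectories for two convex choices of $f$ and two different transition matrices that share the same equilibrium $\pi$. The chains and the starting distribution are arbitrary examples chosen for illustration. Monotonicity follows from the data-processing inequality for $f$-divergences, since $\pi P = \pi$; universality shows up in the fact that the same $H_f$ decreases under both kinetics.

```python
import math
from typing import Callable, List

Dist = List[float]


def f_divergence(p: Dist, pi: Dist, f: Callable[[float], float]) -> float:
    """H_f(p) = sum_i pi_i * f(p_i / pi_i) for a convex f and an equilibrium pi > 0."""
    return sum(q * f(x / q) for x, q in zip(p, pi))


def step(p: Dist, P: List[List[float]]) -> Dist:
    """One step of linear (Markov) kinetics: p_next[j] = sum_i p[i] * P[i][j]."""
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


def trajectory_values(p0: Dist, P: List[List[float]], pi: Dist,
                      f: Callable[[float], float], steps: int) -> List[float]:
    """Values of H_f along p0, p0 P, p0 P^2, ..., p0 P^steps."""
    values, p = [], p0
    for _ in range(steps + 1):
        values.append(f_divergence(p, pi, f))
        p = step(p, P)
    return values


def kl(x: float) -> float:
    """f(x) = x ln x (with f(0) = 0); H_f becomes the KL divergence."""
    return x * math.log(x) if x > 0 else 0.0


def chi2(x: float) -> float:
    """f(x) = (x - 1)^2; H_f becomes the chi-squared divergence."""
    return (x - 1.0) ** 2


pi = [0.25, 0.5, 0.25]
# Two different kinetics with the same equilibrium (rows sum to 1 and pi P = pi).
P_fast = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
P_lazy = [[0.75, 0.25, 0.0], [0.125, 0.75, 0.125], [0.0, 0.25, 0.75]]

for P in (P_fast, P_lazy):
    for f in (kl, chi2):
        vals = trajectory_values([1.0, 0.0, 0.0], P, pi, f, steps=10)
        # H_f never increases, whichever kinetics drives the system.
        assert all(b <= a + 1e-12 for a, b in zip(vals, vals[1:]))
```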

Chemical Physics; Mathematical Physics

Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction

Piji Li, Shuming Shi (2021)

We investigate the problem of Chinese Grammatical Error Correction (CGEC) and present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction to address the deep issues hidden in CGEC. Because most tokens are correct and can be conveyed directly from source to target, and because error positions can be estimated and corrected from bidirectional context, we employ a BERT-initialized Transformer encoder as the backbone model for information modeling and conveying. Since relying only on same-position substitution cannot handle variable-length correction cases, operations such as substitution, deletion, insertion, and local paraphrasing are required jointly. Therefore, a Conditional Random Field (CRF) layer is stacked on the up tail to conduct non-autoregressive sequence prediction by modeling the token dependencies. Since most tokens are correct and easy to predict or convey to the target, the models may suffer from a severe class imbalance. To alleviate this problem, focal loss penalty strategies are integrated into the loss functions. Moreover, besides the typical fixed-length error correction datasets, we also construct a variable-length corpus for experiments. Experimental results on standard datasets, especially the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level accuracy, precision, recall, and F1-measure on the error detection and correction tasks.
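
The class-imbalance argument can be made concrete with the standard focal loss, $-(1-p_t)^\gamma \log p_t$, which down-weights tokens the model already predicts confidently. The abstract does not spell out TtT's exact penalty, so this form, the value $\gamma = 2$, and the toy probabilities below are assumptions for illustration; `focal_losses` and `hard_token_share` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def focal_losses(probs: Sequence[Sequence[float]], targets: Sequence[int],
                 gamma: float = 2.0) -> List[float]:
    """Per-token focal loss -(1 - p_t)^gamma * log(p_t); gamma = 0 gives plain cross-entropy."""
    losses = []
    for row, t in zip(probs, targets):
        p_t = row[t]  # predicted probability of the gold label
        losses.append(-((1.0 - p_t) ** gamma) * math.log(p_t))
    return losses


def hard_token_share(losses: List[float], index: int) -> float:
    """Fraction of the total loss contributed by one token."""
    return losses[index] / sum(losses)


# Toy batch: nine easy "copy" tokens (label 0, p = 0.95) and one error token (label 1, p = 0.3).
probs = [[0.95, 0.05]] * 9 + [[0.7, 0.3]]
targets = [0] * 9 + [1]

ce = focal_losses(probs, targets, gamma=0.0)
fl = focal_losses(probs, targets, gamma=2.0)
print(round(hard_token_share(ce, 9), 3), round(hard_token_share(fl, 9), 3))  # 0.723 0.998
```

With plain cross-entropy the nine easy tokens still contribute over a quarter of the loss; with the focal term almost all of the gradient signal comes from the single error token.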

Computation and Language; Artificial Intelligence

Scaling Laws for Autoregressive Generative Modeling

Tom Henighan, Jared Kaplan, Mor Katz (2020)

We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases, autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law-plus-constant scaling law. The optimal model size also depends on the compute budget through a power law, with exponents that are nearly universal across all data domains. The cross-entropy loss has an information-theoretic interpretation as $S(\text{True}) + D_{\mathrm{KL}}(\text{True} \,\|\, \text{Model})$, and the empirical scaling laws suggest a prediction for both the true data distribution's entropy and the KL divergence between the true and model distributions. With this interpretation, billion-parameter Transformers are nearly perfect models of the YFCC100M image distribution downsampled to an $8\times 8$ resolution, and we can forecast the model size needed to achieve any given reducible loss (i.e., $D_{\mathrm{KL}}$) in nats/image for other resolutions. We find a number of additional scaling laws in specific domains: (a) we identify a scaling relation for the mutual information between captions and images in multimodal models, and show how to answer the question "Is a picture worth a thousand words?"; (b) in the case of mathematical problem solving, we identify scaling laws for model performance when extrapolating beyond the training distribution; (c) we finetune generative image models for ImageNet classification and find smooth scaling of the classification loss and error rate, even as the generative loss levels off. Taken together, these results strengthen the case that scaling laws have important implications for neural network performance, including on downstream tasks.
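
The information-theoretic reading of the loss can be checked on a small discrete example: cross-entropy splits exactly into the entropy of the true distribution (the irreducible part) plus the KL divergence to the model (the reducible part). The sketch also writes down a power-law-plus-constant curve of the form $L(x) = L_\infty + (x_0/x)^{\alpha}$; this parameterisation and the numbers plugged into it (`l_inf`, `x0`, `alpha`) are illustrative assumptions, not fitted values from the paper.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """S(p) in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Expected loss -E_p[log q] of model q on data drawn from p, in nats."""
    return -sum(x * math.log(y) for x, y in zip(p, q) if x > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(p || q) in nats."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def power_law_plus_constant(x: float, l_inf: float, x0: float, alpha: float) -> float:
    """L(x) = L_inf + (x0 / x)**alpha, with x standing for model size or compute."""
    return l_inf + (x0 / x) ** alpha


true_dist = [0.5, 0.25, 0.25]
model_dist = [0.4, 0.4, 0.2]
# Cross-entropy = irreducible entropy + reducible KL term.
lhs = cross_entropy(true_dist, model_dist)
rhs = entropy(true_dist) + kl_divergence(true_dist, model_dist)
print(abs(lhs - rhs) < 1e-12)  # True

# Hypothetical curve parameters: the loss approaches l_inf as x grows.
for n in (1e6, 1e7, 1e8):
    print(n, round(power_law_plus_constant(n, l_inf=2.0, x0=1e6, alpha=0.1), 4))
```

In this reading, $L_\infty$ plays the role of $S(\text{True})$ and the power-law term the role of $D_{\mathrm{KL}}$, so increasing $x$ shrinks only the reducible part of the loss.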

Machine Learning; Computation and Language; Computer Vision and Pattern Recognition