
Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: non-autoregressive transformer


Abstract


Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, conventional cascaded systems cannot use end-to-end training data, meaning that the training data most suited to the task goes unused. Previous studies suggested several approaches for integrated end-to-end training to overcome those problems; however, they mostly rely on (synthetic or natural) three-way data. We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation. This new architecture (i) avoids unnecessary early decisions that can cause errors which are then propagated throughout the cascaded models and (ii) utilizes the end-to-end training data directly. We conduct an evaluation on two pivot-based machine translation tasks, namely French→German and German→Czech. Our experimental results show that the proposed architecture yields an improvement of more than 2 BLEU for French→German over the cascaded baseline.
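The error-propagation problem mentioned in the abstract can be illustrated with a toy pivot-translation example. This is only a sketch under invented assumptions: the probability tables, the pivot names `z1`..`z3`, the targets `y_a`/`y_b`, and both helper functions are hypothetical and are not taken from the paper, whose model is a non-autoregressive Transformer rather than a lookup table. The sketch only shows why committing early to a single pivot hypothesis can lose to a decision that keeps the whole pivot distribution.

```python
# Toy illustration only: all probabilities below are invented, not from the paper.
# p_pivot[z]     = P(pivot sentence z | source)
# p_target[z][y] = P(target sentence y | pivot sentence z)
p_pivot = {"z1": 0.4, "z2": 0.3, "z3": 0.3}
p_target = {
    "z1": {"y_a": 0.6, "y_b": 0.4},
    "z2": {"y_a": 0.1, "y_b": 0.9},
    "z3": {"y_a": 0.1, "y_b": 0.9},
}


def cascaded_decode(p_pivot, p_target):
    """Commit to the single best pivot first, then translate it (hard early decision)."""
    best_pivot = max(p_pivot, key=p_pivot.get)
    return max(p_target[best_pivot], key=p_target[best_pivot].get)


def marginal_decode(p_pivot, p_target):
    """Score each target by summing over all pivots, so no early decision is made."""
    scores = {}
    for z, p_z in p_pivot.items():
        for y, p_y in p_target[z].items():
            scores[y] = scores.get(y, 0.0) + p_z * p_y
    return max(scores, key=scores.get), scores


print(cascaded_decode(p_pivot, p_target))
print(marginal_decode(p_pivot, p_target))
```

In this toy setting the cascade follows the most likely pivot `z1` and outputs `y_a`, while summing over all pivots favours `y_b`, because the two less likely pivots agree with each other. Exact marginalization is only feasible here because the hypothesis space is tiny.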

References used

https://aclanthology.org/


Sequence-to-Lattice Models for Fast Translation

Association for Computational Linguistics, 2021 (article)

Non-autoregressive machine translation (NAT) approaches enable fast generation by utilizing parallelizable generative processes. The remaining bottleneck in these models is their decoder layers; unfortunately, unlike in autoregressive models (Kasai et al., 2020), removing decoder layers from NAT models significantly degrades accuracy. This work proposes a sequence-to-lattice model that replaces the decoder with a search lattice. Our approach first constructs a candidate lattice using efficient lookup operations, generates lattice scores from a deep encoder, and finally finds the best path using dynamic programming. Experiments on three machine translation datasets show that our method is faster than past non-autoregressive generation approaches, and more accurate than naively reducing the number of decoder layers.

Keywords: fast translation
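The last step described in the sequence-to-lattice abstract above, finding the best path through a scored lattice with dynamic programming, can be sketched generically as follows. This is an assumption-laden illustration, not the paper's algorithm: the edge-list representation, the function name `best_path`, and the example scores are all hypothetical, and the paper's lattice construction and encoder-based scoring are not reproduced here.

```python
def best_path(num_nodes, edges):
    """Return (score, tokens) of the highest-scoring path from node 0 to the last node.

    edges: list of (src, dst, token, score) with src < dst, so node indices
    already form a topological order. Scores are additive (e.g. log-probabilities).
    Returns None if the last node is unreachable.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * num_nodes   # best[v] = best score of any path 0 -> v
    back = [None] * num_nodes      # back[v] = (previous node, token on the edge)
    best[0] = 0.0

    # Visiting edges in order of their source node guarantees best[src] is final.
    for src, dst, token, score in sorted(edges, key=lambda e: e[0]):
        if best[src] == neg_inf:
            continue  # source not reachable from the start node
        candidate = best[src] + score
        if candidate > best[dst]:
            best[dst] = candidate
            back[dst] = (src, token)

    end = num_nodes - 1
    if best[end] == neg_inf:
        return None

    # Follow back-pointers from the end node to recover the token sequence.
    tokens = []
    node = end
    while back[node] is not None:
        node, token = back[node]
        tokens.append(token)
    tokens.reverse()
    return best[end], tokens


# Hypothetical lattice with invented scores.
example_edges = [
    (0, 1, "the", -0.1),
    (0, 1, "a", -0.5),
    (1, 2, "cat", -0.2),
    (1, 2, "cap", -1.0),
    (0, 2, "cats", -0.9),
    (2, 3, "sleeps", -0.3),
]
print(best_path(4, example_edges))
```

Each edge is examined once after sorting, so the search cost grows with the number of lattice edges rather than with the number of possible paths.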

Sequence-to-Sequence Lexical Normalization with Multilingual Transformers

Association for Computational Linguistics, 2021 (article)

Current benchmark tasks for natural language processing contain text that is qualitatively different from the text used in informal day-to-day digital communication. This discrepancy has led to severe performance degradation of state-of-the-art NLP models when fine-tuned on real-world data. One way to resolve this issue is through lexical normalization, which is the process of transforming non-standard text, usually from social media, into a more standardized form. In this work, we propose a sentence-level sequence-to-sequence model based on mBART, which frames the problem as a machine translation problem. As noisy text is a pervasive problem across languages, not just English, we leverage the multilingual pre-training of mBART to fine-tune it to our data. While current approaches mainly operate at the word or subword level, we argue that this approach is straightforward from a technical standpoint and builds upon existing pre-trained transformer networks. Our results show that while word-level, intrinsic performance evaluation is behind other methods, our model improves performance on extrinsic, downstream tasks through normalization compared to models operating on raw, unprocessed social media text.

Keywords: multilingual transformers, multilingual

Structure-aware Fine-tuning of Sequence-to-sequence Transformers for Transition-based AMR Parsing

Association for Computational Linguistics, 2021 (article)

Predicting linearized Abstract Meaning Representation (AMR) graphs using pre-trained sequence-to-sequence Transformer models has recently led to large improvements on AMR parsing benchmarks. These parsers are simple and avoid explicit modeling of structure but lack desirable properties such as graph well-formedness guarantees or built-in graph-sentence alignments. In this work we explore the integration of general pre-trained sequence-to-sequence language models and a structure-aware transition-based approach. We depart from a pointer-based transition system and propose a simplified transition set, designed to better exploit pre-trained language models for structured fine-tuning. We also explore modeling the parser state within the pre-trained encoder-decoder architecture and different vocabulary strategies for the same purpose. We provide a detailed comparison with recent progress in AMR parsing and show that the proposed parser retains the desirable properties of previous transition-based approaches, while being simpler and reaching the new parsing state of the art for AMR 2.0, without the need for graph re-categorization.

Keywords: transition-based AMR parsing

Extend, don't rebuild: Phrasing conditional graph modification as autoregressive sequence labelling

Association for Computational Linguistics, 2021 (article)

Deriving and modifying graphs from natural language text has become a versatile basis technology for information extraction with applications in many subfields, such as semantic parsing or knowledge graph construction. A recent work used this technique for modifying scene graphs (He et al., 2020), by first encoding the original graph and then generating the modified one based on this encoding. In this work, we show that we can considerably increase performance on this problem by phrasing it as graph extension instead of graph generation. We propose the first model for the resulting graph extension problem based on autoregressive sequence labelling. On three scene graph modification data sets, this formulation leads to improvements in accuracy over the state of the art of between 13 and 24 percentage points. Furthermore, we introduce a novel data set from the biomedical domain which has much larger linguistic variability and more complex graphs than the scene graph modification data sets. For this data set, the state of the art fails to generalize, while our model can produce meaningful predictions.

Keywords: phrasing conditional graph, conditional graph modification

Zero-shot Sequence Labeling for Transformer-based Sentence Classifiers

Association for Computational Linguistics, 2021 (article)

We investigate how sentence-level transformers can be modified into effective sequence labelers at the token level without any direct supervision. Existing approaches to zero-shot sequence labeling do not perform well when applied to transformer-based architectures. As transformers contain multiple layers of multi-head self-attention, information in the sentence gets distributed between many tokens, negatively affecting zero-shot token-level performance. We find that a soft attention module which explicitly encourages sharpness of attention weights can significantly outperform existing methods.

Keywords: transformer-based sentence classifiers, sentence classifiers, zero-shot sequence labeling
