Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

This work explores the capacity of character-based Neural Machine Translation to translate noisy User-Generated Content (UGC), with a strong focus on the limits of such approaches in handling productive UGC phenomena, which, almost by definition, cannot be seen at training time. Within a strict zero-shot scenario, we first study the detrimental impact of various user-generated content phenomena on translation performance, using a small annotated dataset we developed, and then show that such models are indeed incapable of handling unknown letters, which leads to catastrophic translation failure once such characters are encountered. We further confirm this behavior with a simple yet insightful copy task experiment and highlight the importance of reducing the vocabulary size hyper-parameter to increase the robustness of character-based models for machine translation.
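To make the unknown-character problem in the abstract above concrete, here is a minimal sketch of a character vocabulary with an unknown-symbol fallback. It is not the authors' code: the names `build_char_vocab`, `encode`, `unk_rate`, the `<unk>` token, and the `max_size` cap are all hypothetical choices made for illustration, and the toy corpus is invented.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder symbol for unseen characters


def build_char_vocab(corpus, max_size=None):
    """Map the most frequent characters to ids; id 0 is reserved for UNK.

    max_size caps the number of real characters kept (None keeps all),
    standing in for a vocabulary-size hyper-parameter.
    """
    counts = Counter(ch for sentence in corpus for ch in sentence)
    vocab = {UNK: 0}
    for ch, _ in counts.most_common(max_size):
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn text into character ids, falling back to UNK for unseen ones."""
    return [vocab.get(ch, vocab[UNK]) for ch in text]


def unk_rate(text, vocab):
    """Fraction of characters in text that the vocabulary cannot represent."""
    ids = encode(text, vocab)
    return sum(i == vocab[UNK] for i in ids) / len(ids) if ids else 0.0


# Toy training data (invented) and two test inputs.
train = ["the cat sat", "a cat ate"]
vocab = build_char_vocab(train)
clean = "the cat"
noisy = "the c\u00e4t \u263a"  # contains two characters never seen in training

print(unk_rate(clean, vocab))
print(unk_rate(noisy, vocab))
```

Characters that never occurred in training all collapse onto the single `<unk>` id, so the model receives no information about them; this is the situation in which the abstract reports catastrophic failures.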

References used

https://aclanthology.org/

Revisiting Pivot-Based Paraphrase Generation: Language Is Not the Only Optional Pivot

Association for Computational Linguistics, 2021 (article)

Paraphrases are texts that convey the same meaning in different forms of expression. Pivot-based methods, also known as round-trip translation, have shown promising results in generating high-quality paraphrases. However, existing pivot-based methods all rely on language as the pivot, which requires large-scale, high-quality parallel bilingual texts. In this paper, we explore the feasibility of using semantic and syntactic representations as the pivot for paraphrase generation. Concretely, we transform a sentence into a variety of different semantic or syntactic representations (including AMR, UD, and latent semantic representation), and then decode the sentence back from the semantic representations. We further explore a pretraining-based approach to compress the pipeline process into an end-to-end framework. We conduct experiments comparing different approaches with different kinds of pivots. Experimental results show that taking AMR as the pivot yields paraphrases of better quality than taking language as the pivot. The end-to-end framework can reduce semantic shift when language is used as the pivot. In addition, several unsupervised pivot-based methods can generate paraphrases of quality similar to that of the supervised sequence-to-sequence model, which indicates that parallel paraphrase data may not be necessary for paraphrase generation.

Understanding the Impact of UGC Specificities on Translation Quality

Association for Computational Linguistics, 2021 (article)

This work takes a critical look at the evaluation of user-generated content automatic translation, the well-known specificities of which raise many challenges for MT. Our analyses show that measuring the average-case performance using a standard metric on a UGC test set falls far short of giving a reliable image of the UGC translation quality. That is why we introduce a new data set for the evaluation of UGC translation in which UGC specificities have been manually annotated using a fine-grained typology. Using this data set, we conduct several experiments to measure the impact of different kinds of UGC specificities on translation quality, more precisely than previously possible.

Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation

Association for Computational Linguistics, 2021 (article)

Policy gradient algorithms have found wide adoption in NLP, but have recently become subject to criticism, doubting their suitability for NMT. Choshen et al. (2020) identify multiple weaknesses and suspect that their success is determined by the shape of output distributions rather than the reward. In this paper, we revisit these claims and study them under a wider range of configurations. Our experiments on in-domain and cross-domain adaptation reveal the importance of exploration and reward scaling, and provide empirical counter-evidence to these claims.

Revisiting Multi-Domain Machine Translation

Association for Computational Linguistics, 2021 (article)

When building machine translation systems, one often needs to make the best out of heterogeneous sets of parallel data in training, and to robustly handle inputs from unexpected domains in testing. This multi-domain scenario has attracted a lot of recent work that falls under the general umbrella of transfer learning. In this study, we revisit multi-domain machine translation, with the aim of formulating the motivations for developing such systems and the associated expectations with respect to performance. Our experiments with a large sample of multi-domain systems show that most of these expectations are hardly met and suggest that further work is needed to better analyze the current behaviour of multi-domain systems and to make them fully deliver on their promises.

Revisiting Simple Neural Probabilistic Language Models

Association for Computational Linguistics, 2021 (article)

Recent progress in language modeling has been driven not only by advances in neural architectures, but also through hardware and optimization improvements. In this paper, we revisit the neural probabilistic language model (NPLM) of Bengio et al. (2003), which simply concatenates word embeddings within a fixed window and passes the result through a feed-forward network to predict the next word. When scaled up to modern hardware, this model (despite its many limitations) performs much better than expected on word-level language model benchmarks. Our analysis reveals that the NPLM achieves lower perplexity than a baseline Transformer with short input contexts but struggles to handle long-term dependencies. Inspired by this result, we modify the Transformer by replacing its first self-attention layer with the NPLM's local concatenation layer, which results in small but consistent perplexity decreases across three word-level language modeling datasets.
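The NPLM idea summarized above (concatenate the embeddings of a fixed window, then apply a feed-forward network) can be sketched in a few lines. This is a simplified, untrained illustration under stated assumptions, not the model from the paper: biases are omitted, there is a single tanh hidden layer, weights are random, and every name and size (`nplm_next_word_probs`, `VOCAB`, `DIM`, `WINDOW`, `HIDDEN`) is hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nplm_next_word_probs(context_ids, embeddings, hidden_w, output_w):
    """Forward pass: concatenate window embeddings, tanh layer, softmax."""
    # Concatenate the embeddings of the fixed-size context window.
    concat = [x for word_id in context_ids for x in embeddings[word_id]]
    hidden = [math.tanh(h) for h in matvec(hidden_w, concat)]
    return softmax(matvec(output_w, hidden))


# Hypothetical toy sizes: 6-word vocabulary, 3-word window.
rng = random.Random(0)
VOCAB, DIM, WINDOW, HIDDEN = 6, 4, 3, 5
embeddings = make_matrix(VOCAB, DIM, rng)
hidden_w = make_matrix(HIDDEN, WINDOW * DIM, rng)
output_w = make_matrix(VOCAB, HIDDEN, rng)

probs = nplm_next_word_probs([1, 4, 2], embeddings, hidden_w, output_w)
print(len(probs), round(sum(probs), 6))
```

Because the hidden layer's input width is fixed at `WINDOW * DIM`, the model can only look at a fixed number of preceding words, which is consistent with the abstract's remark that it struggles with long-term dependencies.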
