Towards Universality in Multilingual Text Rewriting

Published by Xavier Garcia

Publication date: 2021

Research field: Informatics Engineering

Research language: English

Authors: Xavier Garcia, Noah Constant, Mandy Guo

Computation and Language

In this work, we take the first steps towards building a universal rewriter: a model capable of rewriting text in any language to exhibit a wide variety of attributes, including styles and languages, while preserving as much of the original semantics as possible. In addition to obtaining state-of-the-art results on unsupervised translation, we also demonstrate the ability to do zero-shot sentiment transfer in non-English languages using only English exemplars for sentiment. We then show that our model is able to modify multiple attributes at once, for example adjusting both language and sentiment jointly. Finally, we show that our model is capable of performing zero-shot formality-sensitive translation.

Lei Xu, Kalyan Veeramachaneni, 2021

Most adversarial attack methods on text classification can change the classifier's prediction by synonym substitution. We propose the adversarial sentence rewriting sampler (ASRS), which rewrites the whole sentence to generate more similar and higher-quality adversarial examples. Our method achieves a better attack success rate on 4 out of 7 datasets, as well as significantly better sentence quality on all 7 datasets. ASRS is an indispensable supplement to the existing attack methods, because classifiers cannot resist the attack from ASRS unless they are trained on adversarial examples found by ASRS.

Computation and Language

mT5: A massively multilingual pre-trained text-to-text transformer

Linting Xue, Noah Constant, Adam Roberts, 2020

The recent Text-to-Text Transfer Transformer (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent accidental translation in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.

Computation and Language

ALTER: Auxiliary Text Rewriting Tool for Natural Language Generation

Qiongkai Xu, Chenchen Xu, Lizhen Qu, 2019

In this paper, we describe ALTER, an auxiliary text rewriting tool that facilitates the rewriting process for natural language generation tasks, such as paraphrasing, text simplification, fairness-aware text rewriting, and text style transfer. Our tool is characterized by two features: i) recording of word-level revision histories and ii) flexible auxiliary edit support and feedback to annotators. The text rewriting assist and traceable rewriting history are potentially beneficial to the future research of natural language generation.

Computation and Language

Learning Multilingual Word Embeddings Using Image-Text Data

Karan Singhal, Karthik Raman, Balder ten Cate, 2019

There has been significant interest recently in learning multilingual word embeddings, in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.

Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition

MTG: A Benchmarking Suite for Multilingual Text Generation

Yiran Chen, Zhenqiao Song, Xianze Wu, 2021

We introduce MTG, a new benchmark suite for training and evaluating multilingual text generation. It is the first and largest text generation benchmark with 120k human-annotated multi-way parallel data for three tasks (story generation, question generation, and title generation) across four languages (English, German, French, and Spanish). Based on it, we set various evaluation scenarios and make a deep analysis of several popular multilingual generation models from different aspects. Our benchmark suite will encourage the multilingualism for text generation community with more human-annotated parallel data and more diverse generation scenarios.

Computation and Language
