Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

unsupervised paraphrasing, diversity of unsupervised, unsupervised

Abstract

We present a novel technique for zero-shot paraphrase generation. The key contribution is an end-to-end multilingual paraphrasing model that is trained using translated parallel corpora to generate paraphrases into "meaning spaces", replacing the final softmax layer with word embeddings. This architectural modification, plus a training procedure that incorporates an autoencoding objective, enables effective parameter sharing across languages for more fluent monolingual rewriting, and facilitates fluency and diversity in the generated outputs. Our continuous-output paraphrase generation models outperform zero-shot paraphrasing baselines when evaluated on two languages using a battery of computational metrics as well as in human assessment.
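To make the idea of a continuous output layer concrete, here is a minimal sketch of how a predicted vector could be mapped back to a word by nearest-neighbour search in embedding space, instead of taking an argmax over softmax probabilities. The abstract does not state the paper's training loss or lookup rule, so the cosine-similarity lookup, the toy `EMBEDDINGS` table, and the function names below are illustrative assumptions only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_word(predicted, embedding_table):
    """Return the word whose embedding is closest to the predicted vector."""
    return max(embedding_table, key=lambda w: cosine(predicted, embedding_table[w]))

# Hypothetical 3-dimensional embedding table (real tables have hundreds of dimensions).
EMBEDDINGS = {
    "quick": [0.9, 0.1, 0.0],
    "fast": [0.8, 0.2, 0.1],
    "slow": [-0.9, 0.1, 0.0],
}

# Stand-in for the vector a decoder would emit at one generation step.
predicted_vector = [0.95, 0.05, 0.0]
print(nearest_word(predicted_vector, EMBEDDINGS))
```

Because the decoder emits a point in embedding space rather than a distribution over one language's vocabulary, the same output layer can in principle be shared across languages, which is the parameter-sharing benefit the abstract describes.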

References used

https://aclanthology.org/

Syntactically-Informed Unsupervised Paraphrasing with Non-Parallel Data

Association for Computational Linguistics, 2021 (article)

Previous works on syntactically controlled paraphrase generation rely heavily on large-scale parallel paraphrase data that is not easily available for many languages and domains. In this paper, we take this research direction to the extreme and investigate whether it is possible to learn syntactically controlled paraphrase generation with non-parallel data. We propose a syntactically-informed unsupervised paraphrasing model based on a conditional variational auto-encoder (VAE) that can generate texts in a specified syntactic structure. In particular, we design a two-stage learning method to effectively train the model using non-parallel data. The conditional VAE is trained to reconstruct the input sentence according to the given input and its syntactic structure. Furthermore, to improve the syntactic controllability and semantic consistency of the pre-trained conditional VAE, we fine-tune it using syntax controlling and cycle reconstruction learning objectives, and employ Gumbel-Softmax to combine these new learning objectives. Experimental results demonstrate that the proposed model trained only on non-parallel data is capable of generating diverse paraphrases with a specified syntactic structure. Additionally, we validate the effectiveness of our method for generating syntactically adversarial examples on the sentiment analysis task.
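For readers unfamiliar with Gumbel-Softmax, the following is a minimal sketch of the standard relaxation: Gumbel noise is added to the logits and a temperature-scaled softmax produces a soft, differentiable stand-in for a sampled token. It illustrates the general technique only; how the paper wires it into its fine-tuning objectives is not described in the abstract, and the function name and default temperature are assumptions.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a soft one-hot sample over the categories given by `logits`."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # sample from Gumbel(0, 1)
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]

sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
print(sample)
```

Lower temperatures push the output closer to a one-hot vector, while higher temperatures make it smoother; the output always sums to one.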

syntactically-informed unsupervised paraphrasing, controlled paraphrase generation

Unsupervised Paraphrasing with Pretrained Language Models

Association for Computational Linguistics, 2021 (article)

Paraphrase generation has benefited extensively from recent progress in the design of training objectives and model architectures. However, previous explorations have largely focused on supervised methods, which require a large amount of labeled data that is costly to collect. To address this drawback, we adopt a transfer learning approach and propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting. Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking (DB). To enforce a surface form dissimilar from the input, whenever the language model emits a token contained in the source sequence, DB prevents the model from outputting the subsequent source token for the next generation step. We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair (QQP) and the ParaNMT datasets and is robust to domain shift between the two datasets of distinct distributions. We also demonstrate that our model transfers to paraphrasing in other languages without any additional finetuning.
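The blocking rule described above can be sketched in a few lines. This is a simplified, deterministic illustration built only from the abstract's description: the greedy decoder, the per-step score dictionaries, and all names are hypothetical, and the published algorithm may differ in details the abstract does not state.

```python
def blocked_next_tokens(source_tokens, last_emitted):
    """Tokens that directly follow `last_emitted` anywhere in the source."""
    blocked = set()
    for i, token in enumerate(source_tokens[:-1]):
        if token == last_emitted:
            blocked.add(source_tokens[i + 1])
    return blocked

def greedy_decode_with_blocking(source_tokens, step_scores):
    """Greedy decoding over toy per-step scores, applying the blocking rule."""
    output = []
    last = None
    for scores in step_scores:
        blocked = blocked_next_tokens(source_tokens, last)
        candidates = {t: s for t, s in scores.items() if t not in blocked}
        if not candidates:  # assumption: fall back if every candidate is blocked
            candidates = scores
        last = max(candidates, key=candidates.get)
        output.append(last)
    return output

source = ["the", "chart", "shows", "growth"]
# Hypothetical model scores at each step (token -> score).
steps = [
    {"the": 0.9, "a": 0.1},
    {"chart": 0.8, "graph": 0.2},
    {"shows": 0.7, "displays": 0.3},
]
print(greedy_decode_with_blocking(source, steps))
```

In this toy run, emitting "the" blocks "chart" at the next step, so the decoder picks "graph" instead; since "graph" is not in the source, nothing is blocked at the third step.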

paraphrasing with pretrained

Controllable Text Simplification with Explicit Paraphrasing

Association for Computational Linguistics, 2021 (article)

Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting. Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously. However, such systems limit themselves to mostly deleting words and cannot easily adapt to the requirements of different target audiences. In this paper, we propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles. We introduce a new data augmentation method to improve the paraphrasing capability of our model. Through automatic and manual evaluations, we show that our proposed model establishes a new state-of-the-art for the task, paraphrasing more often than the existing systems, and can control the degree of each simplification operation applied to the input texts.

controllable text simplification, explicit paraphrasing

Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition

Association for Computational Linguistics, 2021 (article)

Unsupervised consistency training is a form of semi-supervised learning that encourages consistency in model predictions between the original and augmented data. For Named Entity Recognition (NER), existing approaches augment the input sequence with token replacement, assuming that annotations on the replaced positions are unchanged. In this paper, we explore the use of paraphrasing as a more principled data augmentation scheme for NER unsupervised consistency training. Specifically, we convert the Conditional Random Field (CRF) into a multi-label classification module and encourage consistency on the entity appearance between the original and paraphrased sequences. Experiments show that our method is especially effective when annotations are limited.

low resource named, resource named entity

Improving Unsupervised Dialogue Topic Segmentation with Utterance-Pair Coherence Scoring

Association for Computational Linguistics, 2021 (article)

Dialogue topic segmentation is critical in several dialogue modeling problems. However, popular unsupervised approaches only exploit surface features in assessing topical coherence among utterances. In this work, we address this limitation by leveraging supervisory signals from the utterance-pair coherence scoring task. First, we present a simple yet effective strategy to generate a training corpus for utterance-pair coherence scoring. Then, we train a BERT-based neural utterance-pair coherence model with the obtained training corpus. Finally, this model is used to measure the topical relevance between utterances, acting as the basis of the segmentation inference. Experiments on three public datasets in English and Chinese demonstrate that our proposal outperforms the state-of-the-art baselines.
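As a rough illustration of how pairwise coherence scores could drive segmentation, the sketch below places a topic boundary wherever the coherence between two adjacent utterances falls well below the dialogue's average. The abstract does not specify the inference procedure, so the thresholding rule (mean minus a multiple of the standard deviation), the made-up scores, and the function name are all assumptions.

```python
from statistics import mean, pstdev

def segment_by_coherence(coherence, num_std=1.0):
    """Return indices of utterances that start a new topic segment.

    coherence[i] is the score between utterance i and utterance i + 1,
    e.g. as produced by some utterance-pair coherence model.
    """
    threshold = mean(coherence) - num_std * pstdev(coherence)
    # A dip below the threshold means utterance i + 1 opens a new segment.
    return [i + 1 for i, score in enumerate(coherence) if score < threshold]

# Hypothetical scores for a six-utterance dialogue (five adjacent pairs).
scores = [0.9, 0.8, 0.2, 0.85, 0.9]
print(segment_by_coherence(scores))
```

Here the sharp drop between the third and fourth utterances is the only score below the threshold, so a single boundary is placed before the fourth utterance (index 3).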

dialogue topic segmentation, unsupervised dialogue topic, improving unsupervised dialogue
