Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Coping with Noisy Training Data Labels in Paraphrase Detection

التعامل مع تسميات بيانات التدريب الصاخبة في كشف إعادة صياغة

992 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

coping with noisy noisy training data training data labels التعامل مع صاخبة بيانات تدريب صاخبة تسميات بيانات التدريب صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present new state-of-the-art benchmarks for paraphrase detection on all six languages in the Opusparcus sentential paraphrase corpus: English, Finnish, French, German, Russian, and Swedish. We reach these baselines by fine-tuning BERT. The best results are achieved on smaller and cleaner subsets of the training sets than was observed in previous research. Additionally, we study a translation-based approach that is competitive for the languages with more limited and noisier training data.

References used

https://aclanthology.org/

rate research

Instance-adaptive training with noise-robust losses against noisy labels

793 - Association for Computation Linguistics 2021 مقالة

In order to alleviate the huge demand for annotated datasets for different tasks, many recent natural language processing datasets have adopted automated pipelines for fast-tracking usable data. However, model training with such datasets poses a chal lenge because popular optimization objectives are not robust to label noise induced in the annotation generation process. Several noise-robust losses have been proposed and evaluated on tasks in computer vision, but they generally use a single dataset-wise hyperparamter to control the strength of noise resistance. This work proposes novel instance-adaptive training frameworks to change single dataset-wise hyperparameters of noise resistance in such losses to be instance-wise. Such instance-wise noise resistance hyperparameters are predicted by special instance-level label quality predictors, which are trained along with the main classification models. Experiments on noisy and corrupted NLP datasets show that proposed instance-adaptive training frameworks help increase the noise-robustness provided by such losses, promoting the use of the frameworks and associated losses in NLP models trained with noisy data.

instance-adaptive training noise resistance التدريب على سبيل المثال مقاومة الضوضاء صناعة حمض الفوسفور

Keyphrase Extraction with Incomplete Annotated Training Data

704 - Association for Computation Linguistics 2021 مقالة

Extracting keyphrases that summarize the main points of a document is a fundamental task in natural language processing. Supervised approaches to keyphrase extraction(KPE) are largely developed based on the assumption that the training data is fully annotated. However, due to the difficulty of keyphrase annotating, KPE models severely suffer from incomplete annotated problem in many scenarios. To this end, we propose a more robust training method that learns to mitigate the misguidance brought by unlabeled keyphrases. We introduce negative sampling to adjust training loss, and conduct experiments under different scenarios. Empirical studies on synthetic datasets and open domain dataset show that our model is robust to incomplete annotated problem and surpasses prior baselines. Extensive experiments on five scientific domain datasets of different scales demonstrate that our model is competitive with the state-of-the-art method.

incomplete annotated annotated training data incomplete annotated problem غير مكتملة المشروح بيانات التدريب المشروح مشكلة غير مشروعة غير مكتملة صناعة حمض الفوسفور المزيد..

Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering

798 - Association for Computation Linguistics 2021 مقالة

Paraphrase generation is an important task in natural language processing. Previous works focus on sentence-level paraphrase generation, while ignoring document-level paraphrase generation, which is a more challenging and valuable task. In this paper , we explore the task of document-level paraphrase generation for the first time and focus on the inter-sentence diversity by considering sentence rewriting and reordering. We propose CoRPG (Coherence Relationship guided Paraphrase Generation), which leverages graph GRU to encode the coherence relationship graph and get the coherence-aware representation for each sentence, which can be used for re-arranging the multiple (possibly modified) input sentences. We create a pseudo document-level paraphrase dataset for training CoRPG. Automatic evaluation results show CoRPG outperforms several strong baseline models on the BERTScore and diversity scores. Human evaluation also shows our model can generate document paraphrase with more diversity and semantic preservation.

paraphrase generation document-level paraphrase generation rewriting and reordering إعادة صياغة إعادة صياغة إعادة صياغة صياغة مستوى المستند إعادة كتابة وإعادة ترتيبها صناعة حمض الفوسفور المزيد..

AESOP: Paraphrase Generation with Adaptive Syntactic Control

725 - Association for Computation Linguistics 2021 مقالة

We propose to control paraphrase generation through carefully chosen target syntactic structures to generate more proper and higher quality paraphrases. Our model, AESOP, leverages a pretrained language model and adds deliberately chosen syntactical control via a retrieval-based selection module to generate fluent paraphrases. Experiments show that AESOP achieves state-of-the-art performances on semantic preservation and syntactic conformation on two benchmark datasets with ground-truth syntactic control from human-annotated exemplars. Moreover, with the retrieval-based target syntax selection module, AESOP generates paraphrases with even better qualities than the current best model using human-annotated target syntactic parses according to human evaluation. We further demonstrate the effectiveness of AESOP to improve classification models' robustness to syntactic perturbation by data augmentation on two GLUE tasks.

adaptive syntactic control generation with adaptive control paraphrase generation السيطرة على النحوية التكيفية جيل مع التكيف التحكم في إعادة صياغة النص صناعة حمض الفوسفور المزيد..

Dealing with training and test segmentation mismatch: FBK@IWSLT2021

545 - Association for Computation Linguistics 2021 مقالة

This paper describes FBK's system submission to the IWSLT 2021 Offline Speech Translation task. We participated with a direct model, which is a Transformer-based architecture trained to translate English speech audio data into German texts. The train ing pipeline is characterized by knowledge distillation and a two-step fine-tuning procedure. Both knowledge distillation and the first fine-tuning step are carried out on manually segmented real and synthetic data, the latter being generated with an MT system trained on the available corpora. Differently, the second fine-tuning step is carried out on a random segmentation of the MuST-C v2 En-De dataset. Its main goal is to reduce the performance drops occurring when a speech translation model trained on manually segmented data (i.e. an ideal, sentence-like segmentation) is evaluated on automatically segmented audio (i.e. actual, more realistic testing conditions). For the same purpose, a custom hybrid segmentation procedure that accounts for both audio content (pauses) and for the length of the produced segments is applied to the test data before passing them to the system. At inference time, we compared this procedure with a baseline segmentation method based on Voice Activity Detection (VAD). Our results indicate the effectiveness of the proposed hybrid approach, shown by a reduction of the gap with manual segmentation from 8.3 to 1.4 BLEU points.

الانجليزية الى الالمانية paper describes fbk describes fbk system تصف الورقة FBK. يصف نظام FBK. صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Coping with Noisy Training Data Labels in Paraphrase Detection

التعامل مع تسميات بيانات التدريب الصاخبة في كشف إعادة صياغة

Ask ChatGPT about the research

Read More

suggested questions