Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues

DELA CORPUS - كوربوس على مستوى المستند المشروح مع القضايا المتعلقة بالسياق

950 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Recently, the Machine Translation (MT) community has become more interested in document-level evaluation especially in light of reactions to claims of human parity'', since examining the quality at the level of the document rather than at the sentence level allows for the assessment of suprasentential context, providing a more reliable evaluation. This paper presents a document-level corpus annotated in English with context-aware issues that arise when translating from English into Brazilian Portuguese, namely ellipsis, gender, lexical ambiguity, number, reference, and terminology, with six different domains. The corpus can be used as a challenge test set for evaluation and as a training/testing corpus for MT as well as for deep linguistic analysis of context issues. To the best of our knowledge, this is the first corpus of its kind.

References used

https://aclanthology.org/

rate research

The GLAUx corpus: methodological issues in designing a long-term, diverse, multi-layered corpus of Ancient Greek

786 - Association for Computation Linguistics 2021 مقالة

This paper describes the GLAUx project (the Greek Language Automated''), an ongoing effort to develop a large long-term diachronic corpus of Greek, covering sixteen centuries of literary and non-literary material annotated with NLP methods. After pro viding an overview of related corpus projects and discussing the general architecture of the corpus, it zooms in on a number of larger methodological issues in the design of historical corpora. These include the encoding of textual variants, handling extralinguistic variation and annotating linguistic ambiguity. Finally, the long- and short-term perspectives of this project are discussed.

greek language automated ancient greek greek language اللغة اليونانية الآلي اليونانية القديمة اللغة اليونانية صناعة حمض الفوسفور المزيد..

EDTC: A Corpus for Discourse-Level Topic Chain Parsing

888 - Association for Computation Linguistics 2021 مقالة

Discourse analysis has long been known to be fundamental in natural language processing. In this research, we present our insight on discourse-level topic chain (DTC) parsing which aims at discovering new topics and investigating how these topics evo lve over time within an article. To address the lack of data, we contribute a new discourse corpus with DTC-style dependency graphs annotated upon news articles. In particular, we ensure the high reliability of the corpus by utilizing a two-step annotation strategy to build the data and filtering out the annotations with low confidence scores. Based on the annotated corpus, we introduce a simple yet robust system for automatic discourse-level topic chain parsing.

discourse-level topic chain topic chain parsing topic chain سلسلة موضوع الخطاب تخليل سلسلة الموضوعات سلسلة موضوع صناعة حمض الفوسفور المزيد..

Coupling Context Modeling with Zero Pronoun Recovering for Document-Level Natural Language Generation

1112 - Association for Computation Linguistics 2021 مقالة

Natural language generation (NLG) tasks on pro-drop languages are known to suffer from zero pronoun (ZP) problems, and the problems remain challenging due to the scarcity of ZP-annotated NLG corpora. In this case, we propose a highly adaptive two-sta ge approach to couple context modeling with ZP recovering to mitigate the ZP problem in NLG tasks. Notably, we frame the recovery process in a task-supervised fashion where the ZP representation recovering capability is learned during the NLG task learning process, thus our method does not require NLG corpora annotated with ZPs. For system enhancement, we learn an adversarial bot to adjust our model outputs to alleviate the error propagation caused by mis-recovered ZPs. Experiments on three document-level NLG tasks, i.e., machine translation, question answering, and summarization, show that our approach can improve the performance to a great extent, and the improvement on pronoun translation is very impressive.

تلخيص الجملة coupling context modeling نموذج سياق اقتران صناعة حمض الفوسفور

Modeling Document-Level Context for Event Detection via Important Context Selection

875 - Association for Computation Linguistics 2021 مقالة

The task of Event Detection (ED) in Information Extraction aims to recognize and classify trigger words of events in text. The recent progress has featured advanced transformer-based language models (e.g., BERT) as a critical component in state-of-th e-art models for ED. However, the length limit for input texts is a barrier for such ED models as they cannot encode long-range document-level context that has been shown to be beneficial for ED. To address this issue, we propose a novel method to model document-level context for ED that dynamically selects relevant sentences in the document for the event prediction of the target sentence. The target sentence will be then augmented with the selected sentences and consumed entirely by transformer-based language models for improved representation learning for ED. To this end, the REINFORCE algorithm is employed to train the relevant sentence selection for ED. Several information types are then introduced to form the reward function for the training process, including ED performance, sentence similarity, and discourse relations. Our extensive experiments on multiple benchmark datasets reveal the effectiveness of the proposed model, leading to new state-of-the-art performance.

important context selection important context اختيار السياق الهامة سياق مهم صناعة حمض الفوسفور

Studying The Impact Of Document-level Context On Simultaneous Neural Machine Translation

1048 - Association for Computation Linguistics 2021 مقالة

In a real-time simultaneous translation setting and neural machine translation (NMT) models start generating target language tokens from incomplete source language sentences and making them harder to translate and leading to poor translation quality. Previous research has shown that document-level NMT and comprising of sentence and context encoders and a decoder and leverages context from neighboring sentences and helps improve translation quality. In simultaneous translation settings and the context from previous sentences should be even more critical. To this end and in this paper and we propose wait-k simultaneous document-level NMT where we keep the context encoder as it is and replace the source sentence encoder and target language decoder with their wait-k equivalents. We experiment with low and high resource settings using the ALT and OpenSubtitles2018 corpora and where we observe minor improvements in translation quality. We then perform an analysis of the translations obtained using our models by focusing on sentences that should benefit from the context where we found out that the model does and in fact and benefit from context but is unable to effectively leverage it and especially in a low-resource setting. This shows that there is a need for further innovation in the way useful context is identified and leveraged.

اختيار المرشح القائم على المعنويات صناعة حمض الفوسفور

DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues

DELA CORPUS - كوربوس على مستوى المستند المشروح مع القضايا المتعلقة بالسياق

Ask ChatGPT about the research

Read More

suggested questions