Research papers and master's and doctoral theses about "level"

Evidence Selection as a Token-Level Prediction Task

648 - Association for Computational Linguistics 2021 article

In Automated Claim Verification, we retrieve evidence from a knowledge base to determine the veracity of a claim. Intuitively, the retrieval of the correct evidence plays a crucial role in this process. Often, evidence selection is tackled as a pairwise sentence classification task, i.e., we train a model to predict for each sentence individually whether it is evidence for a claim. In this work, we fine-tune document-level transformers to extract all evidence from a Wikipedia document at once. We show that this approach performs better than a comparable model classifying sentences individually on all relevant evidence selection metrics in FEVER. Our complete pipeline building on this evidence selection procedure produces a new state-of-the-art result on FEVER, a popular claim verification benchmark.

token-level prediction task, token-level prediction, prediction task

Well-Defined Morphology is Sentence-Level Morphology

762 - Association for Computational Linguistics 2021 article

Morphological tasks have gained decent popularity within the NLP community in recent years, with large multilingual datasets providing morphological analysis of words, either in or out of context. However, the lack of a clear linguistic definition for words destines the annotative work to be incomplete and mired in inconsistencies, especially cross-linguistically. In this work we expand morphological inflection of words to inflection of sentences to provide true universality disconnected from orthographic traditions of white-space usage. To allow annotation for sentence inflection, we define a morphological annotation scheme by a fixed set of inflectional features. We present a small cross-linguistic dataset including semi-manually generated simple sentences in 4 typologically diverse languages annotated according to our suggested scheme, and show that the task of reinflection gets substantially more difficult but that the change of scope from words to well-defined sentences allows interface with contextualized language models.

sentence-level morphology, morphology is sentence-level, well-defined morphology

Bidirectional Hierarchical Attention Networks based on Document-level Context for Emotion Cause Extraction

725 - Association for Computational Linguistics 2021 article

Emotion cause extraction (ECE) aims to extract the causes behind a certain emotion in text. Several works on the ECE task have been published and have attracted considerable attention in recent years. However, these methods neglect two major issues: 1) they pay little attention to the effect of document-level context information on ECE, and 2) they do not sufficiently explore how to use the annotated emotion clause effectively. For the first issue, we propose a bidirectional hierarchical attention network (BHA) corresponding to the specified candidate cause clause to capture the document-level context in a structured and dynamic manner. For the second issue, we design an emotional filtering module (EF) for each layer of the graph attention network, which calculates a gate score based on the emotion clause to filter out irrelevant information. Combining the BHA and EF, the EF-BHA can dynamically aggregate contextual information from two directions and filter out irrelevant information. The experimental results demonstrate that EF-BHA achieves competitive performance on two public datasets in different languages (Chinese and English). Moreover, we quantify the effect of context on emotion cause extraction and provide a visualization of the interactions between candidate cause clauses and contexts.

bidirectional hierarchical attention, hierarchical attention networks, document-level context

EDTC: A Corpus for Discourse-Level Topic Chain Parsing

856 - Association for Computational Linguistics 2021 article

Discourse analysis has long been known to be fundamental in natural language processing. In this research, we present our insight on discourse-level topic chain (DTC) parsing, which aims at discovering new topics and investigating how these topics evolve over time within an article. To address the lack of data, we contribute a new discourse corpus with DTC-style dependency graphs annotated upon news articles. In particular, we ensure the high reliability of the corpus by utilizing a two-step annotation strategy to build the data and filtering out the annotations with low confidence scores. Based on the annotated corpus, we introduce a simple yet robust system for automatic discourse-level topic chain parsing.

discourse-level topic chain, topic chain parsing, topic chain

Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization

674 - Association for Computational Linguistics 2021 article

This paper describes the HEL-LJU submissions to the MultiLexNorm shared task on multilingual lexical normalization. Our system is based on a BERT token classification preprocessing step, where for each token the type of the necessary transformation is predicted (none, uppercase, lowercase, capitalize, modify), and a character-level SMT step where the text is translated from original to normalized given the BERT-predicted transformation constraints. For some languages, depending on the results on development data, the training data was extended by back-translating OpenSubtitles data. In the final ranking of the ten participating teams, the HEL-LJU team took second place, scoring better than the previous state of the art.

bert-constrained character-level moses, multilingual lexical normalization, character-level moses models
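
The five transformation types named in the HEL-LJU abstract above (none, uppercase, lowercase, capitalize, modify) can be illustrated with a small sketch. This is not the authors' system: the function names `apply_transformation` and `normalize` and the `modifier` callback are hypothetical, the labels are assumed to be already predicted, and the `modifier` callback merely stands in for the character-level SMT step, which the abstract describes as translating the text under these constraints.

```python
from typing import Callable, List, Optional


def apply_transformation(
    token: str,
    label: str,
    modifier: Optional[Callable[[str], str]] = None,
) -> str:
    """Apply one predicted transformation label to a single token.

    Hypothetical helper: the label names come from the abstract, the
    behaviour of each label is an assumption made for illustration.
    """
    if label == "none":
        return token
    if label == "uppercase":
        return token.upper()
    if label == "lowercase":
        return token.lower()
    if label == "capitalize":
        # Assumption: only the first character is changed.
        return token[:1].upper() + token[1:]
    if label == "modify":
        # Stand-in for the character-level SMT step; without a
        # modifier the token is returned unchanged.
        return modifier(token) if modifier is not None else token
    raise ValueError(f"unknown transformation label: {label!r}")


def normalize(
    tokens: List[str],
    labels: List[str],
    modifier: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Apply one label per token; both lists must have equal length."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [apply_transformation(t, l, modifier) for t, l in zip(tokens, labels)]
```

For example, with a toy lookup table as the modifier, `normalize(["i", "luv", "nyc"], ["capitalize", "modify", "uppercase"], modifier=lambda t: {"luv": "love"}.get(t, t))` yields `["I", "love", "NYC"]`.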

Reference-Free Word- and Sentence-Level Translation Evaluation with Token-Matching Metrics

1225 - Association for Computational Linguistics 2021 article

Many modern machine translation evaluation metrics like BERTScore, BLEURT, COMET, MonoTransQuest or XMoverScore are based on black-box language models. Hence, it is difficult to explain why these metrics return certain scores. This year's Eval4NLP shared task tackles this challenge by searching for methods that can extract feature importance scores that correlate well with human word-level error annotations. In this paper we show that unsupervised metrics that are based on token matching can intrinsically provide such scores. The submitted system interprets the similarities of the contextualized word embeddings that are used to compute (X)BERTScore as word-level importance scores.

sentence-level translation evaluation, reference-free word, translation evaluation metrics
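
The idea in the abstract above, reading token-matching similarities as word-level scores, can be sketched as follows. This is a simplified illustration under stated assumptions, not the submitted system: the embeddings are assumed to be precomputed and are passed in as plain lists of floats, and the names `cosine` and `word_level_scores` are hypothetical. Each hypothesis token receives the highest cosine similarity it reaches against any source token, which is the greedy matching that BERTScore-style metrics rely on.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_level_scores(
    hyp_vecs: List[Sequence[float]],
    src_vecs: List[Sequence[float]],
) -> List[float]:
    """Score each hypothesis token by its best match among source tokens.

    src_vecs must be non-empty. A low score marks a token with no close
    counterpart in the source, which this sketch treats as a likely error.
    """
    return [max(cosine(h, s) for s in src_vecs) for h in hyp_vecs]
```

With real contextualized embeddings, the per-token scores could then be compared against human word-level error annotations; how the scores are thresholded or rescaled is left open here.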

Controlled Neural Sentence-Level Reframing of News Articles

795 - Association for Computational Linguistics 2021 article

Framing a news article means to portray the reported event from a specific perspective, e.g., from an economic or a health perspective. Reframing means to change this perspective. Depending on the audience or the submessage, reframing can become necessary to achieve the desired effect on the readers. Reframing is related to adapting style and sentiment, which can be tackled with neural text generation techniques. However, it is more challenging since changing a frame requires rewriting entire sentences rather than single phrases. In this paper, we study how to computationally reframe sentences in news articles while maintaining their coherence to the context. We treat reframing as a sentence-level fill-in-the-blank task for which we train neural models on an existing media frame corpus. To guide the training, we propose three strategies: framed-language pretraining, named-entity preservation, and adversarial learning. We evaluate respective models automatically and manually for topic consistency, coherence, and successful reframing. Our results indicate that generating properly-framed text works well but with tradeoffs.

controlled neural sentence-level reframing, controlled neural

Sentence-level Planning for Especially Abstractive Summarization

790 - Association for Computational Linguistics 2021 article

Abstractive summarization models heavily rely on copy mechanisms, such as the pointer network or attention, to achieve good performance, measured by textual overlap with reference summaries. As a result, the generated summaries stay close to the formulations in the source document. We propose the *sentence planner* model to generate more abstractive summaries. It includes a hierarchical decoder that first generates a representation for the next summary sentence, and then conditions the word generator on this representation. Our generated summaries are more abstractive and at the same time achieve high ROUGE scores when compared to human reference summaries. We verify the effectiveness of our design decisions with extensive evaluations.

sentence-level planning, abstractive summarization models

A Language Model-based Generative Classifier for Sentence-level Discourse Parsing

764 - Association for Computational Linguistics 2021 article

Discourse segmentation and sentence-level discourse parsing play important roles in various NLP tasks that consider textual coherence. Despite recent achievements in both tasks, there is still room for improvement due to the scarcity of labeled data. To solve the problem, we propose a language model-based generative classifier (LMGC) that uses more information from labels by treating the labels as an input while enhancing label representations by embedding descriptions for each label. Moreover, since this enables LMGC to prepare representations for labels unseen in the pre-training step, we can effectively use a pre-trained language model in LMGC. Experimental results on the RST-DT dataset show that our LMGC achieved the state-of-the-art F1 score of 96.72 in discourse segmentation. It further achieved the state-of-the-art relation F1 scores of 84.69 with gold EDU boundaries and 81.18 with automatically segmented boundaries, respectively, in sentence-level discourse parsing.

sentence-level discourse parsing, model-based generative classifier, language model-based generative

Document-Level Text Simplification: Dataset, Criteria and Baseline

1430 - Association for Computational Linguistics 2021 article

Text simplification is a valuable technique. However, current research is limited to sentence simplification. In this paper, we define and investigate a new task of document-level text simplification, which aims to simplify a document consisting of multiple sentences. Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia and perform analysis and human evaluation on it to show that the dataset is reliable. Then, we propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task. Finally, we select several representative models as baseline models for this task and perform automatic evaluation and human evaluation. We analyze the results and point out the shortcomings of the baseline models.

document-level text
