Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Multilingual Coreference Resolution with Harmonized Annotations

دقة تحرير المعلومات متعددة اللغات مع التعليقات التوضيحية المنسقة

466 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

multilingual coreference resolution present coreference resolution قرار تحرير المعلومات متعدد اللغات حاضر القرار الأساسية صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we present coreference resolution experiments with a newly created multilingual corpus CorefUD (Nedoluzhko et al.,2021). We focus on the following languages: Czech, Russian, Polish, German, Spanish, and Catalan. In addition to monolingual experiments, we combine the training data in multilingual experiments and train two joined models - for Slavic languages and for all the languages together. We rely on an end-to-end deep learning model that we slightly adapted for the CorefUD corpus. Our results show that we can profit from harmonized annotations, and using joined models helps significantly for the languages with smaller training data.

References used

https://aclanthology.org/

rate research

Do UD Trees Match Mention Spans in Coreference Annotations?

391 - Association for Computation Linguistics 2021 مقالة

One can find dozens of data resources for various languages in which coreference - a relation between two or more expressions that refer to the same real-world entity - is manually annotated. One could also assume that such expressions usually consti tute syntactically meaningful units; however, mention spans have been annotated simply by delimiting token intervals in most coreference projects, i.e., independently of any syntactic representation. We argue that it could be advantageous to make syntactic and coreference annotations convergent in the long term. We present a pilot empirical study focused on matches and mismatches between hand-annotated linear mention spans and automatically parsed syntactic trees that follow Universal Dependencies conventions. The study covers 9 datasets for 8 different languages.

match mention spans trees match mention match mention تذكر المباراة تذكر الأشجار ذكر المباراة صناعة حمض الفوسفور المزيد..

Constructing contrastive samples via summarization for text classification with limited annotations

602 - Association for Computation Linguistics 2021 مقالة

Contrastive Learning has emerged as a powerful representation learning method and facilitates various downstream tasks especially when supervised data is limited. How to construct efficient contrastive samples through data augmentation is key to its success. Unlike vision tasks, the data augmentation method for contrastive learning has not been investigated sufficiently in language tasks. In this paper, we propose a novel approach to construct contrastive samples for language tasks using text summarization. We use these samples for supervised contrastive learning to gain better text representations which greatly benefit text classification tasks with limited annotations. To further improve the method, we mix up samples from different classes and add an extra regularization, named Mixsum, in addition to the cross-entropy-loss. Experiments on real-world text classification datasets (Amazon-5, Yelp-5, AG News, and IMDb) demonstrate the effectiveness of the proposed contrastive learning framework with summarization-based data augmentation and Mixsum regularization.

contrastive contrastive learning مناقض التعلم على النقيض صناعة حمض الفوسفور

Sequential Cross-Document Coreference Resolution

568 - Association for Computation Linguistics 2021 مقالة

Relating entities and events in text is a key component of natural language understanding. Cross-document coreference resolution, in particular, is important for the growing interest in multi-document analysis tasks. In this work we propose a new mod el that extends the efficient sequential prediction paradigm for coreference resolution to cross-document settings and achieves competitive results for both entity and event coreference while providing strong evidence of the efficacy of both sequential models and higher-order inference in cross-document settings. Our model incrementally composes mentions into cluster representations and predicts links between a mention and the already constructed clusters, approximating a higher-order model. In addition, we conduct extensive ablation studies that provide new insights into the importance of various inputs and representation types in coreference.

الرسوم البيانية المعرفة الحبيبات coreference comeference. صناعة حمض الفوسفور

Neural Media Bias Detection Using Distant Supervision With BABE - Bias Annotations By Experts

465 - Association for Computation Linguistics 2021 مقالة

Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is challenging , primarily due to the lack of a gold standard data set and high context dependencies. This paper presents BABE, a robust and diverse data set created by trained experts, for media bias research. We also analyze why expert labeling is essential within this domain. Our data set offers better annotation quality and higher inter-annotator agreement than existing work. It consists of 3,700 sentences balanced among topics and outlets, containing media bias labels on the word and sentence level. Based on our data, we also introduce a way to detect bias-inducing sentences in news articles automatically. Our best performing BERT-based model is pre-trained on a larger corpus consisting of distant labels. Fine-tuning and evaluating the model on our proposed supervised data set, we achieve a macro F1-score of 0.804, outperforming existing methods.

media bias detection كشف التحيز الإعلامي صناعة حمض الفوسفور

Learning Cross-lingual Representations for Event Coreference Resolution with Multi-view Alignment and Optimal Transport

920 - Association for Computation Linguistics 2021 مقالة

We study a new problem of cross-lingual transfer learning for event coreference resolution (ECR) where models trained on data from a source language are adapted for evaluations in different target languages. We introduce the first baseline model for this task based on XLM-RoBERTa, a state-of-the-art multilingual pre-trained language model. We also explore language adversarial neural networks (LANN) that present language discriminators to distinguish texts from the source and target languages to improve the language generalization for ECR. In addition, we introduce two novel mechanisms to further enhance the general representation learning of LANN, featuring: (i) multi-view alignment to penalize cross coreference-label alignment of examples in the source and target languages, and (ii) optimal transport to select close examples in the source and target languages to provide better training signals for the language discriminators. Finally, we perform extensive experiments for cross-lingual ECR from English to Spanish and Chinese to demonstrate the effectiveness of the proposed methods.

متخصصة لغة متعددة اللغات صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Multilingual Coreference Resolution with Harmonized Annotations

دقة تحرير المعلومات متعددة اللغات مع التعليقات التوضيحية المنسقة

Ask ChatGPT about the research

Read More

suggested questions