MasakhaNER: Named Entity Recognition for African Languages

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.

References used

https://aclanthology.org/

Data Augmentation for Cross-Domain Named Entity Recognition

Association for Computational Linguistics, 2021 (article)

Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models. However, most existing techniques focus on augmenting in-domain data in low-resource scenarios where annotated data is quite limited. In this work, we take the opposite direction and study cross-domain data augmentation for the NER task. We investigate the possibility of leveraging data from high-resource domains by projecting it into the low-resource domains. Specifically, we propose a novel neural architecture to transform the data representation from a high-resource to a low-resource domain by learning the patterns (e.g. style, noise, abbreviations, etc.) in the text that differentiate them and a shared feature space where both domains are aligned. We experiment with diverse datasets and show that transforming the data to the low-resource domain representation achieves significant improvements over only using data from high-resource domains.

Improved Named Entity Recognition for Noisy Call Center Transcripts

Association for Computational Linguistics, 2021 (article)

We explore the application of state-of-the-art NER algorithms to ASR-generated call center transcripts. Previous work in this domain focused on the use of a BiLSTM-CRF model which relied on Flair embeddings; however, such a model is unwieldy in terms of latency and memory consumption. In a production environment, end users require low-latency models which can be readily integrated into existing pipelines. To that end, we present two different models which can be utilized based on the latency and accuracy requirements of the user. First, we propose a set of models which utilize state-of-the-art Transformer language models (RoBERTa) to develop a high-accuracy NER system trained on a custom annotated set of call center transcripts. We then use our best-performing Transformer-based model to label a large number of transcripts, which we use to pretrain a BiLSTM-CRF model that we further fine-tune on our annotated dataset. We show that this model, while not as accurate as its Transformer-based counterpart, is highly effective in identifying items which require redaction for privacy law compliance. Further, we propose a new general annotation scheme for NER in the call-center environment.

Dynamic Ensembles in Named Entity Recognition for Historical Arabic Texts

Association for Computational Linguistics, 2021 (article)

The use of Named Entity Recognition (NER) over archaic Arabic texts is steadily increasing. However, most tools have been either developed for modern English or trained over English language documents and are limited over historical Arabic text. Even Arabic NER tools are often trained on modern web-sourced text, making their fit for a historical task questionable. To mitigate historic Arabic NER resource scarcity, we propose a dynamic ensemble model utilizing several learners. The dynamic aspect is achieved by utilizing predictors and features over NER algorithm results that identify which have performed better on a specific task in real-time. We evaluate our approach against state-of-the-art Arabic NER and static ensemble methods over a novel historical Arabic NER task we have created. Our results show that our approach improves upon the state-of-the-art and reaches a 0.8 F-score on this challenging task.

SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation

Association for Computational Linguistics, 2021 (article)

To address a looming crisis of unreproducible evaluation for named entity recognition, we propose guidelines and introduce SeqScore, a software package to improve reproducibility. The guidelines we propose are extremely simple and center around transparency regarding how chunks are encoded and scored. We demonstrate that despite the apparent simplicity of NER evaluation, unreported differences in the scoring procedure can result in changes to scores that are both of noticeable magnitude and statistically significant. We describe SeqScore, which addresses many of the issues that cause replication failures.
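To illustrate how an unreported scoring detail can change a score, the following is a minimal sketch and not SeqScore's API. It decodes a BIO label sequence under two conventions for an `I-` tag that has no preceding `B-`: either the tag starts a new chunk or it is dropped. The function names and the `repair` options `"begin"` and `"discard"` are hypothetical, chosen only for this example.

```python
from typing import List, Tuple

Chunk = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def decode_bio(labels: List[str], repair: str = "begin") -> List[Chunk]:
    """Turn BIO labels into chunks.

    repair="begin": an invalid I- tag starts a new chunk.
    repair="discard": an invalid I- tag is ignored.
    (Both option names are assumptions made for this sketch.)
    """
    chunks: List[Chunk] = []
    start, ctype = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, label in enumerate(labels + ["O"]):
        prefix, _, ltype = label.partition("-")
        if prefix == "I" and start is not None and ctype == ltype:
            continue  # valid continuation of the open chunk
        if start is not None:
            chunks.append((ctype, start, i))
            start, ctype = None, None
        if prefix == "B" or (prefix == "I" and repair == "begin"):
            start, ctype = i, ltype
    return chunks


def f1(gold: List[Chunk], pred: List[Chunk]) -> float:
    """Exact-match chunk F1."""
    correct = len(set(gold) & set(pred))
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = [("PER", 0, 2), ("LOC", 3, 4)]
predicted_labels = ["I-PER", "I-PER", "O", "B-LOC"]  # starts with an invalid I- tag

f1_begin = f1(gold, decode_bio(predicted_labels, repair="begin"))
f1_discard = f1(gold, decode_bio(predicted_labels, repair="discard"))
```

The same model output receives a different F1 under each convention, which is why the abstract's call to report how chunks are encoded and scored matters.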

Cross-Lingual Named Entity Recognition via FastAlign: a Case Study

Association for Computational Linguistics, 2021 (article)

Named Entity Recognition is an essential task in natural language processing to detect entities and classify them into predetermined categories. An entity is a meaningful word or phrase that refers to a proper noun. Named entities play an important role in different NLP tasks such as Information Extraction, Question Answering and Machine Translation. In Machine Translation, named entities often cause translation failures regardless of local context, affecting the output quality of translation. Annotating named entities is a time-consuming and expensive process, especially for low-resource languages. One solution for this problem is to use word alignment methods in bilingual parallel corpora in which just one side has been annotated. The goal is to extract named entities in the target language by using the annotated corpus of the source language. In this paper, we compare the performance of two alignment methods, the Grow-diag-final-and and Intersect Symmetrisation heuristics, to exploit the annotation projection of an English-Brazilian Portuguese bilingual corpus to detect named entities in Brazilian Portuguese. An NER model trained on annotated data extracted from the alignment methods is used to evaluate the performance of the aligners. Experimental results show that Intersect Symmetrisation achieves superior performance scores compared to the Grow-diag-final-and heuristic in Brazilian Portuguese.
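The annotation projection idea described above can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the alignments are written by hand rather than produced by FastAlign, only the intersection heuristic is shown, and the function names and the example sentence pair are assumptions made for this sketch.

```python
from typing import List, Optional, Set, Tuple

# Each link is (source token index, target token index).
Alignment = Set[Tuple[int, int]]


def intersect(src2tgt: Alignment, tgt2src: Alignment) -> Alignment:
    """Intersection symmetrisation: keep links present in both directions."""
    return src2tgt & tgt2src


def project_labels(src_labels: List[str], alignment: Alignment, tgt_len: int) -> List[str]:
    """Copy entity types across alignment links, then rebuild BIO prefixes."""
    types: List[Optional[str]] = [None] * tgt_len
    for s, t in sorted(alignment):
        if src_labels[s] != "O":
            types[t] = src_labels[s].split("-", 1)[1]
    out: List[str] = []
    prev: Optional[str] = None
    for ty in types:
        if ty is None:
            out.append("O")
        elif ty == prev:
            out.append("I-" + ty)  # same type as previous token: continue the entity
        else:
            out.append("B-" + ty)  # new entity starts here
        prev = ty
    return out


# English:    Maria lives in Rio de Janeiro
# Portuguese: Maria mora  no Rio de Janeiro
src_labels = ["B-PER", "O", "O", "B-LOC", "I-LOC", "I-LOC"]

# Hand-written directional alignments; (3, 2) is a spurious link
# that appears in only one direction.
src2tgt = {(0, 0), (1, 1), (2, 2), (3, 2), (3, 3), (4, 4), (5, 5)}
tgt2src = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)}

noisy = project_labels(src_labels, src2tgt, tgt_len=6)
clean = project_labels(src_labels, intersect(src2tgt, tgt2src), tgt_len=6)
```

In this toy case the one-directional alignment labels the preposition "no" as part of the location, while the intersection drops the spurious link and leaves it as `O`. Intersection yields fewer but more precise links, which is one plausible reason a heuristic of this kind could project cleaner annotations.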
