RockNER: A Simple Method to Create Adversarial Examples for Evaluating the Robustness of Named Entity Recognition Models

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

To audit the robustness of named entity recognition (NER) models, we propose RockNER, a simple yet effective method to create natural adversarial examples. Specifically, at the entity level, we replace target entities with other entities of the same semantic class in Wikidata; at the context level, we use pre-trained language models (e.g., BERT) to generate word substitutions. Together, the two levels of attack produce natural adversarial examples that result in a shifted distribution from the training data on which our target models have been trained. We apply the proposed method to the OntoNotes dataset and create a new benchmark named OntoRock for evaluating the robustness of existing NER models via a systematic evaluation protocol. Our experiments and analysis reveal that even the best model has a significant performance drop, and these models seem to memorize in-domain entity patterns instead of reasoning from the context. Our work also studies the effects of a few simple data augmentation methods to improve the robustness of NER models.
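The entity-level attack described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `CANDIDATES` table is a hypothetical stand-in for a Wikidata lookup, the function name `replace_entities` is invented for illustration, and the context-level attack (masked language model substitutions) is not shown.

```python
import random

# Hypothetical stand-in for a Wikidata-derived lookup: entity type -> candidate names.
CANDIDATES = {
    "PERSON": ["Amara Okafor", "Lin Wei"],
    "ORG": ["Nordwind Logistics", "Halcyon Labs"],
}

def replace_entities(tokens, tags, candidates, rng):
    """Swap each BIO-tagged entity span for a same-type candidate and re-tag it."""
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            etype = tag[2:]
            # Find the end of the span: consecutive I- tags of the same type.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            if etype in candidates:
                repl = rng.choice(candidates[etype]).split()
            else:
                repl = tokens[i:j]  # no candidates for this type: keep the span
            new_tokens.extend(repl)
            new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(repl) - 1))
            i = j
        else:
            # Context tokens are copied unchanged.
            new_tokens.append(tokens[i])
            new_tags.append(tag)
            i += 1
    return new_tokens, new_tags

tokens = ["John", "Smith", "joined", "Google", "in", "2019", "."]
tags = ["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "B-DATE", "O"]
adv_tokens, adv_tags = replace_entities(tokens, tags, CANDIDATES, random.Random(0))
```

Because the replacement keeps the entity type and re-derives the BIO tags, the gold labels stay valid while the surface form of each entity changes, which is the property such an attack relies on.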

References used

https://aclanthology.org/

MasakhaNER: Named Entity Recognition for African Languages

Association for Computational Linguistics, 2021 (article)

We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.

Data Augmentation for Cross-Domain Named Entity Recognition

Association for Computational Linguistics, 2021 (article)

Current work in named entity recognition (NER) shows that data augmentation techniques can produce more robust models. However, most existing techniques focus on augmenting in-domain data in low-resource scenarios where annotated data is quite limited. In this work, we take this research direction to the opposite and study cross-domain data augmentation for the NER task. We investigate the possibility of leveraging data from high-resource domains by projecting it into the low-resource domains. Specifically, we propose a novel neural architecture to transform the data representation from a high-resource to a low-resource domain by learning the patterns (e.g., style, noise, abbreviations) in the text that differentiate them and a shared feature space where both domains are aligned. We experiment with diverse datasets and show that transforming the data to the low-resource domain representation achieves significant improvements over only using data from high-resource domains.

Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition

Association for Computational Linguistics, 2021 (article)

In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity. We introduce NRB, a new testbed carefully designed to diagnose Name Regularity Bias of NER models. Our results indicate that all state-of-the-art models we tested show such a bias; BERT fine-tuned models significantly outperform feature-based (LSTM-CRF) ones on NRB, despite having comparable (sometimes lower) performance on standard benchmarks. To mitigate this bias, we propose a novel model-agnostic training method that adds learnable adversarial noise to some entity mentions, thus forcing models to focus more strongly on the contextual signal, leading to significant gains on NRB. Combining it with two other training strategies, data augmentation and parameter freezing, leads to further gains.

Improved Named Entity Recognition for Noisy Call Center Transcripts

Association for Computational Linguistics, 2021 (article)

We explore the application of state-of-the-art NER algorithms to ASR-generated call center transcripts. Previous work in this domain focused on the use of a BiLSTM-CRF model which relied on Flair embeddings; however, such a model is unwieldy in terms of latency and memory consumption. In a production environment, end users require low-latency models which can be readily integrated into existing pipelines. To that end, we present two different models which can be utilized based on the latency and accuracy requirements of the user. First, we propose a set of models which utilize state-of-the-art Transformer language models (RoBERTa) to develop a high-accuracy NER system trained on a custom annotated set of call center transcripts. We then use our best-performing Transformer-based model to label a large number of transcripts, which we use to pretrain a BiLSTM-CRF model that we further fine-tune on our annotated dataset. We show that this model, while not as accurate as its Transformer-based counterpart, is highly effective in identifying items which require redaction for privacy law compliance. Further, we propose a new general annotation scheme for NER in the call-center environment.

SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation

Association for Computational Linguistics, 2021 (article)

To address a looming crisis of unreproducible evaluation for named entity recognition, we propose guidelines and introduce SeqScore, a software package to improve reproducibility. The guidelines we propose are extremely simple and center around transparency regarding how chunks are encoded and scored. We demonstrate that despite the apparent simplicity of NER evaluation, unreported differences in the scoring procedure can result in changes to scores that are both of noticeable magnitude and statistically significant. We describe SeqScore, which addresses many of the issues that cause replication failures.
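A small sketch can show how an unreported scoring choice changes a score. It does not use or reproduce SeqScore; the function names and the `repair_invalid` flag are hypothetical, and the two conventions shown (repairing versus discarding an I- tag that has no preceding B- tag) are just one assumed example of how scorers might differ.

```python
def decode_chunks(tags, repair_invalid):
    """Return a set of (start, end, type) chunks decoded from BIO tags.

    repair_invalid=True: an I- tag with no matching open chunk starts a new chunk.
    repair_invalid=False: such a tag is dropped, as if it were O.
    """
    chunks = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last chunk
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # the open chunk continues
        if etype is not None:
            chunks.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and repair_invalid):
            start, etype = i, tag[2:]
    return chunks

def f1(gold, pred):
    """Exact-match chunk F1 between two chunk sets."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["I-PER", "I-PER", "O", "B-LOC"]  # invalid: starts with I-

gold = decode_chunks(gold_tags, repair_invalid=True)
f1_repair = f1(gold, decode_chunks(pred_tags, repair_invalid=True))
f1_discard = f1(gold, decode_chunks(pred_tags, repair_invalid=False))
```

On this toy input the same prediction scores 1.0 under the repairing convention and about 0.667 under the discarding one, so the handling of invalid label sequences needs to be stated for results to be comparable.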
