Distant supervision allows obtaining labeled training corpora for low-resource settings where only limited hand-annotated data exists. However, to be used effectively, the distant supervision must be easy to gather. In this work, we present ANEA, a tool to automatically annotate named entities in texts based on entity lists. It spans the whole pipeline from obtaining the lists to analyzing the errors of the distant supervision. A tuning step allows the user to improve the automatic annotation with their linguistic insights without labelling or checking all tokens manually. In six low-resource scenarios, we show that the F1-score can be increased by an average of 18 points through distantly supervised data obtained by ANEA.
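The core mechanism this abstract describes, annotating text by matching spans against entity lists, can be illustrated with a minimal sketch. The greedy longest-match strategy, the gazetteer contents, and the function name below are illustrative assumptions, not ANEA's actual API:

```python
# Minimal sketch of entity-list (gazetteer) matching for distant supervision.
# The lists, BIO tag scheme, and greedy longest-match policy are assumptions
# for illustration; ANEA's own pipeline is more elaborate.

def annotate(tokens, entity_lists, max_len=4):
    """Greedily match the longest token span against the entity lists and
    emit BIO tags (e.g. B-LOC, I-LOC, O)."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            for label, entities in entity_lists.items():
                if span in entities:
                    tags[i] = f"B-{label}"
                    for j in range(i + 1, i + n):
                        tags[j] = f"I-{label}"
                    i += n
                    matched = True
                    break
            if matched:
                break
        if not matched:
            i += 1
    return tags

entity_lists = {
    "LOC": {"new york", "addis ababa"},
    "PER": {"ada lovelace"},
}
tokens = "Ada Lovelace visited New York .".split()
print(list(zip(tokens, annotate(tokens, entity_lists))))
# [('Ada', 'B-PER'), ('Lovelace', 'I-PER'), ('visited', 'O'),
#  ('New', 'B-LOC'), ('York', 'I-LOC'), ('.', 'O')]
```

The errors such matching produces (ambiguous surface forms, missed inflections) are exactly what the tuning and error-analysis steps mentioned above are meant to address.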
We study the open-domain named entity recognition (NER) problem under distant supervision. Distant supervision, though it does not require large amounts of manual annotation, yields highly incomplete and noisy distant labels via external knowledge bases.
In recent years, great success has been achieved in the field of natural language processing (NLP), thanks in part to the considerable amount of annotated resources. For named entity recognition (NER), most languages do not have such an abundance of labeled data.
We created the CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13). This CORD-NER dataset covers 75 fine-grained entity types: in addition to common biomedical entity types (e.g., genes, chemicals, and diseases), it covers new entity types specifically related to COVID-19 studies.
Existing models for cross-domain named entity recognition (NER) rely on large unlabeled corpora or labeled NER training data in target domains. However, collecting data for low-resource target domains is not only expensive but also time-consuming.
This paper presents a simple and effective approach to low-resource named entity recognition (NER) based on multi-hop dependency triggers. A dependency trigger refers to a salient node relative to an entity in the dependency graph of a context sentence.
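The notion of multi-hop neighbors of an entity in a dependency graph can be made concrete with a short sketch. The toy parse and function name below are assumptions for illustration; in practice the edges would come from a dependency parser, and the paper's trigger selection is more involved than plain breadth-first search:

```python
# Sketch: collect candidate "trigger" tokens within k hops of an entity in a
# dependency graph, treating the parse tree as an undirected graph.
# The hand-built edge list stands in for real parser output.
from collections import deque

def multi_hop_neighbors(edges, entity_token, hops=2):
    """Return tokens reachable from entity_token within `hops` steps."""
    adj = {}
    for head, dep in edges:
        adj.setdefault(head, set()).add(dep)
        adj.setdefault(dep, set()).add(head)
    seen, frontier = {entity_token}, deque([(entity_token, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen - {entity_token}

# "The president visited Berlin yesterday" (toy head -> dependent edges)
edges = [("visited", "president"), ("president", "The"),
         ("visited", "Berlin"), ("visited", "yesterday")]
print(multi_hop_neighbors(edges, "Berlin", hops=2))
# {'visited', 'president', 'yesterday'}
```

Here the two-hop neighborhood of the entity "Berlin" surfaces the governing verb and its other arguments, which are the kind of salient context nodes the trigger-based approach exploits.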