Entity Linking (EL) systems have achieved impressive results on standard benchmarks mainly thanks to the contextualized representations provided by recent pretrained language models. However, such systems still require massive amounts of data -- millions of labeled examples -- to perform at their best, with training times that often exceed several days, especially when limited computational resources are available. In this paper, we look at how Named Entity Recognition (NER) can be exploited to narrow the gap between EL systems trained on high and low amounts of labeled data. More specifically, we show how and to what extent an EL system can benefit from NER to enhance its entity representations, improve candidate selection, select more effective negative samples and enforce hard and soft constraints on its output entities. We release our software -- code and model checkpoints -- at https://github.com/Babelscape/ner4el.
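To illustrate one of the ideas mentioned above, the sketch below shows how a mention's predicted NER class could be used as a hard constraint (filtering candidates) or a soft constraint (re-ranking candidates) during candidate selection. This is a minimal illustration, not the authors' implementation: the `Candidate` structure, the `ner_class` field, and the `soft_bonus` parameter are assumptions made for the example.

```python
# Minimal sketch (assumed interfaces, not the NER4EL codebase): use the NER class
# predicted for a mention to constrain or re-rank its candidate entities.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    entity_id: str   # e.g. a Wikipedia/Wikidata identifier
    score: float     # score assigned by the base EL model
    ner_class: str   # coarse NER class associated with the entity (e.g. PER, ORG, LOC)

def apply_ner_constraint(
    candidates: List[Candidate],
    mention_ner_class: str,
    soft_bonus: Optional[float] = None,
) -> List[Candidate]:
    """Hard constraint: drop candidates whose class differs from the mention's.
    Soft constraint: add a bonus to matching candidates and re-rank instead."""
    if soft_bonus is None:
        filtered = [c for c in candidates if c.ner_class == mention_ner_class]
        # Fall back to the original list if the filter removes every candidate.
        return filtered or candidates
    reranked = [
        Candidate(
            c.entity_id,
            c.score + (soft_bonus if c.ner_class == mention_ner_class else 0.0),
            c.ner_class,
        )
        for c in candidates
    ]
    return sorted(reranked, key=lambda c: c.score, reverse=True)

# Toy usage: an ambiguous mention "Paris" with two candidates.
cands = [
    Candidate("Q90", 0.62, "LOC"),      # Paris (city)
    Candidate("Q47899", 0.58, "PER"),   # a person named Paris
]
print(apply_ner_constraint(cands, "PER"))                   # hard constraint
print(apply_ner_constraint(cands, "PER", soft_bonus=0.1))   # soft constraint
```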