HunFlair: An Easy-to-Use Tool for State-of-the-Art Biomedical Named Entity Recognition

102 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Leon Weber

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Leon Weber - Mario Sanger - Jannes Munchmeyer

الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Summary: Named Entity Recognition (NER) is an important step in biomedical information extraction pipelines. Tools for NER should be easy to use, cover multiple entity types, highly accurate, and robust towards variations in text genre and style. To this end, we propose HunFlair, an NER tagger covering multiple entity types integrated into the widely used NLP framework Flair. HunFlair outperforms other state-of-the-art standalone NER tools with an average gain of 7.26 pp over the next best tool, can be installed with a single command and is applied with only four lines of code. Availability: HunFlair is freely available through the Flair framework under an MIT license: https://github.com/flairNLP/flair and is compatible with all major operating systems. Contact:{weberple,saengema,alan.akbik}@informatik.hu-berlin.de

قيم البحث

157 - Xiang Dai , Heike Adel 2020

Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usu ally modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.

الحساب واللغة

An End-to-End Solution for Named Entity Recognition in eCommerce Search

74 - Xiang Cheng , Mitchell Bowden , Bhushan Ramesh Bhange 2020

Named entity recognition (NER) is a critical step in modern search query understanding. In the domain of eCommerce, identifying the key entities, such as brand and product type, can help a search engine retrieve relevant products and therefore offer an engaging shopping experience. Recent research shows promising results on shared benchmark NER tasks using deep learning methods, but there are still unique challenges in the industry regarding domain knowledge, training data, and model production. This paper demonstrates an end-to-end solution to address these challenges. The core of our solution is a novel model training framework TripleLearn which iteratively learns from three separate training datasets, instead of one training set as is traditionally done. Using this approach, the best model lifts the F1 score from 69.5 to 93.3 on the holdout test data. In our offline experiments, TripleLearn improved the model performance compared to traditional training approaches which use a single set of training data. Moreover, in the online A/B test, we see significant improvements in user engagement and revenue conversion. The model has been live on homedepot.com for more than 9 months, boosting search

الحساب واللغة الذكاء الاصطناعي

Entity-Switched Datasets: An Approach to Auditing the In-Domain Robustness of Named Entity Recognition Models

317 - Oshin Agarwal , Yinfei Yang , Byron C. Wallace 2020

Named entity recognition systems perform well on standard datasets comprising English news. But given the paucity of data, it is difficult to draw conclusions about the robustness of systems with respect to recognizing a diverse set of entities. We p ropose a method for auditing the in-domain robustness of systems, focusing specifically on differences in performance due to the national origin of entities. We create entity-switched datasets, in which named entities in the original texts are replaced by plausible named entities of the same type but of different national origin. We find that state-of-the-art systems performance vary widely even in-domain: In the same context, entities from certain origins are more reliably recognized than entities from elsewhere. Systems perform best on American and Indian entities, and worst on Vietnamese and Indonesian entities. This auditing approach can facilitate the development of more robust named entity recognition systems, and will allow research in this area to consider fairness criteria that have received heightened attention in other predictive technology work.

الحساب واللغة

Multi-Grained Named Entity Recognition

91 - Congying Xia , Chenwei Zhang , Tao Yang 2019

This paper presents a novel framework, MGNER, for Multi-Grained Named Entity Recognition where multiple entities or entity mentions in a sentence could be non-overlapping or totally nested. Different from traditional approaches regarding NER as a seq uential labeling task and annotate entities consecutively, MGNER detects and recognizes entities on multiple granularities: it is able to recognize named entities without explicitly assuming non-overlapping or totally nested structures. MGNER consists of a Detector that examines all possible word segments and a Classifier that categorizes entities. In addition, contextual information and a self-attention mechanism are utilized throughout the framework to improve the NER performance. Experimental results show that MGNER outperforms current state-of-the-art baselines up to 4.4% in terms of the F1 score among nested/non-overlapping NER tasks.

الحساب واللغة

A Byte-sized Approach to Named Entity Recognition

77 - Emily Sheng , Prem Natarajan 2018

In biomedical literature, it is common for entity boundaries to not align with word boundaries. Therefore, effective identification of entity spans requires approaches capable of considering tokens that are smaller than words. We introduce a novel, s ubword approach for named entity recognition (NER) that uses byte-pair encodings (BPE) in combination with convolutional and recurrent neural networks to produce byte-level tags of entities. We present experimental results on several standard biomedical datasets, namely the BioCreative VI Bio-ID, JNLPBA, and GENETAG datasets. We demonstrate competitive performance while bypassing the specialized domain expertise needed to create biomedical text tokenization rules.

الحساب واللغة

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة إيبلا الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

HunFlair: An Easy-to-Use Tool for State-of-the-Art Biomedical Named Entity Recognition

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً