Discontinuous entities pose a challenge to named entity recognition (NER). Such entities are common in the biomedical domain. The usual solution is to extend the BIO representation scheme with extra tag types that can represent them, as BIOHD does. However, the extra tag types make the NER task harder to learn. In this paper we propose an alternative: a fuzzy continuous BIO scheme (FuzzyBIO). We compare FuzzyBIO to BIOHD on the task of Adverse Drug Response extraction and normalization. We find that FuzzyBIO improves NER recall for two of three data sets and yields a higher percentage of correctly identified disjoint and composite entities for all three data sets. Using FuzzyBIO also improves end-to-end performance for continuous and composite entities in two of the three data sets. Since FuzzyBIO improves performance for some data sets and the conversion from BIOHD to FuzzyBIO is straightforward, we recommend investigating which scheme is more effective for any data set containing discontinuous entities.
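The abstract does not spell out the conversion rules, so the following is only a minimal sketch of what a BIOHD-to-FuzzyBIO conversion might look like. It assumes that BIOHD adds the tags HB/HI (shared head) and DB/DI (discontinuous part) to plain BIO, that a sentence contains at most one discontinuous or composite group, and that the fuzzy scheme relabels everything from the first to the last such tag, including the tokens in between, as a single continuous B/I span. The function name `biohd_to_fuzzy` and the example sentence are hypothetical.

```python
from typing import List

# Assumed BIOHD label set beyond plain B/I/O:
# HB/HI mark a head shared by several entities, DB/DI mark the other parts.
EXTRA_TAGS = {"HB", "HI", "DB", "DI"}


def biohd_to_fuzzy(tags: List[str]) -> List[str]:
    """Collapse one discontinuous/composite group into a continuous B/I span.

    Simplifying assumption: at most one such group per sentence.
    Plain B/I/O tags outside the group are left untouched.
    """
    positions = [i for i, tag in enumerate(tags) if tag in EXTRA_TAGS]
    if not positions:
        # Nothing discontinuous here; return an unchanged copy.
        return list(tags)

    start, end = positions[0], positions[-1]
    fuzzy = list(tags)
    fuzzy[start] = "B"
    for i in range(start + 1, end + 1):
        # Intervening tokens (even ones tagged O) join the fuzzy span.
        fuzzy[i] = "I"
    return fuzzy


tokens = ["had", "muscle", "pain", "and", "weakness", "today"]
biohd = ["O", "HB", "DB", "O", "DB", "O"]
print(list(zip(tokens, biohd_to_fuzzy(biohd))))
```

Under these assumptions, the composite mention "muscle pain and weakness" becomes one continuous span, so the tagger only has to learn B, I and O. The trade-off is that the span boundaries are fuzzy: the connective "and" is now part of the entity.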
References used
https://aclanthology.org/