Previous work on Entity Linking has focused on resources targeting non-nested proper named entity mentions, often in data from Wikipedia, i.e. Wikification. In this paper, we present and evaluate WikiGUM, a fully wikified dataset, covering all mentions of named entities, including their non-named and pronominal mentions, as well as mentions nested within other mentions. The dataset covers a broad range of 12 written and spoken genres, most of which have not been included in Entity Linking efforts to date, leading to poor performance by a pretrained SOTA system in our evaluation. The availability of a variety of other annotations for the same data also enables further research on entities in context.
References used
https://aclanthology.org/