This paper investigates the effectiveness of automatic annotator assignment for text annotation in expert domains. When creating high-quality annotated corpora, expert domains often cover multiple sub-domains (e.g. organic and inorganic chemistry in the chemistry domain), either explicitly or implicitly. It is therefore crucial to assign annotators to documents relevant to their fine-grained domain expertise. However, most existing crowdsourcing methods estimate the reliability of each annotator or annotated instance only after the annotation process. To address this issue, we propose a method that estimates the domain expertise of each annotator before the annotation process, using information easily obtained from the annotators beforehand. We propose two measures of annotator expertise: an explicit measure using predefined sub-domain categories, and an implicit measure using distributed representations of the documents. Experimental results on chemical name annotation tasks show that annotation accuracy improves when the explicit and implicit measures are combined for annotator assignment.
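The abstract does not say how the two measures are computed or combined. The following is a minimal Python sketch of pre-annotation assignment, and it rests on several assumptions.

- **Explicit measure:** the Jaccard overlap between the sub-domain categories an annotator reports beforehand and those attached to a document.
- **Implicit measure:** the cosine similarity between a document's embedding and the centroid of embeddings of documents the annotator supplied beforehand.
- **Combination:** the two measures are mixed linearly with a hypothetical weight `alpha`.

The names `Annotator`, `explicit_score`, `implicit_score`, `combined_score`, and `assign_annotator` are illustrative and do not come from the paper. The embeddings are taken as given, since how they are produced is outside this sketch.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set


@dataclass
class Annotator:
    # Hypothetical profile collected before annotation starts.
    subdomains: Set[str]  # self-reported predefined sub-domain categories
    doc_vectors: List[List[float]] = field(default_factory=list)  # embeddings of documents the annotator supplied


def explicit_score(annotator_subdomains: Set[str], doc_subdomains: Set[str]) -> float:
    """Assumed explicit measure: Jaccard overlap of sub-domain categories."""
    union = annotator_subdomains | doc_subdomains
    if not union:
        return 0.0
    return len(annotator_subdomains & doc_subdomains) / len(union)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def implicit_score(annotator_vectors: List[List[float]], doc_vector: Sequence[float]) -> float:
    """Assumed implicit measure: similarity of the document to the centroid of the annotator's documents."""
    if not annotator_vectors:
        return 0.0
    n = len(annotator_vectors)
    centroid = [sum(vec[i] for vec in annotator_vectors) / n for i in range(len(doc_vector))]
    return cosine(centroid, doc_vector)


def combined_score(annotator: Annotator, doc_subdomains: Set[str],
                   doc_vector: Sequence[float], alpha: float = 0.5) -> float:
    """Linear interpolation of both measures; alpha is a hypothetical weight."""
    return (alpha * explicit_score(annotator.subdomains, doc_subdomains)
            + (1.0 - alpha) * implicit_score(annotator.doc_vectors, doc_vector))


def assign_annotator(annotators: Dict[str, Annotator], doc_subdomains: Set[str],
                     doc_vector: Sequence[float], alpha: float = 0.5) -> str:
    """Pick the annotator with the highest combined expertise score for one document."""
    return max(annotators,
               key=lambda name: combined_score(annotators[name], doc_subdomains, doc_vector, alpha))


if __name__ == "__main__":
    annotators = {
        "alice": Annotator({"organic"}, [[1.0, 0.0], [0.9, 0.1]]),
        "bob": Annotator({"inorganic"}, [[0.0, 1.0]]),
    }
    # An organic-chemistry document whose embedding points along the first axis.
    print(assign_annotator(annotators, {"organic"}, [1.0, 0.0]))  # prints "alice"
```

Linear interpolation is only one plausible way to combine the measures. Setting `alpha` to 1.0 or 0.0 recovers the explicit-only or implicit-only variant, which is a convenient way to compare them against the combination.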