This paper presents a data set of German fairy tales, manually annotated with character networks that were obtained with high inter-rater agreement. The release of this corpus provides an opportunity to train and compare different algorithms for the extraction of character networks, which so far was barely possible due to the heterogeneous interests of previous researchers. We demonstrate the usefulness of our data set by providing baseline experiments for the automatic extraction of character networks, applying a rule-based pipeline as well as a neural approach, and find that the neural approach outperforms the rule-based approach in most evaluation settings.
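To make the notion of a character network concrete, the following is a minimal sketch that links two characters whenever they are mentioned in the same sentence. It is an illustration only and is neither the rule-based pipeline nor the neural approach evaluated in the paper; the function name `cooccurrence_network`, the naive sentence splitter, and the toy tale are all assumptions introduced here.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_network(text, characters):
    """Count how often each pair of known characters shares a sentence.

    Hypothetical helper for illustration; not the method from the paper.
    """
    edges = Counter()
    # Naive sentence split on ., ! or ? (an assumption; a real pipeline
    # would use a proper sentence segmenter).
    for sentence in re.split(r"[.!?]+", text):
        tokens = set(re.findall(r"\w+", sentence))
        present = sorted(name for name in characters if name in tokens)
        # Every unordered pair of characters in the sentence gets one edge count.
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Toy example text, invented for this sketch.
tale = (
    "Gretel und der Vater gingen in den Wald. "
    "Die Hexe lockte Gretel ins Haus. "
    "Gretel stieß die Hexe in den Ofen."
)
network = cooccurrence_network(tale, {"Gretel", "Hexe", "Vater"})
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

Sentence-level co-occurrence is only one possible definition of an edge. It assumes the character names are already known and ignores coreference, so it should be read as a simple point of comparison rather than a description of the annotated networks.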
References used
https://aclanthology.org/