We present RuShiftEval, a manually annotated lexical semantic change dataset for Russian. Its novelty lies in a single set of target words annotated for diachronic semantic shifts across three time periods, whereas previous work used either only two time periods or different sets of target words. The paper describes the composition of the dataset and its annotation procedure. In addition, we show how the ternary nature of RuShiftEval makes it possible to trace specific diachronic trajectories: 'changed at a particular time period and stable afterwards' or 'was changing throughout all time periods'. Based on an analysis of the submissions to the recent shared task on semantic change detection for Russian, we argue that correctly identifying such trajectories can be an interesting sub-task in itself.
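The trajectory types named above can be illustrated with a minimal sketch. It assumes, hypothetically, that each target word has a change score in [0, 1] for each pair of consecutive periods (higher means more change) and that a fixed threshold separates "changed" from "stable". The function name, score scale, threshold, and example words are illustrative assumptions, not part of RuShiftEval.

```python
def classify_trajectory(change_1_2: float, change_2_3: float,
                        threshold: float = 0.5) -> str:
    """Label a word's trajectory across three time periods.

    change_1_2 and change_2_3 are hypothetical change scores in [0, 1]
    between consecutive periods; threshold is an assumed cut-off,
    not a value taken from the dataset.
    """
    changed_early = change_1_2 >= threshold
    changed_late = change_2_3 >= threshold
    if changed_early and changed_late:
        return "changing throughout"
    if changed_early:
        return "changed, then stable"
    if changed_late:
        return "stable, then changed"
    return "stable throughout"


# Made-up scores for three placeholder words (not real annotations).
examples = {"word_a": (0.8, 0.1), "word_b": (0.7, 0.9), "word_c": (0.2, 0.1)}
labels = {word: classify_trajectory(*scores) for word, scores in examples.items()}
```

With only two time periods, a single score per word is available, so the first two labels could not be told apart; this is the distinction that a three-period design exposes.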