We present a comprehensive survey of available corpora for multi-party dialogue. We survey over 300 publications related to multi-party dialogue and catalogue all available corpora in a novel taxonomy. We analyze methods of data collection for multi-party dialogue corpora and identify several lacunae in existing data collection approaches used to collect such dialogue. We present this survey, the first survey to focus exclusively on multi-party dialogue corpora, to motivate research in this area. Through our discussion of existing data collection methods, we identify desiderata and guiding principles for multi-party data collection to contribute further towards advancing this area of dialogue research.
References used
https://aclanthology.org/