Abstract
While pretrained language models (LMs) have driven impressive gains on morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear. As a step towards a better understanding of their discourse modeling capabilities, we propose a sentence intrusion detection task. We examine the performance of a broad range of pretrained LMs on this detection task for English. Lacking a dataset for the task, we introduce INSteD, a novel intruder sentence detection dataset containing 170,000+ documents constructed from English Wikipedia and CNN news articles. Our experiments show that pretrained LMs perform impressively in in-domain evaluation, but experience a substantial drop in the cross-domain setting, indicating limited generalization capacity. Further results over a novel linguistic probe dataset show that there is substantial room for improvement, especially in the cross-domain setting.
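To make the task concrete, the following is a minimal sketch of how one intruder-sentence instance might be built and scored. The abstract does not specify how intruders are selected, so the construction here (replacing one sentence with a sentence drawn from a different document) is an assumption, and the names `make_intruder_example` and `detection_accuracy` are hypothetical helpers introduced only for illustration.

```python
import random
from typing import List, Sequence, Tuple


def make_intruder_example(
    document: Sequence[str],
    donor: Sequence[str],
    rng: random.Random,
) -> Tuple[List[str], int]:
    """Build one intrusion-detection instance.

    ASSUMPTION: one sentence of `document` is replaced by a sentence
    taken from `donor` (a different document). Returns the corrupted
    sentence list and the index of the intruder sentence.
    """
    if not document or not donor:
        raise ValueError("both documents need at least one sentence")
    position = rng.randrange(len(document))  # where the intruder goes
    intruder = rng.choice(list(donor))       # sentence from the other document
    corrupted = list(document)
    corrupted[position] = intruder
    return corrupted, position


def detection_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of documents whose intruder index was predicted correctly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)
```

Under this framing, a model receives the corrupted sentence list and must output the intruder's index. In-domain versus cross-domain evaluation then amounts to whether the training and test documents come from the same source (for example, Wikipedia for both) or from different ones (Wikipedia for training, CNN for testing).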
References used
https://aclanthology.org/