Cant is important for understanding advertising, comedy, and dog-whistle politics. However, computational research on cant is hindered by a lack of available datasets. In this paper, we propose a large and diverse Chinese dataset for creating and understanding cant from a computational linguistics perspective. We formulate a task for cant understanding and provide both quantitative and qualitative analyses of the word embedding similarity methods and pretrained language models we tested. Experiments suggest that the task requires deep language understanding, common sense, and world knowledge; it can therefore serve as a good testbed for pretrained language models and help models perform better on other tasks.
References used
https://aclanthology.org/
Dialogue summarization has drawn much attention recently. Especially in the customer service domain, agents could use dialogue summaries to speed up their work by quickly learning customers' issues and service progress. These applications require s
Sentiment analysis has attracted increasing attention in e-commerce. The sentiment polarities underlying user reviews are of great value for business intelligence. Aspect category sentiment analysis (ACSA) and review rating prediction (RP) are two es
We annotate 17,000 SNS posts with both the writer's subjective emotional intensity and the reader's objective one to construct a Japanese emotion analysis dataset. In this study, we explore the difference between the emotional intensity of the writer
This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions. The dataset consists of 200 videos, 5,554 action l
In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date it contains 56K annotated named entities and 39K annotated relations. Its i