In this paper, we present an annotated sentiment analysis dataset of informally written Bangla texts. The dataset comprises public comments on news and videos, collected from social media and covering 13 domains, including politics, education, and agriculture. Each comment is labeled with one of three polarity labels: positive, negative, or neutral. A significant characteristic of the dataset is that the comments are noisy, mixing dialects and containing grammatical errors. Our experiments toward a benchmark classification system show that hand-crafted lexical features outperform neural network and pretrained language models. We have made the dataset and accompanying models presented in this paper publicly available at https://git.io/JuuNB.
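To illustrate what hand-crafted lexical features can look like for noisy, informally written comments, here is a minimal sketch. The abstract does not specify the actual feature set, so the function name `lexical_features`, the choice of word unigrams plus character 2- and 3-grams, and the `w=`/`c2=`/`c3=` key prefixes are all assumptions made for illustration.

```python
from collections import Counter


def lexical_features(text, char_n=(2, 3)):
    """Count word unigrams and per-token character n-grams for one comment.

    Hypothetical feature set for illustration only; not the paper's method.
    """
    features = Counter()
    tokens = text.split()  # whitespace tokenization (an assumption)
    # Word unigrams capture whole sentiment-bearing words.
    for token in tokens:
        features["w=" + token] += 1
    # Character n-grams are taken inside each token, so spelling and
    # dialect variants of a word still share many features.
    for token in tokens:
        for n in char_n:
            for i in range(len(token) - n + 1):
                features[f"c{n}=" + token[i:i + n]] += 1
    return features


if __name__ == "__main__":
    print(lexical_features("good good no"))
```

Sparse counts of this kind could be fed to a linear classifier. Note that the slicing works on Unicode code points, so in Bangla text a dependent vowel sign counts as its own character.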
References used
https://aclanthology.org/
On various social media platforms, people tend to communicate informally, writing posts and comments in their local dialects. In Africa, more than 1500 dialects and languages exist. Particularly, Tunisians talk and write informally using
Sentiment analysis has attracted increasing attention in e-commerce. The sentiment polarities underlying user reviews are of great value for business intelligence. Aspect category sentiment analysis (ACSA) and review rating prediction (RP) are two es
This study introduces and analyzes WikiTalkEdit, a dataset of conversations and edit histories from Wikipedia, for research in online cooperation and conversation modeling. The dataset comprises dialog triplets from the Wikipedia Talk pages, and edit
Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due
People utilize online forums to either look for information or to contribute it. Because of their growing popularity, certain online forums have been created specifically to provide support, assistance, and opinions for people suffering from mental i