Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

SentNoB: A Dataset for Analysing Sentiment on Noisy Bangla Texts

Sentnob: مجموعة بيانات لتحليل المشاعر على نصوص البنغالية الصاخبة

892 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we propose an annotated sentiment analysis dataset made of informally written Bangla texts. This dataset comprises public comments on news and videos collected from social media covering 13 different domains, including politics, education, and agriculture. These comments are labeled with one of the polarity labels, namely positive, negative, and neutral. One significant characteristic of the dataset is that each of the comments is noisy in terms of the mix of dialects and grammatical incorrectness. Our experiments to develop a benchmark classification system show that hand-crafted lexical features provide superior performance than neural network and pretrained language models. We have made the dataset and accompanying models presented in this paper publicly available at https://git.io/JuuNB.

References used

https://aclanthology.org/

rate research

Introducing A large Tunisian Arabizi Dialectal Dataset for Sentiment Analysis

562 - Association for Computation Linguistics 2021 مقالة

On various Social Media platforms, people, tend to use the informal way to communicate, or write posts and comments: their local dialects. In Africa, more than 1500 dialects and languages exist. Particularly, Tunisians talk and write informally using Latin letters and numbers rather than Arabic ones. In this paper, we introduce a large common-crawl-based Tunisian Arabizi dialectal dataset dedicated for Sentiment Analysis. The dataset consists of a total of 100k comments (about movies, politic, sport, etc.) annotated manually by Tunisian native speakers as Positive, negative and Neutral. We evaluate our dataset on sentiment analysis task using the Bidirectional Encoder Representations from Transformers (BERT) as a contextual language model in its multilingual version (mBERT) as an embedding technique then combining mBERT with Convolutional Neural Network (CNN) as classifier. The dataset is publicly available.

tunisian arabizi dialectal arabizi dialectal dataset الجدلي العربي التونسي عربيز DataSetal. صناعة حمض الفوسفور

ASAP: A Chinese Review Dataset Towards Aspect Category Sentiment Analysis and Rating Prediction

756 - Association for Computation Linguistics 2021 مقالة

Sentiment analysis has attracted increasing attention in e-commerce. The sentiment polarities underlying user reviews are of great value for business intelligence. Aspect category sentiment analysis (ACSA) and review rating prediction (RP) are two es sential tasks to detect the fine-to-coarse sentiment polarities. ACSA and RP are highly correlated and usually employed jointly in real-world e-commerce scenarios. While most public datasets are constructed for ACSA and RP separately, which may limit the further exploitation of both tasks. To address the problem and advance related researches, we present a large-scale Chinese restaurant review dataset ASAP including 46, 730 genuine reviews from a leading online-to-offline (O2O) e-commerce platform in China. Besides a 5-star scale rating, each review is manually annotated according to its sentiment polarities towards 18 pre-defined aspect categories. We hope the release of the dataset could shed some light on the field of sentiment analysis. Moreover, we propose an intuitive yet effective joint model for ACSA and RP. Experimental results demonstrate that the joint model outperforms state-of-the-art baselines on both tasks.

بيرت و GPT. صناعة حمض الفوسفور

WikiTalkEdit: A Dataset for modeling Editors' behaviors on Wikipedia

825 - Association for Computation Linguistics 2021 مقالة

This study introduces and analyzes WikiTalkEdit, a dataset of conversations and edit histories from Wikipedia, for research in online cooperation and conversation modeling. The dataset comprises dialog triplets from the Wikipedia Talk pages, and edit ing actions on the corresponding articles being discussed. We show how the data supports the classic understanding of style matching, where positive emotion and the use of first-person pronouns predict a positive emotional change in a Wikipedia contributor. However, they do not predict editorial behavior. On the other hand, feedback invoking evidentiality and criticism, and references to Wikipedia's community norms, is more likely to persuade the contributor to perform edits but is less likely to lead to a positive emotion. We developed baseline classifiers trained on pre-trained RoBERTa features that can predict editorial change with an F1 score of .54, as compared to an F1 score of .66 for predicting emotional change. A diagnostic analysis of persisting errors is also provided. We conclude with possible applications and recommendations for future work. The dataset is publicly available for the research community at https://github.com/kj2013/WikiTalkEdit/.

modeling editors' behaviors modeling editors' editors' behaviors نماذج سلوكيات المحررين تحرير النمذجة سلوكيات المحررين صناعة حمض الفوسفور المزيد..

WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

573 - Association for Computation Linguistics 2021 مقالة

Abstract Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp,1 a large-scale dataset for multi-domain aspect- based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.

aspect-based summarization multi-domain aspect-based summarization abstract aspect-based summarization تلخيص القائم على الجانب تلخيص القائم على الجانب متعدد المجالات التلخيص القائم على الجانب المجردة صناعة حمض الفوسفور المزيد..

A Dataset for Research on Modelling Depression Severity in Online Forum Data

1140 - Association for Computation Linguistics 2021 مقالة

People utilize online forums to either look for information or to contribute it. Because of their growing popularity, certain online forums have been created specifically to provide support, assistance, and opinions for people suffering from mental i llness. Depression is one of the most frequent psychological illnesses worldwide. People communicate more with online forums to find answers for their psychological disease. However, there is no mechanism to measure the severity of depression in each post and give higher importance to those who are diagnosed more severely depressed. Despite the fact that numerous researches based on online forum data and the identification of depression have been conducted, the severity of depression is rarely explored. In addition, the absence of datasets will stymie the development of novel diagnostic procedures for practitioners. From this study, we offer a dataset to support research on depression severity evaluation. The computational approach to measure an automatic process, identified severity of depression here is quite novel approach. Nonetheless, this elaborate measuring severity of depression in online forum posts is needed to ensure the measurement scales used in our research meets the expected norms of scientific research.

modelling depression severity online forum data modelling depression نمذجة الشدة الاكتئاب بيانات المنتدى عبر الإنترنت نمذجة الاكتئاب صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

SentNoB: A Dataset for Analysing Sentiment on Noisy Bangla Texts

Sentnob: مجموعة بيانات لتحليل المشاعر على نصوص البنغالية الصاخبة

Ask ChatGPT about the research

Read More

suggested questions