"Don't discuss": Investigating Semantic and Argumentative Features for Supervised Propagandist Message Detection and Classification

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: semantic features, argumentative features, propagandist message detection

Abstract

One of the mechanisms through which disinformation spreads online, in particular through social media, is the use of propaganda techniques. These include specific rhetorical and psychological strategies, ranging from leveraging emotions to exploiting logical fallacies. In this paper, our goal is to push forward research on propaganda detection based on text analysis, given the crucial role these methods may play in addressing this major societal issue. More precisely, we propose a supervised approach to classify textual snippets both as propaganda messages and according to the precise propaganda technique applied, as well as a detailed linguistic analysis of the features characterising propaganda in text (e.g., semantic, sentiment and argumentation features). Extensive experiments conducted on two available propaganda resources (i.e., the NLP4IF'19 and SemEval'20 Task 11 datasets) show that the proposed approach, leveraging different language models and the investigated linguistic features, achieves very promising results on propaganda classification, at both sentence and fragment level.

References used

https://aclanthology.org/

Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection

Association for Computational Linguistics, 2021 (article)

In this paper, we describe experiments designed to evaluate the impact of stylometric and emotion-based features on hate speech detection: the task of classifying textual content into hate or non-hate speech classes. Our experiments are conducted for three languages (English, Slovene, and Dutch) in both in-domain and cross-domain setups, and aim to investigate hate speech using features that model two linguistic phenomena: the writing style of hateful social media content, operationalized as function word usage, on the one hand, and emotion expression in hateful messages on the other. The results of experiments with features that model different combinations of these phenomena support our hypothesis that stylometric and emotion-based features are robust indicators of hate speech. Their contribution persists across domain and language variation. We show that the combination of features that model the targeted phenomena outperforms word and character n-gram features under cross-domain conditions, and provides a significant boost to deep learning models, which currently obtain the best results, when combined with them in an ensemble.
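As a rough illustration of what "writing style operationalized as function word usage" could look like in practice, the sketch below turns a text into a vector of relative function-word frequencies. The word list `FUNCTION_WORDS`, the tokenizer, and the function name `function_word_profile` are all hypothetical choices made for this example; the abstract does not specify the actual feature set.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration only; not the list used in the paper).
FUNCTION_WORDS = ("the", "a", "of", "and", "you", "they", "not", "to")


def function_word_profile(text, vocab=FUNCTION_WORDS):
    """Return the relative frequency of each function word in `text`."""
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in vocab]


profile = function_word_profile("They do not want you to know the truth")
```

A vector like `profile` could then be passed, alone or alongside other features, to any standard classifier.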

Keywords: multilingual cross-domain, hate speech detection

Sarcasm and Sentiment Detection in Arabic: investigating the interest of character-level features

Association for Computational Linguistics, 2021 (article)

We present three methods developed for the Shared Task on Sarcasm and Sentiment Detection in Arabic. We present a baseline that uses character n-gram features. We also propose two more sophisticated methods: a recurrent neural network with a word-level representation and an ensemble classifier relying on word- and character-level features. We chose to submit results from the ensemble classifier, but it was not very successful compared to the best systems: 22nd/37 on sarcasm detection and 15th/22 on sentiment detection. In the end, it turned out that our baseline could have been improved to beat those results.
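For readers unfamiliar with character n-gram features, here is a minimal sketch of how such features can be extracted. The function name `char_ngrams` and the n-gram range are assumptions for illustration; the abstract does not state which n-gram lengths or weighting the baseline used.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count overlapping character n-grams with lengths n_min..n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the text, one character at a time.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Example: bigrams and trigrams of a single word.
features = char_ngrams("sarcasm", n_min=2, n_max=3)
```

Because the function works on raw characters rather than tokens, it applies unchanged to Arabic script, which is one reason such features are a common baseline for morphologically rich languages.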

Keywords: static word embeddings, detection in Arabic, investigating the interest

Don't Discard All the Biased Instances: Investigating a Core Assumption in Dataset Bias Mitigation Techniques

Association for Computational Linguistics, 2021 (article)

Existing techniques for mitigating dataset bias often leverage a biased model to identify biased instances. The role of these biased instances is then reduced during the training of the main model to enhance its robustness to out-of-distribution data. A common core assumption of these techniques is that the main model handles biased instances similarly to the biased model, in that it will resort to biases whenever available. In this paper, we show that this assumption does not hold in general. We carry out a critical investigation on two well-known datasets in the domain, MNLI and FEVER, along with two biased instance detection methods, partial-input and limited-capacity models. Our experiments show that in around a third to a half of instances, the biased model is unable to predict the main model's behavior, as highlighted by the significantly different parts of the input on which they base their decisions. Based on a manual validation, we also show that this estimate is highly in line with human interpretation. Our findings suggest that down-weighting instances detected by bias detection methods, a widely practiced procedure, is an unnecessary waste of training data. We release our code to facilitate reproducibility and future research.

Keywords: dataset bias mitigation, bias mitigation techniques, dataset bias

Probing Pre-trained Language Models for Semantic Attributes and their Values

Association for Computational Linguistics, 2021 (article)

Pretrained language models (PTLMs) yield state-of-the-art performance on many natural language processing tasks, including syntax, semantics and commonsense. In this paper, we focus on identifying to what extent PTLMs capture semantic attributes and their values, e.g., the correlation between rich and high net worth. We use PTLMs to predict masked tokens using patterns and lists of items from Wikidata in order to verify how likely PTLMs are to encode semantic attributes along with their values. Such inferences based on semantics are intuitive for humans as part of our language understanding. Since PTLMs are trained on large amounts of Wikipedia data, we would assume that they can generate similar predictions, yet our findings reveal that PTLMs are still much worse than humans on this task. We show evidence and analysis explaining how to exploit our methodology to integrate better context and semantics into PTLMs using knowledge bases.

Keywords: probing pre-trained language models

TECHSSN at SemEval-2021 Task 7: Humor and Offense detection and classification using ColBERT embeddings

Association for Computational Linguistics, 2021 (article)

This paper describes the system used for detecting humor in text. The system developed by the team TECHSSN uses binary classification techniques to classify the text. The data undergoes preprocessing and is given to ColBERT (Contextualized Late Interaction over BERT), a modification of Bidirectional Encoder Representations from Transformers (BERT). The model is re-trained and the weights are learned for the dataset. This system was developed for Task 7 of the SemEval 2021 competition.

Keywords: offense detection, contextualized late interaction
