Do you want to publish a course? Click here

NLPIITR at SemEval-2021 Task 6: RoBERTa Model with Data Augmentation for Persuasion Techniques Detection

NLPIITR في مهمة Semeval-2021 6: نموذج روبرتا مع تكبير البيانات للكشف عن تقنيات الإقناع

285   0   0   0.0 ( 0 )
 Publication date 2021
and research's language is English
 Created by Shamra Editor




Ask ChatGPT about the research

This paper describes and examines different systems to address Task 6 of SemEval-2021: Detection of Persuasion Techniques In Texts And Images, Subtask 1. The task aims to build a model for identifying rhetorical and psycho- logical techniques (such as causal oversimplification, name-calling, smear) in the textual content of a meme which is often used in a disinformation campaign to influence the users. The paper provides an extensive comparison among various machine learning systems as a solution to the task. We elaborate on the pre-processing of the text data in favor of the task and present ways to overcome the class imbalance. The results show that fine-tuning a RoBERTa model gave the best results with an F1-Micro score of 0.51 on the development set.



References used
https://aclanthology.org/
rate research

Read More

We describe SemEval-2021 task 6 on Detection of Persuasion Techniques in Texts and Images: the data, the annotation guidelines, the evaluation setup, the results, and the participating systems. The task focused on memes and had three subtasks: (i) de tecting the techniques in the text, (ii) detecting the text spans where the techniques are used, and (iii) detecting techniques in the entire meme, i.e., both in the text and in the image. It was a popular task, attracting 71 registrations, and 22 teams that eventually made an official submission on the test set. The evaluation results for the third subtask confirmed the importance of both modalities, the text and the image. Moreover, some teams reported benefits when not just combining the two modalities, e.g., by using early or late fusion, but rather modeling the interaction between them in a joint model.
The following system description presents our approach to the detection of persuasion techniques in texts and images. The given task has been framed as a multi-label classification problem with the different techniques serving as class labels. The mu lti-label classification problem is one in which a list of target variables such as our class labels is associated with every input chunk and assumes that a document can simultaneously and independently be assigned to multiple labels or classes. In order to assign class labels to the given memes, we opted for RoBERTa (A Robustly Optimized BERT Pretraining Approach) as a neural network architecture for token and sequence classification. Starting off with a pre-trained model for language representation we fine-tuned this model on the given classification task with the provided annotated data in supervised training steps. To incorporate image features in the multi-modal setting, we rely on the pre-trained VGG-16 model architecture.
The objective of subtask 2 of SemEval-2021 Task 6 is to identify techniques used together with the span(s) of text covered by each technique. This paper describes the system and model we developed for the task. We first propose a pipeline system to i dentify spans, then to classify the technique in the input sequence. But it severely suffers from handling the overlapping in nested span. Then we propose to formulize the task as a question answering task by MRC framework which achieves a better result compared to the pipeline method. Moreover, data augmentation and loss design techniques are also explored to alleviate the problem of data sparse and imbalance. Finally, we attain the 3rd place in the final evaluation phase.
We developed a system for task 6 sub-task 1 for detecting propaganda in memes. An external dataset and augmentation data-set were used to extend the official competition data-set. Data augmentation techniques were applied on the external data-set and competition data-set to come up with the augmented data-set. We trained 5 transformers (DeBERTa, and 4 RoBERTa) and ensembled them to make the prediction. We trained 1 RoBERTa model initially on the augmented data-set for a few epochs and then fine-tuned it on the competition data-set which improved the f1-micro up to 0.1 scores. After that, another initial RoBERTa model was trained on the external data-set merged with the augmented data-set for few epochs and fine-tuned it on the competition data-set. Furthermore, we ensembled the initial models with the models after fine-tuning. For the final model in the ensemble, we trained a DeBERTa model on the augmented data-set without fine-tuning it on the competition data-set. Finally, we averaged the output of each model in the ensemble to make the prediction.
We describe our approach for SemEval-2021 task 6 on detection of persuasion techniques in multimodal content (memes). Our system combines pretrained multimodal models (CLIP) and chained classifiers. Also, we propose to enrich the data by a data augmentation technique. Our submission achieves a rank of 8/16 in terms of F1-micro and 9/16 with F1-macro on the test set.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا