Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

AIMH at SemEval-2021 Task 6: Multimodal Classification Using an Ensemble of Transformer Models

Aimh في مهمة Semeval-2021 6: تصنيف متعدد الوسائط باستخدام مجموعة من نماذج المحولات

673 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

aimh team visual textual transformer double visual textual فريق Aimh محول البصرية النصية ضعف المرئي النصية صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper describes the system used by the AIMH Team to approach the SemEval Task 6. We propose an approach that relies on an architecture based on the transformer model to process multimodal content (text and images) in memes. Our architecture, called DVTT (Double Visual Textual Transformer), approaches Subtasks 1 and 3 of Task 6 as multi-label classification problems, where the text and/or images of the meme are processed, and the probabilities of the presence of each possible persuasion technique are returned as a result. DVTT uses two complete networks of transformers that work on text and images that are mutually conditioned. One of the two modalities acts as the main one and the second one intervenes to enrich the first one, thus obtaining two distinct ways of operation. The two transformers outputs are merged by averaging the inferred probabilities for each possible label, and the overall network is trained end-to-end with a binary cross-entropy loss.

References used

https://aclanthology.org/

rate research

Alpha at SemEval-2021 Task 6: Transformer Based Propaganda Classification

659 - Association for Computation Linguistics 2021 مقالة

This paper describes our system participated in Task 6 of SemEval-2021: the task focuses on multimodal propaganda technique classification and it aims to classify given image and text into 22 classes. In this paper, we propose to use transformer base d architecture to fuse the clues from both image and text. We explore two branches of techniques including fine-tuning the text pretrained transformer with extended visual features, and fine-tuning the multimodal pretrained transformers. For the visual features, we have tested both grid features based on ResNet and salient region features from pretrained object detector. Among the pretrained multimodal transformers, we choose ERNIE-ViL, a two-steam cross-attended transformers pretrained on large scale image-caption aligned data. Fine-tuing ERNIE-ViL for our task produce a better performance due to general joint multimodal representation for text and image learned by ERNIE-ViL. Besides, as the distribution of the classification labels is very unbalanced, we also make a further attempt on the loss function and the experiment result shows that focal loss would perform better than cross entropy loss. Last we have won first for subtask C in the final competition.

based propaganda classification transformer based propaganda propaganda technique classification تصنيف الدعاية القائم الدعاية القائمة على المحولات تصنيف تقنية الدعاية صناعة حمض الفوسفور المزيد..

Volta at SemEval-2021 Task 6: Towards Detecting Persuasive Texts and Images using Textual and Multimodal Ensemble

566 - Association for Computation Linguistics 2021 مقالة

Memes are one of the most popular types of content used to spread information online. They can influence a large number of people through rhetorical and psychological techniques. The task, Detection of Persuasion Techniques in Texts and Images, is to detect these persuasive techniques in memes. It consists of three subtasks: (A) Multi-label classification using textual content, (B) Multi-label classification and span identification using textual content, and (C) Multi-label classification using visual and textual content. In this paper, we propose a transfer learning approach to fine-tune BERT-based models in different modalities. We also explore the effectiveness of ensembles of models trained in different modalities. We achieve an F1-score of 57.0, 48.2, and 52.1 in the corresponding subtasks.

detecting persuasive texts detecting persuasive الكشف عن النصوص المقنعة الكشف عن إقناع صناعة حمض الفوسفور

MinD at SemEval-2021 Task 6: Propaganda Detection using Transfer Learning and Multimodal Fusion

631 - Association for Computation Linguistics 2021 مقالة

We describe our systems of subtask1 and subtask3 for SemEval-2021 Task 6 on Detection of Persuasion Techniques in Texts and Images. The purpose of subtask1 is to identify propaganda techniques given textual content, and the goal of subtask3 is to det ect them given both textual and visual content. For subtask1, we investigate transfer learning based on pre-trained language models (PLMs) such as BERT, RoBERTa to solve data sparsity problems. For subtask3, we extract heterogeneous visual representations (i.e., face features, OCR features, and multimodal representations) and explore various multimodal fusion strategies to combine the textual and visual representations. The official evaluation shows our ensemble model ranks 1st for subtask1 and 2nd for subtask3.

الكشف عن إقناع صناعة حمض الفوسفور

WVOQ at SemEval-2021 Task 6: BART for Span Detection and Classification

1132 - Association for Computation Linguistics 2021 مقالة

Simultaneous span detection and classification is a task not currently addressed in standard NLP frameworks. The present paper describes why and how an EncoderDecoder model was used to combine span detection and classification to address subtask 2 of SemEval-2021 Task 6.

span detection detection and classification simultaneous span detection سبان الكشف الكشف والتصنيف اكتشاف سبان في وقت واحد. صناعة حمض الفوسفور المزيد..

CSECU-DSG at SemEval-2021 Task 1: Fusion of Transformer Models for Lexical Complexity Prediction

655 - Association for Computation Linguistics 2021 مقالة

Lexical complexity prediction (LCP) conveys the anticipation of the complexity level of a token or a set of tokens in a sentence. It plays a vital role in the improvement of various NLP tasks including lexical simplification, translations, and text g eneration. However, multiple meaning of a word in multiple circumstances, grammatical complex structure, and the mutual dependency of words in a sentence make it difficult to estimate the lexical complexity. To address these challenges, SemEval-2021 Task 1 introduced a shared task focusing on LCP and this paper presents our participation in this task. We proposed a transformer-based approach with sentence pair regression. We employed two fine-tuned transformer models. Including BERT and RoBERTa to train our model and fuse their predicted score to the complexity estimation. Experimental results demonstrate that our proposed method achieved competitive performance compared to the participants' systems.

ناقلات بيرت صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

AIMH at SemEval-2021 Task 6: Multimodal Classification Using an Ensemble of Transformer Models

Aimh في مهمة Semeval-2021 6: تصنيف متعدد الوسائط باستخدام مجموعة من نماذج المحولات

Ask ChatGPT about the research

Read More

suggested questions