
NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media


Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: automatic generation, multimodal media


Abstract

Online misinformation is a prevalent societal issue, with adversaries relying on tools ranging from cheap fakes to sophisticated deep fakes. We are motivated by the threat scenario where an image is used out of context to support a certain narrative. While some prior datasets for detecting image-text inconsistency generate samples via text manipulation, we propose a dataset where both image and text are unmanipulated but mismatched. We introduce several strategies for automatically retrieving convincing images for a given caption, capturing cases with inconsistent entities or semantic context. Our large-scale, automatically generated NewsCLIPpings dataset (1) demonstrates that machine-driven image repurposing is now a realistic threat, and (2) provides samples that represent challenging instances of mismatch between text and image in news that are able to mislead humans. We benchmark several state-of-the-art multimodal models on our dataset and analyze their performance across different pretraining domains and visual backbones.
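The abstract does not spell out the retrieval strategies, so the following is only a minimal sketch of one plausible reading: given precomputed caption and image embeddings, pick for each caption the most similar image that is not its original pairing. The function names, the toy vectors, and the assumption that caption i belongs with image i are all hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve_mismatched_image(caption_idx, caption_vecs, image_vecs):
    """Return (index, score) of the most similar image that is NOT the caption's own.

    Assumption (hypothetical layout): caption i was originally published with image i.
    """
    best_idx, best_score = None, -math.inf
    for j, image_vec in enumerate(image_vecs):
        if j == caption_idx:
            continue  # skip the true pairing so the result is out of context
        score = cosine(caption_vecs[caption_idx], image_vec)
        if score > best_score:
            best_idx, best_score = j, score
    return best_idx, best_score

# Toy 2-D embeddings standing in for real encoder outputs (illustrative only).
caption_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
image_vecs = [[1.0, 0.1], [0.1, 1.0], [0.8, 0.3]]

for i in range(len(caption_vecs)):
    idx, score = retrieve_mismatched_image(i, caption_vecs, image_vecs)
    print(f"caption {i} -> image {idx} (similarity {score:.3f})")
```

In practice the embeddings would come from a pretrained text or image encoder, and a filter on named entities or scene type could be added to target the "inconsistent entities" and "semantic context" cases the abstract mentions.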

References used

https://aclanthology.org/


Cross-lingual Cross-modal Pretraining for Multimodal Retrieval

Association for Computational Linguistics, 2021 (article)

Recent pretrained vision-language models have achieved impressive performance on cross-modal retrieval tasks in English. Their success, however, heavily depends on the availability of many annotated image-caption datasets for pretraining, where the texts are not necessarily in English. Although we can utilize machine translation (MT) tools to translate non-English text to English, the performance still largely relies on MT's quality and may suffer from high latency in real-world applications. This paper proposes a new approach to learn cross-lingual cross-modal representations for matching images and their relevant captions in multiple languages. We seamlessly combine cross-lingual pretraining objectives and cross-modal pretraining objectives in a unified framework to learn image and text in a joint embedding space from available English image-caption data and from monolingual and parallel corpora. We show that our approach achieves SOTA performance in retrieval tasks on two multimodal multilingual image caption benchmarks: Multi30k with German captions and MSCOCO with Japanese captions.

Keywords: explicit alignment objectives, cross-modal pretraining, cross-modal retrieval tasks

Experiences of Adapting Multimodal Machine Translation Techniques for Hindi

Association for Computational Linguistics, 2021 (article)

Multimodal Neural Machine Translation (MNMT) is an interesting task in natural language processing (NLP) where we use visual modalities along with a source sentence to aid the source-to-target translation process. Recently, there has been a lot of work on MNMT frameworks to boost the performance of standalone machine translation tasks. Most prior work in MNMT performed translation between two widely known languages (e.g., English-to-German, English-to-French). In this paper, we explore the effectiveness of different state-of-the-art MNMT methods, which use various data-oriented techniques including multimodal pre-training, for low-resource languages. Although the existing methods work well on high-resource languages, the usability of those methods on low-resource languages is unknown. We evaluate the existing methods on Hindi and report our findings.

Keywords: adapting multimodal machine, experiences of adapting

Automatic generation of a 3D sign language avatar on AR glasses given 2D videos of human signers

Association for Computational Linguistics, 2021 (article)

In this paper we present a prototypical implementation of a pipeline that allows the automatic generation of a German Sign Language avatar from 2D video material. The presentation is accompanied by the source code. We record human pose movements during signing with computer vision models. The joint coordinates of hands and arms are imported as landmarks to control the skeleton of our avatar. From the anatomically independent landmarks, we create another skeleton based on the avatar's skeletal bone architecture to calculate the bone rotation data. This data is then used to control our human 3D avatar. The avatar is displayed on AR glasses and can be placed virtually in the room so that it can be perceived simultaneously with the verbal speaker. In further work, we aim to enhance it with speech recognition and machine translation methods so that it can serve as a sign language interpreter. The prototype has been shown to people of the deaf and hard-of-hearing community to assess its comprehensibility. Problems emerged with the transferred hand rotations: hand gestures were hard to recognize on the avatar due to deformations like twisted finger meshes.

Keywords: sign language avatar

Automatic Story Generation: Challenges and Attempts

Association for Computational Linguistics, 2021 (article)

Automated storytelling has long captured the attention of researchers because of the ubiquity of narratives in everyday life. The best human-crafted stories exhibit coherent plot, strong characters, and adherence to genres, attributes that current state-of-the-art systems still struggle to produce, even using transformer architectures. In this paper, we analyze works in story generation that utilize machine learning approaches to (1) address story generation controllability, (2) incorporate commonsense knowledge, (3) infer reasonable character actions, and (4) generate creative language.

Keywords: challenges and attempts, automatic story generation, story generation

SRPOL DIALOGUE SYSTEMS at SemEval-2021 Task 5: Automatic Generation of Training Data for Toxic Spans Detection

Association for Computational Linguistics, 2021 (article)

This paper presents a system used for SemEval-2021 Task 5: Toxic Spans Detection. Our system is an ensemble of BERT-based models for binary word classification, trained on a dataset extended by toxic comments modified and generated by two language models. For the toxic word classification, the prediction threshold value was optimized separately for every comment in order to maximize the expected F1 value.
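The abstract does not say how the expected F1 is computed, so the sketch below is only one way a per-comment threshold search could look. It assumes per-word toxicity probabilities are available and uses a plug-in approximation, expected F1 ≈ 2·E[TP] / (number of predicted words + E[number of toxic words]); both the approximation and the function name `best_threshold` are assumptions made for illustration, not the authors' stated method.

```python
def best_threshold(probs):
    """Pick the threshold that maximizes an approximate expected F1 for one comment.

    probs: per-word toxicity probabilities for a single comment.
    Assumption: expected F1 is approximated as
        2 * E[TP] / (|predicted| + E[|gold|]),
    where E[TP] is the sum of probabilities of the selected words and
    E[|gold|] is the sum of all probabilities.
    """
    expected_gold = sum(probs)
    best_t, best_f1 = 1.0, 0.0
    # Only thresholds equal to an observed probability can change the selection.
    for t in sorted(set(probs), reverse=True):
        selected = [p for p in probs if p >= t]
        expected_tp = sum(selected)
        denom = len(selected) + expected_gold
        f1 = 2.0 * expected_tp / denom if denom > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy probabilities for a four-word comment (illustrative values only).
threshold, score = best_threshold([0.9, 0.8, 0.1, 0.05])
print(f"threshold={threshold}, approx. expected F1={score:.3f}")
```

Because the search runs per comment, a comment with two confident toxic words and one with only borderline scores end up with different cut-offs, which is the behavior the abstract describes.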

Keywords: SRPOL dialogue systems, SRPOL dialogue
