بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark

155 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhengcong Fei

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zhengcong Fei - Zekang Li - Jinchao Zhang

الحساب واللغة الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

As a kind of new expression elements, Internet memes are popular and extensively used in online chatting scenarios since they manage to make dialogues vivid, moving, and interesting. However, most current dialogue researches focus on text-only dialogue tasks. In this paper, we propose a new task named as textbf{M}eme incorporated textbf{O}pen-domain textbf{D}ialogue (MOD). Compared to previous dialogue tasks, MOD is much more challenging since it requires the model to understand the multimodal elements as well as the emotions behind them. To facilitate the MOD research, we construct a large-scale open-domain multimodal dialogue dataset incorporating abundant Internet memes into utterances. The dataset consists of $sim$45K Chinese conversations with $sim$606K utterances. Each conversation contains about $13$ utterances with about $4$ Internet memes on average and each utterance equipped with an Internet meme is annotated with the corresponding emotion. In addition, we present a simple and effective method, which utilizes a unified generation network to solve the MOD task. Experimental results demonstrate that our method trained on the proposed corpus is able to achieve expressive communication including texts and memes. The corpus and models have been publicly available at https://github.com/lizekang/DSTC10-MOD.

قيم البحث

136 - Lin Su , Nan Duan , Edward Cui 2021

In this paper, we present GEM as a General Evaluation benchmark for Multimodal tasks. Different from existing datasets such as GLUE, SuperGLUE, XGLUE and XTREME that mainly focus on natural language tasks, GEM is a large-scale vision-language benchma rk, which consists of GEM-I for image-language tasks and GEM-V for video-language tasks. Comparing with existing multimodal datasets such as MSCOCO and Flicker30K for image-language tasks, YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering image-language tasks and video-language tasks at the same time, but also labeled in multiple languages. We also provide two baseline models for this benchmark. We will release the dataset, code and baseline models, aiming to advance the development of multilingual multimodal research.

الحساب واللغة الرؤية الحاسوبية وتمييز الأنماط الوسائط المتعددة

N15News: A New Dataset for Multimodal News Classification

353 - Zhen Wang , Xu Shan , Jie Yang 2021

Current news datasets merely focus on text features on the news and rarely leverage the feature of images, excluding numerous essential features for news classification. In this paper, we propose a new dataset, N15News, which is generated from New Yo rk Times with 15 categories and contains both text and image information in each news. We design a novel multitask multimodal network with different fusion methods, and experiments show multimodal news classification performs better than text-only news classification. Depending on the length of the text, the classification accuracy can be increased by up to 5.8%. Our research reveals the relationship between the performance of a multimodal classifier and its sub-classifiers, and also the possible improvements when applying multimodal in news classification. N15News is shown to have great potential to prompt the multimodal news studies.

الحساب واللغة

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization

86 - Faisal Ladhak , Esin Durmus , Claire Cardie 2020

We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of crosslingual abstractive summarization systems. We extract article and summary pairs in 18 languages from WikiHow, a high quality, collaborative resource of how-to gui des on a diverse set of topics written by human authors. We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We further propose a method for direct crosslingual summarization (i.e., without requiring translation at inference time) by leveraging synthetic data and Neural Machine Translation as a pre-training step. Our method significantly outperforms the baseline approaches, while being more cost efficient during inference.

الحساب واللغة

ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining

107 - Alexander R. Fabbri , Faiaz Rahman , Imad Rizvi 2021

While online conversations can cover a vast amount of information in many different formats, abstractive text summarization has primarily focused on modeling solely news articles. This research gap is due, in part, to the lack of standardized dataset s for summarizing online discussions. To address this gap, we design annotation protocols motivated by an issues--viewpoints--assertions framework to crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads. We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data. To create a comprehensive benchmark, we also evaluate these models on widely-used conversation summarization datasets to establish strong baselines in this domain. Furthermore, we incorporate argument mining through graph construction to directly model the issues, viewpoints, and assertions present in a conversation and filter noisy input, showing comparable or improved results according to automatic and human evaluations.

الحساب واللغة

A Large-Scale Chinese Short-Text Conversation Dataset

102 - Yida Wang , Pei Ke , Yinhe Zheng 2020

The advancements of neural dialogue generation models show promising results on modeling short-text conversations. However, training such models usually needs a large-scale high-quality dialogue corpus, which is hard to access. In this paper, we pres ent a large-scale cleaned Chinese conversation dataset, LCCC, which contains a base version (6.8million dialogues) and a large version (12.0 million dialogues). The quality of our dataset is ensured by a rigorous data cleaning pipeline, which is built based on a set of rules and a classifier that is trained on manually annotated 110K dialogue pairs. We also release pre-training dialogue models which are trained on LCCC-base and LCCC-large respectively. The cleaned dataset and the pre-training models will facilitate the research of short-text conversation modeling. All the models and datasets are available at https://github.com/thu-coai/CDial-GPT.

الحساب واللغة

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة حماه

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً