MEDIC: A Multi-Task Learning Dataset for Disaster Image Classification

197 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Firoj Alam

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Firoj Alam - Tanvirul Alam - Md. Arid Hasan

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Recent research in disaster informatics demonstrates a practical and important use case of artificial intelligence to save human lives and sufferings during post-natural disasters based on social media contents (text and images). While notable progress has been made using texts, research on exploiting the images remains relatively under-explored. To advance the image-based approach, we propose MEDIC (available at: https://crisisnlp.qcri.org/medic/index.html), which is the largest social media image classification dataset for humanitarian response consisting of 71,198 images to address four different tasks in a multi-task learning setup. This is the first dataset of its kind: social media image, disaster response, and multi-task learning research. An important property of this dataset is its high potential to contribute research on multi-task learning, which recently receives much interest from the machine learning community and has shown remarkable results in terms of memory, inference speed, performance, and generalization capability. Therefore, the proposed dataset is an important resource for advancing image-based disaster management and multi-task machine learning research.

قيم البحث

358 - Firoj Alam , Tanvirul Alam , Muhammad Imran 2021

Images shared on social media help crisis managers gain situational awareness and assess incurred damages, among other response tasks. As the volume and velocity of such content are typically high, real-time image classification has become an urgent need for a faster disaster response. Recent advances in computer vision and deep neural networks have enabled the development of models for real-time image classification for a number of tasks, including detecting crisis incidents, filtering irrelevant images, classifying images into specific humanitarian categories, and assessing the severity of the damage. To develop robust real-time models, it is necessary to understand the capability of the publicly available pre-trained models for these tasks, which remains to be under-explored in the crisis informatics literature. In this study, we address such limitations by investigating ten different network architectures for four different tasks using the largest publicly available datasets for these tasks. We also explore various data augmentation strategies, semi-supervised techniques, and a multitask learning setup. In our extensive experiments, we achieve promising results.

الرؤية الحاسوبية وتمييز الأنماط أجهزة الكمبيوتر والمجتمع التعلم الآلي

Analysis of Social Media Data using Multimodal Deep Learning for Disaster Response

80 - Ferda Ofli , Firoj Alam , Muhammad Imran 2020

Multimedia content in social media platforms provides significant information during disaster events. The types of information shared include reports of injured or deceased people, infrastructure damage, and missing or found people, among others. Alt hough many studies have shown the usefulness of both text and image content for disaster response purposes, the research has been mostly focused on analyzing only the text modality in the past. In this paper, we propose to use both text and image modalities of social media data to learn a joint representation using state-of-the-art deep learning techniques. Specifically, we utilize convolutional neural networks to define a multimodal deep learning architecture with a modality-agnostic shared representation. Extensive experiments on real-world disaster datasets show that the proposed multimodal architecture yields better performance than models trained using a single modality (e.g., either text or image).

الرؤية الحاسوبية وتمييز الأنماط أجهزة الكمبيوتر والمجتمع التعلم الآلي

Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response

316 - Firoj Alam , Ferda Ofli , Muhammad Imran 2020

During a disaster event, images shared on social media helps crisis managers gain situational awareness and assess incurred damages, among other response tasks. Recent advances in computer vision and deep neural networks have enabled the development of models for real-time image classification for a number of tasks, including detecting crisis incidents, filtering irrelevant images, classifying images into specific humanitarian categories, and assessing the severity of damage. Despite several efforts, past works mainly suffer from limited resources (i.e., labeled images) available to train more robust deep learning models. In this study, we propose new datasets for disaster type detection, and informativeness classification, and damage severity assessment. Moreover, we relabel existing publicly available datasets for new tasks. We identify exact- and near-duplicates to form non-overlapping data splits, and finally consolidate them to create larger datasets. In our extensive experiments, we benchmark several state-of-the-art deep learning models and achieve promising results. We release our datasets and models publicly, aiming to provide proper baselines as well as to spur further research in the crisis informatics community.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Large Margin Multi-modal Multi-task Feature Extraction for Image Classification

70 - Yong Luo , Yonggang Wen , Dacheng Tao 2019

The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, whi ch often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple modalities. We therefore propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for problem optimization and each sub-problem can be efficiently solved. Experiments on two challenging real-world image datasets demonstrate the effectiveness and superiority of the proposed method.

التعلم الالي الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Hand Image Understanding via Deep Multi-Task Learning

187 - Xiong Zhang , Hongsheng Huang , Jianchao Tan 2021

Analyzing and understanding hand information from multimedia materials like images or videos is important for many real world applications and remains active in research community. There are various works focusing on recovering hand information from single image, however, they usually solve a single task, for example, hand mask segmentation, 2D/3D hand pose estimation, or hand mesh reconstruction and perform not well in challenging scenarios. To further improve the performance of these tasks, we propose a novel Hand Image Understanding (HIU) framework to extract comprehensive information of the hand object from a single RGB image, by jointly considering the relationships between these tasks. To achieve this goal, a cascaded multi-task learning (MTL) backbone is designed to estimate the 2D heat maps, to learn the segmentation mask, and to generate the intermediate 3D information encoding, followed by a coarse-to-fine learning paradigm and a self-supervised learning strategy. Qualitative experiments demonstrate that our approach is capable of recovering reasonable mesh representations even in challenging situations. Quantitatively, our method significantly outperforms the state-of-the-art approaches on various widely-used datasets, in terms of diverse evaluation metrics.

الرؤية الحاسوبية وتمييز الأنماط