Why do These Match? Explaining the Behavior of Image Similarity Models

250 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Bryan Plummer

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Bryan A. Plummer - Mariya I. Vasileva - Vitali Petsiuk

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings. Recent work has primarily focused on explaining models for tasks like image classification or visual question answering. In this paper, we introduce Salient Attributes for Network Explanation (SANE) to explain image similarity models, where a models output is a score measuring the similarity of two inputs rather than a classification score. In this task, an explanation depends on both of the input images, so standard methods do not apply. Our SANE explanations pairs a saliency map identifying important image regions with an attribute that best explains the match. We find that our explanations provide additional information not typically captured by saliency maps alone, and can also improve performance on the classic task of attribute recognition. Our approachs ability to generalize is demonstrated on two datasets from diverse domains, Polyvore Outfits and Animals with Attributes 2. Code available at: https://github.com/VisionLearningGroup/SANE

قيم البحث

66 - Han S. Lee , Alex A. Agarwal , Junmo Kim 2017

In a recent decade, ImageNet has become the most notable and powerful benchmark database in computer vision and machine learning community. As ImageNet has emerged as a representative benchmark for evaluating the performance of novel deep learning mo dels, its evaluation tends to include only quantitative measures such as error rate, rather than qualitative analysis. Thus, there are few studies that analyze the failure cases of deep learning models in ImageNet, though there are numerous works analyzing the networks themselves and visualizing them. In this abstract, we qualitatively analyze the failure cases of ImageNet classification results from recent deep learning model, and categorize these cases according to the certain image patterns. Through this failure analysis, we believe that it can be discovered what the final challenges are in ImageNet database, which the current deep learning model is still vulnerable to.

الرؤية الحاسوبية وتمييز الأنماط

Teachers Do More Than Teach: Compressing Image-to-Image Models

100 - Qing Jin , Jian Ren , Oliver J. Woodford 2021

Generative Adversarial Networks (GANs) have achieved huge success in generating high-fidelity images, however, they suffer from low efficiency due to tremendous computational cost and bulky memory usage. Recent efforts on compression GANs show notice able progress in obtaining smaller generators by sacrificing image quality or involving a time-consuming searching process. In this work, we aim to address these issues by introducing a teacher network that provides a search space in which efficient network architectures can be found, in addition to performing knowledge distillation. First, we revisit the search space of generative models, introducing an inception-based residual block into generators. Second, to achieve target computation cost, we propose a one-step pruning algorithm that searches a student architecture from the teacher model and substantially reduces searching cost. It requires no l1 sparsity regularization and its associated hyper-parameters, simplifying the training procedure. Finally, we propose to distill knowledge through maximizing feature similarity between teacher and student via an index named Global Kernel Alignment (GKA). Our compressed networks achieve similar or even better image fidelity (FID, mIoU) than the original models with much-reduced computational cost, e.g., MACs. Code will be released at https://github.com/snap-research/CAT.

الرؤية الحاسوبية وتمييز الأنماط

Why Do Line Drawings Work? A Realism Hypothesis

74 - Aaron Hertzmann 2020

Why is it that we can recognize object identity and 3D shape from line drawings, even though they do not exist in the natural world? This paper hypothesizes that the human visual system perceives line drawings as if they were approximately realistic images. Moreover, the techniques of line drawing are chosen to accurately convey shape to a human observer. Several implications and variants of this hypothesis are explored.

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي

Why Do Neural Response Generation Models Prefer Universal Replies?

438 - Bowen Wu , Nan Jiang , Zhifeng Gao 2018

Recent advances in sequence-to-sequence learning reveal a purely data-driven approach to the response generation task. Despite its diverse applications, existing neural models are prone to producing short and generic replies, making it infeasible to tackle open-domain challenges. In this research, we analyze this critical issue in light of the models optimization goal and the specific characteristics of the human-to-human dialog corpus. By decomposing the black box into parts, a detailed analysis of the probability limit was conducted to reveal the reason behind these universal replies. Based on these analyses, we propose a max-margin ranking regularization term to avoid the models leaning to these replies. Finally, empirical experiments on case studies and benchmarks with several metrics validate this approach.

الحساب واللغة

The 2021 Image Similarity Dataset and Challenge

184 - Matthijs Douze , Giorgos Tolias , Ed Pizzi 2021

This paper introduces a new benchmark for large-scale image similarity detection. This benchmark is used for the Image Similarity Challenge at NeurIPS21 (ISC2021). The goal is to determine whether a query image is a modified copy of any image in a re ference corpus of size 1~million. The benchmark features a variety of image transformations such as automated transformations, hand-crafted image edits and machine-learning based manipulations. This mimics real-life cases appearing in social media, for example for integrity-related problems dealing with misinformation and objectionable content. The strength of the image manipulations, and therefore the difficulty of the benchmark, is calibrated according to the performance of a set of baseline approaches. Both the query and reference set contain a majority of distractor images that do not match, which corresponds to a real-life needle-in-haystack setting, and the evaluation metric reflects that. We expect the DISC21 benchmark to promote image copy detection as an important and challenging computer vision task and refresh the state of the art.

الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد العالي للعلوم التطبيقية والتكنولوجيا

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Why do These Match? Explaining the Behavior of Image Similarity Models

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً