مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Seeing through bag-of-visual-word glasses: towards understanding quantization effects in feature extraction methods

55 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Erik Rodner

تاريخ النشر 2014

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Alexander Freytag - Johannes Ruhle - Paul Bodesheim

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Vector-quantized local features frequently used in bag-of-visual-words approaches are the backbone of popular visual recognition systems due to both their simplicity and their performance. Despite their success, bag-of-words-histograms basically contain low-level image statistics (e.g., number of edges of different orientations). The question remains how much visual information is lost in quantization when mapping visual features to code words? To answer this question, we present an in-depth analysis of the effect of local feature quantization on human recognition performance. Our analysis is based on recovering the visual information by inverting quantized local features and presenting these visualizations with different codebook sizes to human observers. Although feature inversion techniques are around for quite a while, to the best of our knowledge, our technique is the first visualizing especially the effect of feature quantization. Thereby, we are now able to compare single steps in common image classification pipelines to human counterparts.

قيم البحث

اقرأ أيضاً

Seeing the World in a Bag of Chips

67 - Jeong Joon Park , Aleksander Holynski , Steve Seitz 2020

We address the dual problems of novel view synthesis and environment reconstruction from hand-held RGBD sensors. Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone. In cases where scene surface has a strong mirror-like material component, we generate highly detailed environment images, revealing room composition, objects, people, buildings, and trees visible through windows. Our approach yields state of the art view synthesis techniques, operates on low dynamic range imagery, and is robust to geometric and calibration errors.

الرؤية الحاسوبية وتمييز الأنماط

Towards Generalizable and Robust Face Manipulation Detection via Bag-of-local-feature

91 - Changtao Miao , Qi Chu , Weihai Li 2021

Over the past several years, in order to solve the problem of malicious abuse of facial manipulation technology, face manipulation detection technology has obtained considerable attention and achieved remarkable progress. However, most existing metho ds have very impoverished generalization ability and robustness. In this paper, we propose a novel method for face manipulation detection, which can improve the generalization ability and robustness by bag-of-local-feature. Specifically, we extend Transformers using bag-of-feature approach to encode inter-patch relationships, allowing it to learn local forgery features without any explicit supervision. Extensive experiments demonstrate that our method can outperform competing state-of-the-art methods on FaceForensics++, Celeb-DF and DeeperForensics-1.0 datasets.

الرؤية الحاسوبية وتمييز الأنماط

Towards Task Understanding in Visual Settings

58 - Sebastin Santy , Wazeer Zulfikar , Rishabh Mehrotra 2018

We consider the problem of understanding real world tasks depicted in visual images. While most existing image captioning methods excel in producing natural language descriptions of visual scenes involving human tasks, there is often the need for an understanding of the exact task being undertaken rather than a literal description of the scene. We leverage insights from real world task understanding systems, and propose a framework composed of convolutional neural networks, and an external hierarchical task ontology to produce task descriptions from input images. Detailed experiments highlight the efficacy of the extracted descriptions, which could potentially find their way in many applications, including image alt text generation.

استرجاع المعلومات الرؤية الحاسوبية وتمييز الأنماط

Scalable Visual Attribute Extraction through Hidden Layers of a Residual ConvNet

48 - Andres Baloian , Nils Murrugarra-Llerena , Jose M. Saavedra 2021

Visual attributes play an essential role in real applications based on image retrieval. For instance, the extraction of attributes from images allows an eCommerce search engine to produce retrieval results with higher precision. The traditional manne r to build an attribute extractor is by training a convnet-based classifier with a fixed number of classes. However, this approach does not scale for real applications where the number of attributes changes frequently. Therefore in this work, we propose an approach for extracting visual attributes from images, leveraging the learned capability of the hidden layers of a general convolutional network to discriminate among different visual features. We run experiments with a resnet-50 trained on Imagenet, on which we evaluate the output of its different blocks to discriminate between colors and textures. Our results show that the second block of the resnet is appropriate for discriminating colors, while the fourth block can be used for textures. In both cases, the achieved accuracy of attribute classification is superior to 93%. We also show that the proposed embeddings form local structures in the underlying feature space, which makes it possible to apply reduction techniques like UMAP, maintaining high accuracy and widely reducing the size of the feature space.

الرؤية الحاسوبية وتمييز الأنماط

Shaping Deep Feature Space towards Gaussian Mixture for Visual Classification

399 - Weitao Wan , Jiansheng Chen , Cheng Yu 2020

The softmax cross-entropy loss function has been widely used to train deep models for various tasks. In this work, we propose a Gaussian mixture (GM) loss function for deep neural networks for visual classification. Unlike the softmax cross-entropy l oss, our method explicitly shapes the deep feature space towards a Gaussian Mixture distribution. With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution. The GM loss can be readily used to distinguish abnormal inputs, such as the adversarial examples, based on the discrepancy between feature distributions of the inputs and the training set. Furthermore, theoretical analysis shows that a symmetric feature space can be achieved by using the GM loss, which enables the models to perform robustly against adversarial attacks. The proposed model can be implemented easily and efficiently without using extra trainable parameters. Extensive evaluations demonstrate that the proposed method performs favorably not only on image classification but also on robust detection of adversarial examples generated by strong attacks under different threat models.

الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة العربية الدولية الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Seeing through bag-of-visual-word glasses: towards understanding quantization effects in feature extraction methods

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً