What is the Best Feature Learning Procedure in Hierarchical Recognition Architectures?


Added by Kevin Jarrett

Publication date 2016

Fields Informatics Engineering

Language English

Authors Kevin Jarrett - Koray Kavukcuoglu - Karol Gregor

Computer Vision and Pattern Recognition


(This paper was written in November 2011 and never published. It was posted on arXiv.org in its original form in June 2016.) Many recent object recognition systems have proposed using a two-phase training procedure to learn sparse convolutional feature hierarchies: unsupervised pre-training followed by supervised fine-tuning. Recent results suggest that these methods provide little improvement over purely supervised systems when the appropriate nonlinearities are included. This paper presents an empirical exploration of the space of learning procedures for sparse convolutional networks to assess which method produces the best performance. In our study, we introduce an augmentation of the Predictive Sparse Decomposition method that includes a discriminative term (DPSD). We also introduce a new single-phase supervised learning procedure that places an L1 penalty on the output state of each layer of the network. This forces the network to produce sparse codes without the expensive pre-training phase. Using DPSD with a new, complex predictor that incorporates lateral inhibition, combined with multi-scale feature pooling and supervised refinement, the system achieves a 70.6% recognition rate on Caltech-101. With the addition of convolutional training, a 77% recognition rate was obtained on the CIFAR-10 dataset.
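The single-phase procedure described in the abstract adds an L1 penalty on each layer's output to the supervised objective. The following is only a minimal sketch of that idea under stated assumptions, not the authors' implementation: the function names, the use of softmax cross-entropy as the supervised term, and the weight `l1_weight` are all hypothetical choices made for illustration.

```python
import math

def l1_norm(values):
    """Sum of absolute values of one layer's output vector."""
    return sum(abs(v) for v in values)

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (assumed supervised term)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def sparse_supervised_loss(layer_outputs, logits, label, l1_weight=0.1):
    """Supervised loss plus an L1 penalty on every layer's output.

    layer_outputs: list of per-layer activation vectors (hypothetical layout).
    l1_weight: hypothetical trade-off coefficient, not taken from the paper.
    """
    penalty = sum(l1_norm(h) for h in layer_outputs)
    return cross_entropy(logits, label) + l1_weight * penalty

# Toy example: two layers of activations and a two-class output.
layer_outputs = [[0.0, 1.5, 0.0, -0.5], [2.0, 0.0]]
logits = [0.0, 0.0]
loss = sparse_supervised_loss(layer_outputs, logits, label=0, l1_weight=0.1)
```

Minimizing a loss of this form pushes activations toward zero at every layer while still fitting the labels, which is the mechanism the abstract credits for producing sparse codes without a separate pre-training phase.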


What Is Considered Complete for Visual Recognition?

532 - Lingxi Xie, Xiaopeng Zhang, Longhui Wei 2021

This is an opinion paper. We hope to deliver a key message: current visual recognition systems are far from complete, i.e., from recognizing everything that a human can recognize, yet it is very unlikely that the gap can be bridged by continuously increasing human annotations. Based on this observation, we advocate for a new type of pre-training task named learning-by-compression. The computational models (e.g., a deep network) are optimized to represent the visual data using compact features, and the features preserve the ability to recover the original data. Semantic annotations, when available, play the role of weak supervision. An important yet challenging issue is the evaluation of image recovery, for which we suggest some design principles and future research directions. We hope our proposal can inspire the community to pursue the compression-recovery tradeoff rather than the accuracy-complexity tradeoff.

Computer Vision and Pattern Recognition

Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition

131 - Guangwei Gao, Yi Yu, Jian Yang 2021

Cross-resolution face recognition (CRFR), which is important in intelligent surveillance and biometric forensics, refers to the problem of matching a low-resolution (LR) probe face image against high-resolution (HR) gallery face images. Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space where the resolution discrepancy is mitigated. However, few works consider how to extract and utilize the intermediate discriminative features from noisy LR query faces to further mitigate the discrepancy caused by resolution limitations. In this study, we aim to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR. In particular, our contributions are threefold. (i) To learn more robust and discriminative features, we adaptively fuse the contextual features from different layers. (ii) To fully exploit these contextual features, we design a feature set-based representation learning (FSRL) scheme to collaboratively represent the hierarchical features for more accurate recognition. Moreover, FSRL utilizes the primitive form of feature maps to keep the latent structural information, especially in noisy cases. (iii) To further improve recognition performance, we fuse the hierarchical recognition outputs from different stages, so that the discriminability from different scales is also fully integrated. These properties together account for the efficiency of the proposed method. Experimental results on several face datasets verify the superiority of the presented algorithm over other competitive CRFR approaches.

Computer Vision and Pattern Recognition

What is the best way to measure baryonic acoustic oscillations?

620 - Ariel G. Sanchez MPE 2008

Oscillations in the baryon-photon fluid prior to recombination imprint different signatures on the power spectrum and correlation function of matter fluctuations. The measurement of these features using galaxy surveys has been proposed as a means to determine the equation of state of the dark energy. The accuracy required to achieve competitive constraints demands an extremely good understanding of systematic effects which change the baryonic acoustic oscillation (BAO) imprint. We use 50 very large volume N-body simulations to investigate the BAO signature in the two-point correlation function. The location of the BAO bump does not correspond to the sound horizon scale at the level of accuracy required by future measurements, even before any dynamical or statistical effects are considered. Careful modelling of the correlation function is therefore required to extract the cosmological information encoded on large scales. We find that the correlation function is less affected by scale-dependent effects than the power spectrum. We show that a model for the correlation function proposed by Crocce & Scoccimarro (2008), based on renormalised perturbation theory, gives an essentially unbiased measurement of the dark energy equation of state. This means that information from the large-scale shape of the correlation function, in addition to the form of the BAO peak, can be used to provide robust constraints on cosmological parameters. The correlation function therefore provides a better constraint on the distance scale (~50% smaller errors with no systematic bias) than the more conservative approach required when using the power spectrum (i.e. one which requires amplitude and long-wavelength shape information to be discarded).

What shapes feature representations? Exploring datasets, architectures, and training

79 - Katherine L. Hermann, Andrew K. Lampinen 2020

In naturalistic learning problems, a model's input contains a wide range of features, some useful for the task at hand, and others not. Of the useful features, which ones does the model use? Of the task-irrelevant features, which ones does the model represent? Answers to these questions are important for understanding the basis of models' decisions, as well as for building models that learn versatile, adaptable representations useful beyond the original training task. We study these questions using synthetic datasets in which the task-relevance of input features can be controlled directly. We find that when two features redundantly predict the labels, the model preferentially represents one, and its preference reflects what was most linearly decodable from the untrained model. Over training, task-relevant features are enhanced, and task-irrelevant features are partially suppressed. Interestingly, in some cases, an easier, weakly predictive feature can suppress a more strongly predictive, but more difficult one. Additionally, models trained to recognize both easy and hard features learn representations most similar to models that use only the easy feature. Further, easy features lead to more consistent representations across model runs than do hard features. Finally, models have greater representational similarity to an untrained model than to models trained on a different task. Our results highlight the complex processes that determine which features a model represents.

Machine Learning

What Is Around The Camera?

67 - Stamatios Georgoulis, Konstantinos Rematas, Tobias Ritschel 2016

How much does a single image reveal about the environment it was taken in? In this paper, we investigate how much of that information can be retrieved from a foreground object, combined with the background (i.e. the visible part of the environment). Assuming it is not perfectly diffuse, the foreground object acts as a complexly shaped and far-from-perfect mirror. An additional challenge is that its appearance confounds the light coming from the environment with the unknown materials it is made of. We propose a learning-based approach to predict the environment from multiple reflectance maps that are computed from approximate surface normals. The proposed method allows us to jointly model the statistics of environments and material properties. We train our system from synthesized training data, but demonstrate its applicability to real-world data. Interestingly, our analysis shows that the information obtained from objects made out of multiple materials often is complementary and leads to better performance.

Computer Vision and Pattern Recognition