A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification

333 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jong-Chyi Su

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jong-Chyi Su - Zezhou Cheng - Subhransu Maji

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We evaluate the effectiveness of semi-supervised learning (SSL) on a realistic benchmark where data exhibits considerable class imbalance and contains images from novel classes. Our benchmark consists of two fine-grained classification datasets obtained by sampling classes from the Aves and Fungi taxonomy. We find that recently proposed SSL methods provide significant benefits, and can effectively use out-of-class data to improve performance when deep networks are trained from scratch. Yet their performance pales in comparison to a transfer learning baseline, an alternative approach for learning from a few examples. Furthermore, in the transfer setting, while existing SSL methods provide improvements, the presence of out-of-class is often detrimental. In this setting, standard fine-tuning followed by distillation-based self-training is the most robust. Our work suggests that semi-supervised learning with experts on realistic datasets may require different strategies than those currently prevalent in the literature.

قيم البحث

96 - Abhimanyu Dubey , Otkrist Gupta , Ramesh Raskar 2018

Fine-Grained Visual Classification (FGVC) is an important computer vision problem that involves small diversity within the different classes, and often requires expert annotators to collect data. Utilizing this notion of small visual diversity, we re visit Maximum-Entropy learning in the context of fine-grained classification, and provide a training routine that maximizes the entropy of the output probability distribution for training convolutional neural networks on FGVC tasks. We provide a theoretical as well as empirical justification of our approach, and achieve state-of-the-art performance across a variety of classification tasks in FGVC, that can potentially be extended to any fine-tuning task. Our method is robust to different hyperparameter values, amount of training data and amount of training label noise and can hence be a valuable tool in many similar problems.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Weakly Supervised Bilinear Attention Network for Fine-Grained Visual Classification

156 - Tao Hu , Jizheng Xu , Cong Huang 2018

For fine-grained visual classification, objects usually share similar geometric structure but present variant local appearance and different pose. Therefore, localizing and extracting discriminative local features play a crucial role in accurate cate gory prediction. Existing works either pay attention to limited object parts or train isolated networks for locating and classification. In this paper, we propose Weakly Supervised Bilinear Attention Network (WS-BAN) to solve these issues. It jointly generates a set of attention maps (region-of-interest maps) to indicate the locations of objects parts and extracts sequential part features by Bilinear Attention Pooling (BAP). Besides, we propose attention regularization and attention dropout to weakly supervise the generating process of attention maps. WS-BAN can be trained end-to-end and achieves the state-of-the-art performance on multiple fine-grained classification datasets, including CUB-200-2011, Stanford Car and FGVC-Aircraft, which demonstrated its effectiveness.

الرؤية الحاسوبية وتمييز الأنماط

Learning Attentive Pairwise Interaction for Fine-Grained Classification

135 - Peiqin Zhuang , Yali Wang , Yu Qiao 2020

Fine-grained classification is a challenging problem, due to subtle differences among highly-confused categories. Most approaches address this difficulty by learning discriminative representation of individual input image. On the other hand, humans c an effectively identify contrastive clues by comparing image pairs. Inspired by this fact, this paper proposes a simple but effective Attentive Pairwise Interaction Network (API-Net), which can progressively recognize a pair of fine-grained images by interaction. Specifically, API-Net first learns a mutual feature vector to capture semantic differences in the input pair. It then compares this mutual vector with individual vectors to generate gates for each input image. These distinct gate vectors inherit mutual context on semantic differences, which allow API-Net to attentively capture contrastive clues by pairwise interaction between two images. Additionally, we train API-Net in an end-to-end manner with a score ranking regularization, which can further generalize API-Net by taking feature priorities into account. We conduct extensive experiments on five popular benchmarks in fine-grained classification. API-Net outperforms the recent SOTA methods, i.e., CUB-200-2011 (90.0%), Aircraft(93.9%), Stanford Cars (95.3%), Stanford Dogs (90.3%), and NABirds (88.1%).

الرؤية الحاسوبية وتمييز الأنماط

FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems

163 - Liang Tong , Zhengzhang Chen , Jingchao Ni 2021

We present FACESEC, a framework for fine-grained robustness evaluation of face recognition systems. FACESEC evaluation is performed along four dimensions of adversarial modeling: the nature of perturbation (e.g., pixel-level or face accessories), the attackers system knowledge (about training data and learning architecture), goals (dodging or impersonation), and capability (tailored to individual inputs or across sets of these). We use FACESEC to study five face recognition systems in both closed-set and open-set settings, and to evaluate the state-of-the-art approach for defending against physically realizable attacks on these. We find that accurate knowledge of neural architecture is significantly more important than knowledge of the training data in black-box attacks. Moreover, we observe that open-set face recognition systems are more vulnerable than closed-set systems under different types of attacks. The efficacy of attacks for other threat model variations, however, appears highly dependent on both the nature of perturbation and the neural network architecture. For example, attacks that involve adversarial face masks are usually more potent, even against adversarially trained models, and the ArcFace architecture tends to be more robust than the others.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Two-View Fine-grained Classification of Plant Species

67 - Voncarlos M. Araujo , Alceu S. Britto Jr. , Luiz E. S. Oliveira andn Alessandro L. Koerich 2020

Automatic plant classification is a challenging problem due to the wide biodiversity of the existing plant species in a fine-grained scenario. Powerful deep learning architectures have been used to improve the classification performance in such a fin e-grained problem, but usually building models that are highly dependent on a large training dataset and which are not scalable. In this paper, we propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species. It uses the botanical taxonomy as a basis for a coarse-to-fine strategy applied to identify the plant genus and species. The two-view representation provides complementary global and local features of leaf images. A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species. The experimental results on two challenging fine-grained datasets of leaf images (i.e. LifeCLEF 2015 and LeafSnap) have shown the effectiveness of the proposed method, which achieved recognition accuracy of 0.87 and 0.96 respectively.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي