ترغب بنشر مسار تعليمي؟ اضغط هنا

When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets

121   0   0.0 ( 0 )
 نشر من قبل Hong-Yu Zhou
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Semi-Supervised Learning (SSL) has been proved to be an effective way to leverage both labeled and unlabeled data at the same time. Recent semi-supervised approaches focus on deep neural networks and have achieved promising results on several benchmarks: CIFAR10, CIFAR100 and SVHN. However, most of their experiments are based on models trained from scratch instead of pre-trained models. On the other hand, transfer learning has demonstrated its value when the target domain has limited labeled data. Here comes the intuitive question: is it possible to incorporate SSL when fine-tuning a pre-trained model? We comprehensively study how SSL methods starting from pretrained models perform under varying conditions, including training strategies, architecture choice and datasets. From this study, we obtain several interesting and useful observations. While practitioners have had an intuitive understanding of these observations, we do a comprehensive emperical analysis and demonstrate that: (1) the gains from SSL techniques over a fully-supervised baseline are smaller when trained from a pre-trained model than when trained from random initialization, (2) when the domain of the source data used to train the pre-trained model differs significantly from the domain of the target task, the gains from SSL are significantly higher and (3) some SSL methods are able to advance fully-supervised baselines (like Pseudo-Label). We hope our studies can deepen the understanding of SSL research and facilitate the process of developing more effective SSL methods to utilize pre-trained models. Code is now available at github.

قيم البحث

اقرأ أيضاً

What is the best way to learn a universal face representation? Recent work on Deep Learning in the area of face analysis has focused on supervised learning for specific tasks of interest (e.g. face recognition, facial landmark localization etc.) but has overlooked the overarching question of how to find a facial representation that can be readily adapted to several facial analysis tasks and datasets. To this end, we make the following 4 contributions: (a) we introduce, for the first time, a comprehensive evaluation benchmark for facial representation learning consisting of 5 important face analysis tasks. (b) We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training. Importantly, we focus our evaluations on the case of few-shot facial learning. (c) We investigate important properties of the training datasets including their size and quality (labelled, unlabelled or even uncurated). (d) To draw our conclusions, we conducted a very large number of experiments. Our main two findings are: (1) Unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements for all facial tasks considered. (2) Many existing facial video datasets seem to have a large amount of redundancy. We will release code, pre-trained models and data to facilitate future research.
Recent semi-supervised learning methods use pseudo supervision as core idea, especially self-training methods that generate pseudo labels. However, pseudo labels are unreliable. Self-training methods usually rely on single model prediction confidence to filter low-confidence pseudo labels, thus remaining high-confidence errors and wasting many low-confidence correct labels. In this paper, we point out it is difficult for a model to counter its own errors. Instead, leveraging inter-model disagreement between different models is a key to locate pseudo label errors. With this new viewpoint, we propose mutual training between two different models by a dynamically re-weighted loss function, called Dynamic Mutual Training (DMT). We quantify inter-model disagreement by comparing predictions from two different models to dynamically re-weight loss in training, where a larger disagreement indicates a possible error and corresponds to a lower loss value. Extensive experiments show that DMT achieves state-of-the-art performance in both image classification and semantic segmentation. Our codes are released at https://github.com/voldemortX/DST-CBC .
While recent studies on semi-supervised learning have shown remarkable progress in leveraging both labeled and unlabeled data, most of them presume a basic setting of the model is randomly initialized. In this work, we consider semi-supervised learni ng and transfer learning jointly, leading to a more practical and competitive paradigm that can utilize both powerful pre-trained models from source domain as well as labeled/unlabeled data in the target domain. To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization that consists of two complementary components: Adaptive Knowledge Consistency (AKC) on the examples between the source and target model, and Adaptive Representation Consistency (ARC) on the target model between labeled and unlabeled examples. Examples involved in the consistency regularization are adaptively selected according to their potential contributions to the target task. We conduct extensive experiments on popular benchmarks including CIFAR-10, CUB-200, and MURA, by fine-tuning the ImageNet pre-trained ResNet-50 model. Results show that our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and FixMatch. Moreover, our algorithm is orthogonal to existing methods and thus able to gain additional improvements on top of MixMatch and FixMatch. Our code is available at https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning.
360 - Zhanghan Ke , Di Qiu , Kaican Li 2020
We investigate the generalization of semi-supervised learning (SSL) to diverse pixel-wise tasks. Although SSL methods have achieved impressive results in image classification, the performances of applying them to pixel-wise tasks are unsatisfactory d ue to their need for dense outputs. In addition, existing pixel-wise SSL approaches are only suitable for certain tasks as they usually require to use task-specific properties. In this paper, we present a new SSL framework, named Guided Collaborative Training (GCT), for pixel-wise tasks, with two main technical contributions. First, GCT addresses the issues caused by the dense outputs through a novel flaw detector. Second, the modules in GCT learn from unlabeled data collaboratively through two newly proposed constraints that are independent of task-specific properties. As a result, GCT can be applied to a wide range of pixel-wise tasks without structural adaptation. Our extensive experiments on four challenging vision tasks, including semantic segmentation, real image denoising, portrait image matting, and night image enhancement, show that GCT outperforms state-of-the-art SSL methods by a large margin. Our code available at: https://github.com/ZHKKKe/PixelSSL.
In real-world applications, it is often expensive and time-consuming to obtain labeled examples. In such cases, knowledge transfer from related domains, where labels are abundant, could greatly reduce the need for extensive labeling efforts. In this scenario, transfer learning comes in hand. In this paper, we propose Deep Variational Transfer (DVT), a variational autoencoder that transfers knowledge across domains using a shared latent Gaussian mixture model. Thanks to the combination of a semi-supervised ELBO and parameters sharing across domains, we are able to simultaneously: (i) align all supervised examples of the same class into the same latent Gaussian Mixture component, independently from their domain; (ii) predict the class of unsupervised examples from different domains and use them to better model the occurring shifts. We perform tests on MNIST and USPS digits datasets, showing DVTs ability to perform transfer learning across heterogeneous datasets. Additionally, we present DVTs top classification performances on the MNIST semi-supervised learning challenge. We further validate DVT on a astronomical datasets. DVT achieves states-of-the-art classification performances, transferring knowledge across real stars surveys datasets, EROS, MACHO and HiTS, . In the worst performance, we double the achieved F1-score for rare classes. These experiments show DVTs ability to tackle all major challenges posed by transfer learning: different covariate distributions, different and highly imbalanced class distributions and different feature spaces.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا