No Arabic abstract
In order to train robust deep learning models, large amounts of labelled data is required. However, in the absence of such large repositories of labelled data, unlabeled data can be exploited for the same. Semi-Supervised learning aims to utilize such unlabeled data for training classification models. Recent progress of self-training based approaches have shown promise in this area, which leads to this study where we utilize an ensemble approach for the same. A by-product of any semi-supervised approach may be loss of calibration of the trained model especially in scenarios where unlabeled data may contain out-of-distribution samples, which leads to this investigation on how to adapt to such effects. Our proposed algorithm carefully avoids common pitfalls in utilizing unlabeled data and leads to a more accurate and calibrated supervised model compared to vanilla self-training based student-teacher algorithms. We perform several experiments on the popular STL-10 database followed by an extensive analysis of our approach and study its effects on model accuracy and calibration.
In this paper, we present a novel approach, Momentum$^2$ Teacher, for student-teacher based self-supervised learning. The approach performs momentum update on both network weights and batch normalization (BN) statistics. The teachers weight is a momentum update of the student, and the teachers BN statistics is a momentum update of those in history. The Momentum$^2$ Teacher is simple and efficient. It can achieve the state of the art results (74.5%) under ImageNet linear evaluation protocol using small-batch size(eg, 128), without requiring large-batch training on special hardware like TPU or inefficient across GPU operation (eg, shuffling BN, synced BN). Our implementation and pre-trained models will be given on GitHubfootnote{https://github.com/zengarden/momentum2-teacher}.
Self-training achieves enormous success in various semi-supervised and weakly-supervised learning tasks. The method can be interpreted as a teacher-student framework, where the teacher generates pseudo-labels, and the student makes predictions. The two models are updated alternatingly. However, such a straightforward alternating update rule leads to training instability. This is because a small change in the teacher may result in a significant change in the student. To address this issue, we propose {ours}, short for differentiable self-training, that treats teacher-student as a Stackelberg game. In this game, a leader is always in a more advantageous position than a follower. In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels. Therefore, we treat the student as the leader and the teacher as the follower. The leader procures its advantage by acknowledging the followers strategy, which involves differentiable pseudo-labels and differentiable sample weights. Consequently, the leader-follower interaction can be effectively captured via Stackelberg gradient, obtained by differentiating the followers strategy. Experimental results on semi- and weakly-supervised classification and named entity recognition tasks show that our model outperforms existing approaches by large margins.
Recent generative adversarial networks (GANs) are able to generate impressive photo-realistic images. However, controllable generation with GANs remains a challenging research problem. Achieving controllable generation requires semantically interpretable and disentangled factors of variation. It is challenging to achieve this goal using simple fixed distributions such as Gaussian distribution. Instead, we propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training. Self-training provides an iterative feedback in the GAN training, from the discriminator to the generator, and progressively improves the proposal of the latent codes as training proceeds. The latent codes are sampled from a latent variable model that is learned in the feature space of the discriminator. We consider a normalized independent component analysis model and learn its parameters through tensor factorization of the higher-order moments. Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder, and is able to discover semantically meaningful latent codes without any supervision. We demonstrate empirically on both cars and faces datasets that each group of elements in the learned code controls a mode of variation with a semantic meaning, e.g. pose or background change. We also demonstrate with quantitative metrics that our method generates better results compared to other approaches.
Recent work has demonstrated that neural networks are vulnerable to adversarial examples. To escape from the predicament, many works try to harden the model in various ways, in which adversarial training is an effective way which learns robust feature representation so as to resist adversarial attacks. Meanwhile, the self-supervised learning aims to learn robust and semantic embedding from data itself. With these views, we introduce self-supervised learning to against adversarial examples in this paper. Specifically, the self-supervised representation coupled with k-Nearest Neighbour is proposed for classification. To further strengthen the defense ability, self-supervised adversarial training is proposed, which maximizes the mutual information between the representations of original examples and the corresponding adversarial examples. Experimental results show that the self-supervised representation outperforms its supervised version in respect of robustness and self-supervised adversarial training can further improve the defense ability efficiently.
Recent works on sparse neural networks have demonstrated that it is possible to train a sparse network in isolation to match the performance of the corresponding dense networks with a fraction of parameters. However, the identification of these performant sparse neural networks (winning tickets) either involves a costly iterative train-prune-retrain process (e.g., Lottery Ticket Hypothesis) or an over-extended sparse training time (e.g., Training with Dynamic Sparsity), both of which would raise financial and environmental concerns. In this work, we attempt to address this cost-reducing problem by introducing the FreeTickets concept, as the first solution which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin, while using for complete training only a fraction of the computational resources required by the latter. Concretely, we instantiate the FreeTickets concept, by proposing two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets for free during the sparse training process. The combination of these free tickets into an ensemble demonstrates a significant improvement in accuracy, uncertainty estimation, robustness, and efficiency over the corresponding dense (ensemble) networks. Our results provide new insights into the strength of sparse neural networks and suggest that the benefits of sparsity go way beyond the usual training/inference expected efficiency. We will release all codes in https://github.com/Shiweiliuiiiiiii/FreeTickets.