No Arabic abstract
We propose self-adaptive training -- a unified training algorithm that dynamically calibrates and enhances training process by model predictions without incurring extra computational cost -- to advance both supervised and self-supervised learning of deep neural networks. We analyze the training dynamics of deep networks on training data that are corrupted by, e.g., random noise and adversarial examples. Our analysis shows that model predictions are able to magnify useful underlying information in data and this phenomenon occurs broadly even in the absence of emph{any} label information, highlighting that model predictions could substantially benefit the training process: self-adaptive training improves the generalization of deep networks under noise and enhances the self-supervised representation learning. The analysis also sheds light on understanding deep learning, e.g., a potential explanation of the recently-discovered double-descent phenomenon in empirical risk minimization and the collapsing issue of the state-of-the-art self-supervised learning algorithms. Experiments on the CIFAR, STL and ImageNet datasets verify the effectiveness of our approach in three applications: classification with label noise, selective classification and linear evaluation. To facilitate future research, the code has been made public available at https://github.com/LayneH/self-adaptive-training.
Recent work has demonstrated that neural networks are vulnerable to adversarial examples. To escape from the predicament, many works try to harden the model in various ways, in which adversarial training is an effective way which learns robust feature representation so as to resist adversarial attacks. Meanwhile, the self-supervised learning aims to learn robust and semantic embedding from data itself. With these views, we introduce self-supervised learning to against adversarial examples in this paper. Specifically, the self-supervised representation coupled with k-Nearest Neighbour is proposed for classification. To further strengthen the defense ability, self-supervised adversarial training is proposed, which maximizes the mutual information between the representations of original examples and the corresponding adversarial examples. Experimental results show that the self-supervised representation outperforms its supervised version in respect of robustness and self-supervised adversarial training can further improve the defense ability efficiently.
Machine learning analysis of longitudinal neuroimaging data is typically based on supervised learning, which requires a large number of ground-truth labels to be informative. As ground-truth labels are often missing or expensive to obtain in neuroscience, we avoid them in our analysis by combing factor disentanglement with self-supervised learning to identify changes and consistencies across the multiple MRIs acquired of each individual over time. Specifically, we propose a new definition of disentanglement by formulating a multivariate mapping between factors (e.g., brain age) associated with an MRI and a latent image representation. Then, factors that evolve across acquisitions of longitudinal sequences are disentangled from that mapping by self-supervised learning in such a way that changes in a single factor induce change along one direction in the representation space. We implement this model, named Longitudinal Self-Supervised Learning (LSSL), via a standard autoencoding structure with a cosine loss to disentangle brain age from the image representation. We apply LSSL to two longitudinal neuroimaging studies to highlight its strength in extracting the brain-age information from MRI and revealing informative characteristics associated with neurodegenerative and neuropsychological disorders. Moreover, the representations learned by LSSL facilitate supervised classification by recording faster convergence and higher (or similar) prediction accuracy compared to several other representation learning techniques.
The popularisation of neural networks has seen incredible advances in pattern recognition, driven by the supervised learning of human annotations. However, this approach is unsustainable in relation to the dramatically increasing size of real-world datasets. This has led to a resurgence in self-supervised learning, a paradigm whereby the model generates its own supervisory signal from the data. Here we propose a hybrid quantum-classical neural network architecture for contrastive self-supervised learning and test its effectiveness in proof-of-principle experiments. Interestingly, we observe a numerical advantage for the learning of visual representations using small-scale quantum neural networks over equivalently structured classical networks, even when the quantum circuits are sampled with only 100 shots. Furthermore, we apply our best quantum model to classify unseen images on the ibmq_paris quantum computer and find that current noisy devices can already achieve equal accuracy to the equivalent classical model on downstream tasks.
The common self-supervised pre-training practice requires collecting massive unlabeled data together and then trains a representation model, dubbed textbf{joint training}. However, in real-world scenarios where data are collected in a streaming fashion, the joint training scheme is usually storage-heavy and time-consuming. A more efficient alternative is to train a model continually with streaming data, dubbed textbf{sequential training}. Nevertheless, it is unclear how well sequential self-supervised pre-training performs with streaming data. In this paper, we conduct thorough experiments to investigate self-supervised pre-training with streaming data. Specifically, we evaluate the transfer performance of sequential self-supervised pre-training with four different data sequences on three different downstream tasks and make comparisons with joint self-supervised pre-training. Surprisingly, we find sequential self-supervised learning exhibits almost the same performance as the joint training when the distribution shifts within streaming data are mild. Even for data sequences with large distribution shifts, sequential self-supervised training with simple techniques, e.g., parameter regularization or data replay, still performs comparably to joint training. Based on our findings, we recommend using sequential self-supervised training as a textbf{more efficient yet performance-competitive} representation learning practice for real-world applications.
How can neural networks trained by contrastive learning extract features from the unlabeled data? Why does contrastive learning usually need much stronger data augmentations than supervised learning to ensure good representations? These questions involve both the optimization and statistical aspects of deep learning, but can hardly be answered by analyzing supervised learning, where the target functions are the highest pursuit. Indeed, in self-supervised learning, it is inevitable to relate to the optimization/generalization of neural networks to how they can encode the latent structures in the data, which we refer to as the feature learning process. In this work, we formally study how contrastive learning learns the feature representations for neural networks by analyzing its feature learning process. We consider the case where our data are comprised of two types of features: the more semantically aligned sparse features which we want to learn from, and the other dense features we want to avoid. Theoretically, we prove that contrastive learning using $mathbf{ReLU}$ networks provably learns the desired sparse features if proper augmentations are adopted. We present an underlying principle called $textbf{feature decoupling}$ to explain the effects of augmentations, where we theoretically characterize how augmentations can reduce the correlations of dense features between positive samples while keeping the correlations of sparse features intact, thereby forcing the neural networks to learn from the self-supervision of sparse features. Empirically, we verified that the feature decoupling principle matches the underlying mechanism of contrastive learning in practice.