ﻻ يوجد ملخص باللغة العربية
We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches $74.3%$ top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and $79.6%$ with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub.
State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different augmented views of a sample. Because these approaches try to match views of the same sample, they can be too myopic and f
Geometric feature extraction is a crucial component of point cloud registration pipelines. Recent work has demonstrated how supervised learning can be leveraged to learn better and more compact 3D features. However, those approaches reliance on groun
Machine learning analysis of longitudinal neuroimaging data is typically based on supervised learning, which requires a large number of ground-truth labels to be informative. As ground-truth labels are often missing or expensive to obtain in neurosci
This paper explores the generalization loss of linear regression in variably parameterized families of models, both under-parameterized and over-parameterized. We show that the generalization curve can have an arbitrary number of peaks, and moreover,
How can neural networks trained by contrastive learning extract features from the unlabeled data? Why does contrastive learning usually need much stronger data augmentations than supervised learning to ensure good representations? These questions inv