ﻻ يوجد ملخص باللغة العربية
While self-supervised representation learning (SSL) has received widespread attention from the community, recent research argue that its performance will suffer a cliff fall when the model size decreases. The current method mainly relies on contrastive learning to train the network and in this work, we propose a simple yet effective Distilled Contrastive Learning (DisCo) to ease the issue by a large margin. Specifically, we find the final embedding obtained by the mainstream SSL methods contains the most fruitful information, and propose to distill the final embedding to maximally transmit a teachers knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher. In addition, in the experiment, we find that there exists a phenomenon termed Distilling BottleNeck and present to enlarge the embedding dimension to alleviate this problem. Our method does not introduce any extra parameter to lightweight models during deployment. Experimental results demonstrate that our method achieves the state-of-the-art on all lightweight models. Particularly, when ResNet-101/ResNet-50 is used as teacher to teach EfficientNet-B0, the linear result of EfficientNet-B0 on ImageNet is very close to ResNet-101/ResNet-50, but the number of parameters of EfficientNet-B0 is only 9.4%/16.3% of ResNet-101/ResNet-50. Code is available at https://github. com/Yuting-Gao/DisCo-pytorch.
Generalising deep networks to novel domains without manual labels is challenging to deep learning. This problem is intrinsically difficult due to unpredictable changing nature of imagery data distributions in novel domains. Pre-learned knowledge does
For artificial learning systems, continual learning over time from a stream of data is essential. The burgeoning studies on supervised continual learning have achieved great progress, while the study of catastrophic forgetting in unsupervised learnin
Point clouds have attracted increasing attention. Significant progress has been made in methods for point cloud analysis, which often requires costly human annotation as supervision. To address this issue, we propose a novel self-contrastive learning
Leveraging temporal information has been regarded as essential for developing video understanding models. However, how to properly incorporate temporal information into the recent successful instance discrimination based contrastive self-supervised l
In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation. The in