Self-supervised learning, which benefits from automatically constructing labels through pre-designed pretext tasks, has recently been applied to strengthen supervised learning. Since previous self-supervised pretext tasks operate on the input, they may incur huge additional training overhead. In this paper, we find that features in CNNs can also be used for self-supervision. We therefore design a novel feature-based pretext task, which requires only a small amount of additional training overhead. In our task, we discard different specific regions of the feature maps and then train the model to distinguish the resulting features. To fully exploit our feature-based pretext task in supervised learning, we also propose a novel learning framework containing multiple classifiers for further improvement. Original labels are expanded to joint labels via self-supervision of feature transformations. With the additional semantic information provided by our self-supervised task, this approach trains CNNs more effectively. Extensive experiments on various supervised learning tasks demonstrate the accuracy improvement and wide applicability of our method.
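As a concrete illustration, the following is a minimal PyTorch sketch of a feature-based pretext task in this spirit: one of four feature-map quadrants is zeroed out, and a joint head predicts the (class, dropped-region) pair. The names (FeatureDropPretext, drop_region) and the four-quadrant partition are illustrative assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class FeatureDropPretext(nn.Module):
    """Zero out one of four feature-map quadrants and classify over joint labels."""
    def __init__(self, backbone, feat_dim, num_classes, num_regions=4):
        super().__init__()
        self.backbone = backbone          # CNN producing a (B, C, H, W) feature map
        self.num_regions = num_regions
        # Joint classifier over (class, dropped-region) pairs.
        self.joint_head = nn.Linear(feat_dim, num_classes * num_regions)

    def drop_region(self, feat, region):
        # Discard one particular spatial region (here: a quadrant) of the features.
        B, C, H, W = feat.shape
        h2, w2 = H // 2, W // 2
        quads = [(0, h2, 0, w2), (0, h2, w2, W), (h2, H, 0, w2), (h2, H, w2, W)]
        t, b, l, r = quads[region]
        out = feat.clone()
        out[:, :, t:b, l:r] = 0.0
        return out

    def forward(self, x, region):
        feat = self.drop_region(self.backbone(x), region)  # feature-level transform
        pooled = feat.mean(dim=(2, 3))                     # global average pooling
        return self.joint_head(pooled)                     # logits over joint labels

Training would then use the joint target label * num_regions + region, so that the original labels are expanded into joint labels as the abstract describes.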
In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation. The in
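Since the abstract is cut off before the method details, here is only a generic sketch of the intra-video side of such correspondence learning: a softmax affinity between dense features of two frames of the same video, usable for propagating labels between frames. The function name and temperature are illustrative assumptions.

import torch
import torch.nn.functional as F

def frame_affinity(feat1, feat2, temperature=0.05):
    """Affinity between two (C, H, W) frame feature maps -> (H*W, H*W) matrix."""
    f1 = F.normalize(feat1.flatten(1), dim=0)   # (C, H*W), unit-norm per location
    f2 = F.normalize(feat2.flatten(1), dim=0)
    return F.softmax(f1.t() @ f2 / temperature, dim=1)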
Existing self-supervised learning methods learn representations by means of pretext tasks that are either (1) discriminating, which explicitly specify which features should be separated, or (2) aligning, which precisely indicate which features should be c
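The two families can be made concrete with standard losses. Below is a hedged PyTorch sketch assuming paired embeddings z1, z2 of two augmented views: an InfoNCE loss for the discriminating family and a cosine alignment loss for the aligning family. The temperature value is an illustrative choice, not taken from the paper.

import torch
import torch.nn.functional as F

def infonce_loss(z1, z2, temperature=0.1):
    """Discriminating: pull matched pairs together, push all other pairs apart."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                       # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)     # positives on diagonal
    return F.cross_entropy(logits, targets)

def alignment_loss(z1, z2):
    """Aligning: only specify which features should be brought together."""
    return -F.cosine_similarity(z1, z2, dim=1).mean()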
Photometric loss is widely used for self-supervised depth and egomotion estimation. However, the loss landscapes induced by photometric differences are often problematic for optimization, owing to plateau landscapes for pixels in textureless regions
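For reference, a common form of the photometric loss is an SSIM/L1 mixture, as popularized by Monodepth-style methods. The sketch below assumes a target image and a view-synthesized (warped) image; alpha = 0.85 is a conventional but illustrative weighting, and the simplified SSIM uses 3x3 average pooling.

import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified per-pixel SSIM dissimilarity, in [0, 1]."""
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return ((1 - num / den) / 2).clamp(0, 1)

def photometric_loss(img, img_warped, alpha=0.85):
    """Per-pixel photometric error between target and view-synthesized image."""
    l1 = (img - img_warped).abs().mean(1, keepdim=True)
    return alpha * ssim(img, img_warped).mean(1, keepdim=True) + (1 - alpha) * l1

In textureless regions, both terms are nearly constant under small depth or pose perturbations, which is exactly the plateau problem the abstract points to.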
Most of the existing self-supervised feature learning methods for 3D data learn 3D features either from point cloud data or from multi-view images. By exploring the inherent multi-modality attributes of 3D objects, in this paper, we propose to jointl
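One plausible instantiation of such joint learning is a cross-modality contrastive objective between a point-cloud encoder and a multi-view image encoder. The sketch below is an assumption-laden illustration, not the paper's architecture: the encoders are placeholders (e.g., a PointNet-style network and a 2D CNN), and the temperature is illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalModel(nn.Module):
    """Match each point cloud to the rendered views of the same 3D object."""
    def __init__(self, point_encoder, image_encoder):
        super().__init__()
        self.point_encoder = point_encoder   # (B, N, 3)      -> (B, dim)
        self.image_encoder = image_encoder   # (B*V, 3, H, W) -> (B*V, dim)

    def forward(self, points, views):
        B, V = views.shape[:2]
        z_p = F.normalize(self.point_encoder(points), dim=1)
        z_i = self.image_encoder(views.flatten(0, 1)).view(B, V, -1)
        z_i = F.normalize(z_i.mean(1), dim=1)           # fuse views by averaging
        logits = z_p @ z_i.t() / 0.07                   # cross-modal similarities
        targets = torch.arange(B, device=points.device)
        # Symmetric contrastive loss over matching (point cloud, image set) pairs.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))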
In this work, we propose a novel methodology for self-supervised learning for generating global and local attention-aware visual features. Our approach is based on training a model to differentiate between specific image transformations of an input s
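A minimal sketch of such transformation discrimination, using 0/90/180/270-degree rotations as a concrete stand-in for the specific image transformations the abstract mentions; the helper and class names are hypothetical.

import torch
import torch.nn as nn

def rotate_batch(x):
    """Return the four rotated copies of a (B, C, H, W) batch and their labels."""
    rotations = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return torch.cat(rotations, dim=0), labels

class TransformClassifier(nn.Module):
    """Predict which transformation was applied to the input."""
    def __init__(self, backbone, feat_dim, num_transforms=4):
        super().__init__()
        self.backbone = backbone             # maps images to (B, feat_dim) features
        self.head = nn.Linear(feat_dim, num_transforms)

    def forward(self, x):
        return self.head(self.backbone(x))

Solving this pretext task forces the backbone to attend to globally and locally informative image content, which is the intuition behind attention-aware features.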