ﻻ يوجد ملخص باللغة العربية
Whilst contrastive learning has achieved remarkable success in self-supervised representation learning, its potential for deep clustering remains unknown. This is due to its fundamental limitation that the instance discrimination strategy it takes is not class sensitive and hence unable to reason about the underlying decision boundaries between semantic concepts or classes. In this work, we solve this problem by introducing a novel variant called Semantic Contrastive Learning (SCL). It explores the characteristics of both conventional contrastive learning and deep clustering by imposing distance-based cluster structures on unlabelled training data and also introducing a discriminative contrastive loss formulation. For explicitly modelling class boundaries on-the-fly, we further formulate a clustering consistency condition on the two different predictions given by visual similarities and semantic decision boundaries. By advancing implicit representation learning towards explicit understandings of visual semantics, SCL can amplify jointly the strengths of contrastive learning and deep clustering in a unified approach. Extensive experiments show that the proposed model outperforms the state-of-the-art deep clustering methods on six challenging object recognition benchmarks, especially on finer-grained and larger datasets.
Recently, many unsupervised deep learning methods have been proposed to learn clustering with unlabelled data. By introducing data augmentation, most of the latest methods look into deep clustering from the perspective that the original image and its
Deep clustering successfully provides more effective features than conventional ones and thus becomes an important technique in current unsupervised learning. However, most deep clustering methods ignore the vital positive and negative pairs introduc
Collecting labeled data for the task of semantic segmentation is expensive and time-consuming, as it requires dense pixel-level annotations. While recent Convolutional Neural Network (CNN) based semantic segmentation approaches have achieved impressi
Deep learning methods have achieved great success in pedestrian detection, owing to its ability to learn features from raw pixels. However, they mainly capture middle-level representations, such as pose of pedestrian, but confuse positive with hard n
Contrastive learning has shown superior performance in embedding global and spatial invariant features in computer vision (e.g., image classification). However, its overall success of embedding local and spatial variant features is still limited, esp