ترغب بنشر مسار تعليمي؟ اضغط هنا

Rethinking Deep Contrastive Learning with Embedding Memory

82   0   0.0 ( 0 )
 نشر من قبل Haozhi Zhang
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Pair-wise loss functions have been extensively studied and shown to continuously improve the performance of deep metric learning (DML). However, they are primarily designed with intuition based on simple toy examples, and experimentally identifying the truly effective design is difficult in complicated, real-world cases. In this paper, we provide a new methodology for systematically studying weighting strategies of various pair-wise loss functions, and rethink pair weighting with an embedding memory. We delve into the weighting mechanisms by decomposing the pair-wise functions, and study positive and negative weights separately using direct weight assignment. This allows us to study various weighting functions deeply and systematically via weight curves, and identify a number of meaningful, comprehensive and insightful facts, which come up with our key observation on memory-based DML: it is critical to mine hard negatives and discard easy negatives which are less informative and redundant, but weighting on positive pairs is not helpful. This results in an efficient but surprisingly simple rule to design the weighting scheme, making it significantly different from existing mini-batch based methods which design various sophisticated loss functions to weight pairs carefully. Finally, we conduct extensive experiments on three large-scale visual retrieval benchmarks, and demonstrate the superiority of memory-based DML over recent mini-batch based approaches, by using a simple contrastive loss with momentum-updated memory.



قيم البحث

اقرأ أيضاً

Generalized zero-shot learning (GZSL) aims to recognize objects from both seen and unseen classes, when only the labeled examples from seen classes are provided. Recent feature generation methods learn a generative model that can synthesize the missi ng visual features of unseen classes to mitigate the data-imbalance problem in GZSL. However, the original visual feature space is suboptimal for GZSL classification since it lacks discriminative information. To tackle this issue, we propose to integrate the generation model with the embedding model, yielding a hybrid GZSL framework. The hybrid GZSL approach maps both the real and the synthetic samples produced by the generation model into an embedding space, where we perform the final GZSL classification. Specifically, we propose a contrastive embedding (CE) for our hybrid GZSL framework. The proposed contrastive embedding can leverage not only the class-wise supervision but also the instance-wise supervision, where the latter is usually neglected by existing GZSL researches. We evaluate our proposed hybrid GZSL framework with contrastive embedding, named CE-GZSL, on five benchmark datasets. The results show that our CEGZSL method can outperform the state-of-the-arts by a significant margin on three datasets. Our codes are available on https://github.com/Hanzy1996/CE-GZSL.
Multi-view network embedding aims at projecting nodes in the network to low-dimensional vectors, while preserving their multiple relations and attribute information. Contrastive learning-based methods have preliminarily shown promising performance in this task. However, most contrastive learning-based methods mostly rely on high-quality graph embedding and explore less on the relationships between different graph views. To deal with these deficiencies, we design a novel node-to-node Contrastive learning framework for Multi-view network Embedding (CREME), which mainly contains two contrastive objectives: Multi-view fusion InfoMax and Inter-view InfoMin. The former objective distills information from embeddings generated from different graph views, while the latter distinguishes different graph views better to capture the complementary information between them. Specifically, we first apply a view encoder to generate each graph view representation and utilize a multi-view aggregator to fuse these representations. Then, we unify the two contrastive objectives into one learning objective for training. Extensive experiments on three real-world datasets show that CREME outperforms existing methods consistently.
Outlier detection is one of the most important processes taken to create good, reliable data in machine learning. The most methods of outlier detection leverage an auxiliary reconstruction task by assuming that outliers are more difficult to be recov ered than normal samples (inliers). However, it is not always true, especially for auto-encoder (AE) based models. They may recover certain outliers even outliers are not in the training data, because they do not constrain the feature learning. Instead, we think outlier detection can be done in the feature space by measuring the feature distance between outliers and inliers. We then propose a framework, MCOD, using a memory module and a contrastive learning module. The memory module constrains the consistency of features, which represent the normal data. The contrastive learning module learns more discriminating features, which boosts the distinction between outliers and inliers. Extensive experiments on four benchmark datasets show that our proposed MCOD achieves a considerable performance and outperforms nine state-of-the-art methods.
112 - Jiabo Huang , Shaogang Gong 2021
Whilst contrastive learning has achieved remarkable success in self-supervised representation learning, its potential for deep clustering remains unknown. This is due to its fundamental limitation that the instance discrimination strategy it takes is not class sensitive and hence unable to reason about the underlying decision boundaries between semantic concepts or classes. In this work, we solve this problem by introducing a novel variant called Semantic Contrastive Learning (SCL). It explores the characteristics of both conventional contrastive learning and deep clustering by imposing distance-based cluster structures on unlabelled training data and also introducing a discriminative contrastive loss formulation. For explicitly modelling class boundaries on-the-fly, we further formulate a clustering consistency condition on the two different predictions given by visual similarities and semantic decision boundaries. By advancing implicit representation learning towards explicit understandings of visual semantics, SCL can amplify jointly the strengths of contrastive learning and deep clustering in a unified approach. Extensive experiments show that the proposed model outperforms the state-of-the-art deep clustering methods on six challenging object recognition benchmarks, especially on finer-grained and larger datasets.
Recently, many unsupervised deep learning methods have been proposed to learn clustering with unlabelled data. By introducing data augmentation, most of the latest methods look into deep clustering from the perspective that the original image and its transformation should share similar semantic clustering assignment. However, the representation features could be quite different even they are assigned to the same cluster since softmax function is only sensitive to the maximum value. This may result in high intra-class diversities in the representation feature space, which will lead to unstable local optimal and thus harm the clustering performance. To address this drawback, we proposed Deep Robust Clustering (DRC). Different from existing methods, DRC looks into deep clustering from two perspectives of both semantic clustering assignment and representation feature, which can increase inter-class diversities and decrease intra-class diversities simultaneously. Furthermore, we summarized a general framework that can turn any maximizing mutual information into minimizing contrastive loss by investigating the internal relationship between mutual information and contrastive learning. And we successfully applied it in DRC to learn invariant features and robust clusters. Extensive experiments on six widely-adopted deep clustering benchmarks demonstrate the superiority of DRC in both stability and accuracy. e.g., attaining 71.6% mean accuracy on CIFAR-10, which is 7.1% higher than state-of-the-art results.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا