Heterogeneous Contrastive Learning


Abstract in English

With the advent of big data across multiple high-impact applications, we are often facing the challenge of complex heterogeneity. The newly collected data usually consist of multiple modalities and characterized with multiple labels, thus exhibiting the co-existence of multiple types of heterogeneity. Although state-of-the-art techniques are good at modeling the complex heterogeneity with sufficient label information, such label information can be quite expensive to obtain in real applications, leading to sub-optimal performance using these techniques. Inspired by the capability of contrastive learning to utilize rich unlabeled data for improving performance, in this paper, we propose a unified heterogeneous learning framework, which combines both weighted unsupervised contrastive loss and weighted supervised contrastive loss to model multiple types of heterogeneity. We also provide theoretical analyses showing that the proposed weighted supervised contrastive loss is the lower bound of the mutual information of two samples from the same class and the weighted unsupervised contrastive loss is the lower bound of the mutual information between the hidden representation of two views of the same sample. Experimental results on real-world data sets demonstrate the effectiveness and the efficiency of the proposed method modeling multiple types of heterogeneity.

Download