Generalising deep networks to novel domains without manual labels is a challenge for deep learning. The problem is intrinsically difficult because the distribution of imagery data in a novel domain changes unpredictably, so pre-learned knowledge does not transfer well without strong assumptions about the learned and the novel domains. Different methods have been studied to address the underlying problem under different assumptions, e.g. domain adaptation, zero-shot learning and few-shot learning. In this work, we address the problem by transfer clustering, which aims to learn a discriminative latent space for unlabelled target data in a novel domain through knowledge transfer from labelled related domains. Specifically, we leverage relative (pairwise) imagery information, which is freely available and intrinsic to the target domain, to model the target domain's distribution characteristics, together with prior knowledge learned from related labelled domains, to enable more discriminative clustering of the unlabelled target data. Our method mitigates non-transferable prior knowledge by self-supervision, benefiting from both transfer learning and self-supervised learning. Extensive experiments on four datasets for image clustering tasks show the superiority of our model over state-of-the-art transfer clustering techniques. We further demonstrate its competitive transferability on four zero-shot learning benchmarks.
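To make the pairwise idea concrete, below is a minimal sketch (in PyTorch, which the abstract does not specify) of a transfer-clustering-style objective: pseudo pairwise labels are derived from feature similarity under a transferred backbone, and a clustering head is trained so that the probability of two samples sharing a cluster matches those labels. All names and the threshold are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_clustering_loss(logits, feats, thresh=0.95):
    """Illustrative pairwise (transfer-clustering-style) objective.

    logits: (N, K) cluster logits from a classification head.
    feats:  (N, D) features from a pre-trained (transferred) backbone.
    Pseudo pairwise labels come from feature cosine similarity; `thresh`
    is a hypothetical hyper-parameter, not one from the paper.
    """
    p = F.softmax(logits, dim=1)                       # soft cluster assignments
    z = F.normalize(feats, dim=1)
    target = (z @ z.t() > thresh).float()              # 1 = "same cluster" pair
    pair_prob = (p @ p.t()).clamp(1e-7, 1 - 1e-7)      # P(pair shares a cluster)
    return F.binary_cross_entropy(pair_prob, target)
```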
While self-supervised representation learning (SSL) has received widespread attention from the community, recent research argues that its performance suffers a steep drop as the model size decreases. Current methods mainly rely on contrastive…
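The contrastive objective this snippet alludes to is typically InfoNCE; the sketch below shows that generic form (assuming PyTorch) and is not the specific method the abstract proposes.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE loss over two augmented views z1, z2 of shape (N, D).

    Standard contrastive objective: each sample's other view is its positive,
    all remaining samples in the batch are negatives. `temperature` is an
    illustrative value.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (N, N) similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on diagonal
    return F.cross_entropy(logits, labels)
```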
Unsupervised learning has always been appealing to machine learning researchers and practitioners, allowing them to avoid the expensive and complicated process of labelling data. However, unsupervised learning of complex data is challenging, and…
Temporal action segmentation is the task of labelling every frame of a video with an action class. However, it is expensive to annotate every frame in a large corpus of videos to construct a comprehensive supervised training dataset. Thus, in this…
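As a minimal sketch of the task itself (per-frame classification over a video), the model below applies a 1-D convolutional head to pre-extracted frame features; the architecture and all sizes are illustrative assumptions, not a published segmentation model.

```python
import torch
import torch.nn as nn

class FrameLabeller(nn.Module):
    """Toy frame-wise action classifier: one action label per video frame."""

    def __init__(self, feat_dim=2048, num_actions=48):
        super().__init__()
        self.temporal = nn.Conv1d(feat_dim, 256, kernel_size=9, padding=4)
        self.classifier = nn.Conv1d(256, num_actions, kernel_size=1)

    def forward(self, feats):          # feats: (B, feat_dim, T) frame features
        h = torch.relu(self.temporal(feats))
        return self.classifier(h)      # (B, num_actions, T): logits per frame

labels = FrameLabeller()(torch.randn(1, 2048, 100)).argmax(dim=1)  # (1, 100)
```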
Existing unsupervised video-to-video translation methods fail to produce translated videos that are frame-wise realistic, semantics-preserving, and consistent at the video level. In this work, we propose UVIT, a novel unsupervised video-to-video translation…
Free access to source data is a strong prerequisite of many existing unsupervised domain adaptation approaches. However, source data is often unavailable in practical scenarios due to the constraints of expensive data transmission and data privacy…
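A common recipe that fits this source-free setting is to adapt a source-trained model on unlabelled target data alone, e.g. with a SHOT-style information-maximization loss; the sketch below shows that generic objective (in PyTorch), not whatever this particular paper proposes.

```python
import torch
import torch.nn.functional as F

def source_free_adaptation_loss(logits):
    """SHOT-style information-maximization objective for source-free DA.

    Computed on unlabelled target predictions only: push each prediction to
    be confident (low per-sample entropy) while keeping class usage diverse
    (high entropy of the batch-averaged prediction).
    """
    p = F.softmax(logits, dim=1)
    ent = -(p * torch.log(p + 1e-7)).sum(dim=1).mean()   # per-sample entropy
    mean_p = p.mean(dim=0)
    div = (mean_p * torch.log(mean_p + 1e-7)).sum()      # -entropy of marginal
    return ent + div                                     # minimise both terms
```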