Recently, unsupervised domain adaptation for semantic segmentation has become increasingly popular due to the high cost of pixel-level annotation on real-world images. However, most domain adaptation methods are restricted to a single source-target pair and cannot be directly extended to multiple target domains. In this work, we propose a collaborative learning framework to achieve unsupervised multi-target domain adaptation. An unsupervised domain adaptation expert model is first trained for each source-target pair, and the experts are then encouraged to collaborate with each other through a bridge built between the different target domains. These expert models are further improved by a regularization that enforces consistent pixel-wise predictions for samples sharing the same structured context. To obtain a single model that works across multiple target domains, we simultaneously learn a student model that is trained not only to imitate the output of each expert on the corresponding target domain, but also to pull the different experts close to each other through a regularization on their weights. Extensive experiments demonstrate that the proposed method effectively exploits the rich structured information contained in both the labeled source domain and the multiple unlabeled target domains. It not only performs well across multiple target domains but also compares favorably against state-of-the-art unsupervised domain adaptation methods trained specifically on a single source-target pair.
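The student-distillation objective described above can be made concrete with a short sketch. This is a minimal PyTorch illustration under our own assumptions, not the authors' implementation: the per-target `experts` list, the batch layout, and the `reg_weight` coefficient are all hypothetical.

```python
# Sketch: student imitates each expert's pixel-wise output on its target
# domain, while an L2 penalty pulls the expert weights toward each other.
import torch
import torch.nn.functional as F

def distillation_step(student, experts, target_batches, reg_weight=1e-4):
    """experts: list of per-target expert models; target_batches: list of
    image batches, one per target domain (assumed data layout)."""
    loss = 0.0
    for expert, images in zip(experts, target_batches):
        with torch.no_grad():
            teacher_logits = expert(images)          # (B, C, H, W) soft labels
        student_logits = student(images)
        # KL divergence between pixel-wise class distributions
        loss = loss + F.kl_div(
            F.log_softmax(student_logits, dim=1),
            F.softmax(teacher_logits, dim=1),
            reduction="batchmean",
        )
    # Weight regularization: pull every pair of experts close to each other
    # (the returned loss therefore updates both student and expert weights)
    for i in range(len(experts)):
        for j in range(i + 1, len(experts)):
            for p_i, p_j in zip(experts[i].parameters(), experts[j].parameters()):
                loss = loss + reg_weight * (p_i - p_j).pow(2).sum()
    return loss
```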
Federated learning methods enable us to train machine learning models on distributed user data while preserving its privacy. However, it is not always feasible to obtain high-quality supervisory signals from users, especially for vision tasks. Unlike typical federated settings with labeled client data, we consider a more practical scenario where the distributed client data is unlabeled, and a centralized labeled dataset is available on the server. We further take the server-client and inter-client domain shifts into account and pose a domain adaptation problem with one source (centralized server data) and multiple targets (distributed client data). Within this new Federated Multi-Target Domain Adaptation (FMTDA) task, we analyze the model performance of existing domain adaptation methods and propose an effective DualAdapt method to address the new challenges. Extensive experimental results on image classification and semantic segmentation tasks demonstrate that our method achieves high accuracy, incurs minimal communication cost, and requires low computational resources on client devices.
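As a rough illustration of the FMTDA setup (not the DualAdapt method itself), the sketch below runs one federated round with a labeled source on the server; entropy minimization stands in for the clients' unlabeled objective, and FedAvg-style weight averaging is an assumed aggregation choice.

```python
# Sketch of one FMTDA round: supervised update on the server's labeled source,
# local unsupervised adaptation on each client, FedAvg aggregation.
import copy
import torch
import torch.nn.functional as F

def federated_round(server_model, server_loader, client_loaders, lr=1e-3):
    # 1) Supervised update on the centralized labeled source data
    opt = torch.optim.SGD(server_model.parameters(), lr=lr)
    for x, y in server_loader:
        opt.zero_grad()
        F.cross_entropy(server_model(x), y).backward()
        opt.step()

    # 2) Each client adapts a local copy on its unlabeled target data
    client_states = []
    for loader in client_loaders:
        local = copy.deepcopy(server_model)
        local_opt = torch.optim.SGD(local.parameters(), lr=lr)
        for x in loader:                      # unlabeled images only
            p = F.softmax(local(x), dim=1)
            entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()
            local_opt.zero_grad()
            entropy.backward()
            local_opt.step()
        client_states.append(local.state_dict())

    # 3) FedAvg: average client weights back into the server model
    avg = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
           for k in client_states[0]}
    server_model.load_state_dict(avg)
```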
Most domain adaptation methods focus on the single-source-single-target setting. Multi-target domain adaptation is a powerful extension in which a single classifier is learned for multiple unlabeled target domains. To build a multi-target classifier, it is crucial to effectively aggregate features from the labeled source and the different unlabeled target domains. Toward this, the recently introduced Domain-aware Curriculum Graph Co-Teaching (D-CGCT) exploits a dual classifier head, one of which is based on a graph neural network. D-CGCT uses a sequential adaptation strategy that adapts to one domain at a time, starting from the target domains most similar to the source, on the assumption that the network finds such target domains easier to adapt to. However, we argue that no domain is easy or difficult in an absolute sense, and each domain can contain samples with very different characteristics. Following this cue, we propose Reiterative D-CGCT (RD-CGCT), which obtains better adaptation performance by reiterating multiple times over each target domain while keeping the total number of iterations the same. RD-CGCT further improves adaptation performance by including more source samples than target samples in each training minibatch. The proposed RD-CGCT significantly improves performance over D-CGCT on the Office-Home and Office-31 datasets.
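The reiterative schedule is easy to sketch. Assuming a curriculum-ordered list of targets and helper names of our own choosing, the following shows how the same iteration budget is split across multiple passes instead of a single pass per target.

```python
# Sketch of the reiterative schedule: revisit each target several times
# with the same total iteration budget as a single curriculum pass.
def reiterative_schedule(targets_easy_to_hard, total_iters, num_reiterations):
    """Yield (target_domain, num_iters) steps; budget is split across passes."""
    iters_per_visit = total_iters // (len(targets_easy_to_hard) * num_reiterations)
    for _ in range(num_reiterations):
        for target in targets_easy_to_hard:   # curriculum order, as in D-CGCT
            yield target, iters_per_visit

# e.g. three targets, 3000 iterations, 2 passes -> six visits of 500 iterations
for target, n in reiterative_schedule(["clipart", "product", "real"], 3000, 2):
    print(target, n)
```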
Recent advances in unsupervised domain adaptation have significantly improved the recognition accuracy of CNNs by alleviating the domain shift between (labeled) source and (unlabeled) target data distributions. While the problem of single-target domain adaptation (STDA) for object detection has recently received much attention, multi-target domain adaptation (MTDA) remains largely unexplored, despite its practical relevance in several real-world applications, such as multi-camera video surveillance. Compared to the STDA problem, which may already involve large domain shifts between complex source and target distributions, MTDA faces additional challenges, most notably the computational requirements and the catastrophic forgetting of previously-learned targets, which can depend on the order of target adaptations. STDA for detection can be applied to MTDA by adapting one model per target, or one common model on a mixture of data from all target domains; however, these approaches are either costly or inaccurate. The only state-of-the-art MTDA method specialized for detection learns targets incrementally, one target at a time, and mitigates the loss of knowledge by using a duplicated detection model for knowledge distillation, which is computationally expensive and does not scale well to many domains. In this paper, we introduce an efficient approach for incremental learning that generalizes well to multiple target domains. Our MTDA approach is more suitable for real-world applications since it allows updating the detection model incrementally, without storing data from previously-learned target domains or retraining when a new target domain becomes available. Our proposed method, MTDA-DTM, achieves the highest detection accuracy compared with state-of-the-art approaches on several MTDA detection benchmarks and on Wildtrack, a benchmark for multi-camera pedestrian detection.
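The incremental-distillation idea described above (freezing a duplicate of the current detector as a teacher before adapting to a new target) can be sketched as follows. This is a hedged illustration, not MTDA-DTM: the detector is simplified to return a single prediction tensor, and `uda_loss_fn` stands in for any single-target UDA objective.

```python
# Sketch of the incremental-KD baseline: a frozen copy of the current model
# penalizes drift on new-target images to limit catastrophic forgetting.
import copy
import torch
import torch.nn.functional as F

def adapt_to_new_target(model, new_target_loader, uda_loss_fn,
                        kd_weight=1.0, lr=1e-4):
    teacher = copy.deepcopy(model).eval()      # duplicated model, frozen
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for images in new_target_loader:
        student_out = model(images)
        with torch.no_grad():
            teacher_out = teacher(images)
        # uda_loss_fn: any single-target UDA objective (assumed callable);
        # the MSE term distills the previous model's behavior
        loss = (uda_loss_fn(student_out)
                + kd_weight * F.mse_loss(student_out, teacher_out))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```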
Unsupervised domain adaptation (UDA) seeks to alleviate the problem of domain shift between the distribution of unlabeled data from the target domain w.r.t. labeled data from the source domain. While the single-target UDA scenario is well studied in the literature, Multi-Target Domain Adaptation (MTDA) remains largely unexplored despite its practical importance, e.g., in multi-camera video-surveillance applications. The MTDA problem can be addressed by adapting one specialized model per target domain, although this solution is too costly in many real-world applications. Blending multiple targets for MTDA has been proposed, yet this solution may lead to a reduction in model specificity and accuracy. In this paper, we propose a novel unsupervised MTDA approach to train a CNN that can generalize well across multiple target domains. Our Multi-Teacher MTDA (MT-MTDA) method relies on multi-teacher knowledge distillation (KD) to iteratively distill target domain knowledge from multiple teachers to a common student. The KD process is performed in a progressive manner, where the student is trained by each teacher on how to perform UDA for a specific target, instead of directly learning domain-adapted features. Finally, instead of combining the knowledge from all teachers at once, MT-MTDA alternates between the teachers that distill knowledge, thereby preserving the specificity of each target (teacher) when adapting the student. MT-MTDA is compared against state-of-the-art methods on several challenging UDA benchmarks, and empirical results show that our proposed model provides a considerably higher level of accuracy across multiple target domains. Our code is available at: https://github.com/LIVIAETS/MT-MTDA
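A minimal sketch of the alternating distillation loop may help; it is an assumption-laden illustration rather than the released MT-MTDA code. `uda_step` is a hypothetical helper that applies one UDA update to a teacher, and temperature-scaled KL distillation is one plausible choice for the student objective.

```python
# Sketch: teachers take turns distilling into the common student, instead of
# blending all teachers into a single combined loss.
import torch
import torch.nn.functional as F

def mt_mtda_round(student, teachers, target_loaders, uda_step,
                  temperature=4.0, lr=1e-3):
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    for teacher, loader in zip(teachers, target_loaders):  # alternate teachers
        for images in loader:
            uda_step(teacher, images)     # teacher keeps adapting (assumed helper)
            with torch.no_grad():
                t = F.softmax(teacher(images) / temperature, dim=1)
            s = F.log_softmax(student(images) / temperature, dim=1)
            loss = F.kl_div(s, t, reduction="batchmean") * temperature ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
```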
We consider the problem of unsupervised domain adaptation for image classification. To learn target-domain-aware features from the unlabeled data, we create a self-supervised pretext task by augmenting the unlabeled data with a certain type of transformation (specifically, image rotation) and asking the learner to predict the properties of the transformation. However, the obtained feature representation may contain a large amount of information irrelevant to the main task. To provide further guidance, we force the feature representation of the augmented data to be consistent with that of the original data. Intuitively, the consistency introduces additional constraints on representation learning; the learned representation is therefore more likely to focus on the right information about the main task. Our experimental results validate the proposed method and demonstrate state-of-the-art performance on classical domain adaptation benchmarks. Code is available at https://github.com/Jiaolong/ss-da-consistency.
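The pretext-plus-consistency objective can be sketched directly. Assuming a `backbone` that returns feature vectors and a `rot_head` that classifies the applied rotation (both names and the loss weighting are ours, not the authors'), the loss combines rotation prediction with a feature-consistency term.

```python
# Sketch: rotation-prediction pretext task plus a consistency constraint
# between features of the original and rotated images.
import torch
import torch.nn.functional as F

def ss_consistency_loss(backbone, rot_head, images, cons_weight=1.0):
    B = images.size(0)
    k = torch.randint(0, 4, (B,))                  # rotation by 0/90/180/270 deg
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(images, k)])
    feat_orig = backbone(images)                   # (B, D) features
    feat_rot = backbone(rotated)
    # Pretext task: predict which rotation was applied
    rot_loss = F.cross_entropy(rot_head(feat_rot), k)
    # Consistency: features of augmented data should match the original ones
    cons_loss = F.mse_loss(feat_rot, feat_orig.detach())
    return rot_loss + cons_weight * cons_loss
```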