No Arabic abstract
Comprehensive semantic segmentation is one of the key components for robust scene understanding and a requirement to enable autonomous driving. Driven by large scale datasets, convolutional neural networks show impressive results on this task. However, a segmentation algorithm generalizing to various scenes and conditions would require an enormously diverse dataset, making the labour intensive data acquisition and labeling process prohibitively expensive. Under the assumption of structural similarities between segmentation maps, domain adaptation promises to resolve this challenge by transferring knowledge from existing, potentially simulated datasets to new environments where no supervision exists. While the performance of this approach is contingent on the concept that neural networks learn a high level understanding of scene structure, recent work suggests that neural networks are biased towards overfitting to texture instead of learning structural and shape information. Considering the ideas underlying semantic segmentation, we employ random image stylization to augment the training dataset and propose a training procedure that facilitates texture underfitting to improve the performance of domain adaptation. In experiments with supervised as well as unsupervised methods for the task of synthetic-to-real domain adaptation, we show that our approach outperforms conventional training methods.
Since annotating pixel-level labels for semantic segmentation is laborious, leveraging synthetic data is an attractive solution. However, due to the domain gap between synthetic domain and real domain, it is challenging for a model trained with synthetic data to generalize to real data. In this paper, considering the fundamental difference between the two domains as the texture, we propose a method to adapt to the texture of the target domain. First, we diversity the texture of synthetic images using a style transfer algorithm. The various textures of generated images prevent a segmentation model from overfitting to one specific (synthetic) texture. Then, we fine-tune the model with self-training to get direct supervision of the target texture. Our results achieve state-of-the-art performance and we analyze the properties of the model trained on the stylized dataset with extensive experiments.
Conventional unsupervised domain adaptation (UDA) studies the knowledge transfer between a limited number of domains. This neglects the more practical scenario where data are distributed in numerous different domains in the real world. The domain similarity between those domains is critical for domain adaptation performance. To describe and learn relations between different domains, we propose a novel Domain2Vec model to provide vectorial representations of visual domains based on joint learning of feature disentanglement and Gram matrix. To evaluate the effectiveness of our Domain2Vec model, we create two large-scale cross-domain benchmarks. The first one is TinyDA, which contains 54 domains and about one million MNIST-style images. The second benchmark is DomainBank, which is collected from 56 existing vision datasets. We demonstrate that our embedding is capable of predicting domain similarities that match our intuition about visual relations between different domains. Extensive experiments are conducted to demonstrate the power of our new datasets in benchmarking state-of-the-art multi-source domain adaptation methods, as well as the advantage of our proposed model.
Image dehazing using learning-based methods has achieved state-of-the-art performance in recent years. However, most existing methods train a dehazing model on synthetic hazy images, which are less able to generalize well to real hazy images due to domain shift. To address this issue, we propose a domain adaptation paradigm, which consists of an image translation module and two image dehazing modules. Specifically, we first apply a bidirectional translation network to bridge the gap between the synthetic and real domains by translating images from one domain to another. And then, we use images before and after translation to train the proposed two image dehazing networks with a consistency constraint. In this phase, we incorporate the real hazy image into the dehazing training via exploiting the properties of the clear image (e.g., dark channel prior and image gradient smoothing) to further improve the domain adaptivity. By training image translation and dehazing network in an end-to-end manner, we can obtain better effects of both image translation and dehazing. Experimental results on both synthetic and real-world images demonstrate that our model performs favorably against the state-of-the-art dehazing algorithms.
Domain adaptation (DA) paves the way for label annotation and dataset bias issues by the knowledge transfer from a label-rich source domain to a related but unlabeled target domain. A mainstream of DA methods is to align the feature distributions of the two domains. However, the majority of them focus on the entire image features where irrelevant semantic information, e.g., the messy background, is inevitably embedded. Enforcing feature alignments in such case will negatively influence the correct matching of objects and consequently lead to the semantically negative transfer due to the confusion of irrelevant semantics. To tackle this issue, we propose Semantic Concentration for Domain Adaptation (SCDA), which encourages the model to concentrate on the most principal features via the pair-wise adversarial alignment of prediction distributions. Specifically, we train the classifier to class-wisely maximize the prediction distribution divergence of each sample pair, which enables the model to find the region with large differences among the same class of samples. Meanwhile, the feature extractor attempts to minimize that discrepancy, which suppresses the features of dissimilar regions among the same class of samples and accentuates the features of principal parts. As a general method, SCDA can be easily integrated into various DA methods as a regularizer to further boost their performance. Extensive experiments on the cross-domain benchmarks show the efficacy of SCDA.
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain. Most existing UDA methods learn domain-invariant feature representations by minimizing feature distances across domains. In this work, we build upon contrastive self-supervised learning to align features so as to reduce the domain discrepancy between training and testing sets. Exploring the same set of categories shared by both domains, we introduce a simple yet effective framework CDCL, for domain alignment. In particular, given an anchor image from one domain, we minimize its distances to cross-domain samples from the same class relative to those from different categories. Since target labels are unavailable, we use a clustering-based approach with carefully initialized centers to produce pseudo labels. In addition, we demonstrate that CDCL is a general framework and can be adapted to the data-free setting, where the source data are unavailable during training, with minimal modification. We conduct experiments on two widely used domain adaptation benchmarks, i.e., Office-31 and VisDA-2017, and demonstrate that CDCL achieves state-of-the-art performance on both datasets.