Adversarial Bipartite Graph Learning for Video Domain Adaptation


Published by Yadan Luo

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Yadan Luo, Zi Huang, Zijian Wang

Computer Vision and Pattern Recognition; Multimedia


Abstract

Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in video recognition because of the significant spatial and temporal shifts between the source (i.e., training) and target (i.e., test) domains. Consequently, recent visual domain adaptation methods that use adversarial learning to unify source and target video representations and strengthen feature transferability are not highly effective on videos. To overcome this limitation, in this paper we learn a domain-agnostic video classifier instead of domain-invariant representations, and propose an Adversarial Bipartite Graph (ABG) learning framework that directly models source-target interactions with a bipartite graph topology. Specifically, source and target frames are sampled as heterogeneous vertices, while the edges connecting the two types of nodes measure their affinity. Through message passing, each vertex aggregates features from its heterogeneous neighbors, forcing features from the same class to be mixed evenly. Explicitly exposing the video classifier to such cross-domain representations at both the training and test stages makes our model less biased toward the labeled source data, which in turn yields better generalization on the target domain. To further enhance model capacity and test the robustness of the proposed architecture on difficult transfer tasks, we extend our model to a semi-supervised setting using an additional video-level bipartite graph. Extensive experiments on four benchmarks demonstrate the effectiveness of the proposed approach over state-of-the-art methods for video recognition.
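The frame-level message-passing step described in the ABG abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: edge affinities are assumed to be a temperature-scaled softmax over cosine similarities, and the temperature of 0.1, the 0.5 self-mixing weight, and the helper names (`bipartite_message_pass`, `aggregate`) are hypothetical. The adversarial objective and the video-level graph are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(queries, neighbors, temperature, self_weight):
    """Update each query vertex with an affinity-weighted mean of its neighbors."""
    updated = []
    for q in queries:
        # Assumed edge function: softmax over cosine affinity to every neighbor.
        weights = softmax([cosine(q, n) for n in neighbors], temperature)
        message = [sum(w * n[d] for w, n in zip(weights, neighbors)) for d in range(len(q))]
        # Hypothetical mixing of the vertex's own feature with the cross-domain message.
        updated.append([self_weight * qd + (1.0 - self_weight) * md
                        for qd, md in zip(q, message)])
    return updated


def bipartite_message_pass(source, target, temperature=0.1, self_weight=0.5):
    """One message-passing round on a source-target bipartite graph.

    Edges only join source vertices to target vertices, so each vertex
    aggregates features exclusively from the other domain.
    """
    new_source = aggregate(source, target, temperature, self_weight)
    new_target = aggregate(target, source, temperature, self_weight)
    return new_source, new_target


# Toy frame features: two source frames and two target frames.
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[0.9, 0.1], [0.1, 0.9]]
new_source, new_target = bipartite_message_pass(source, target)
```

Because edges only cross domains, each updated source frame is pulled toward the target frames it most resembles, which is the cross-domain mixing the abstract describes; a learned edge function could replace the fixed cosine affinity.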


Yuan Xie, Tianshui Chen, Tao Pu (2020)

Data inconsistency and bias are inevitable across different facial expression recognition (FER) datasets because of the subjective annotation process and differing collection conditions. Recent works resort to adversarial mechanisms that learn domain-invariant features to mitigate domain shift. However, most of these works focus on holistic feature adaptation and ignore local features, which are more transferable across datasets. Moreover, local features carry more detailed and discriminative content for expression recognition, so integrating them may enable fine-grained adaptation. In this work, we propose a novel Adversarial Graph Representation Adaptation (AGRA) framework that unifies graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation. To achieve this, we first build a graph to correlate holistic and local regions within each domain and another graph to correlate these regions across different domains. Then, we learn the per-class statistical distribution of each domain and extract holistic-local features from the input image to initialize the corresponding graph nodes. Finally, we introduce two stacked graph convolution networks that propagate holistic-local features within each domain, to explore their interaction, and across different domains, for holistic-local feature co-adaptation. In this way, the AGRA framework can adaptively learn fine-grained domain-invariant features and thus facilitate cross-domain expression recognition. We conduct extensive and fair experiments on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
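The graph propagation that AGRA stacks can be illustrated with a single standard graph-convolution step (the symmetric-normalized GCN layer of Kipf and Welling). This is a hedged sketch, not AGRA's architecture: the three-node graph (one holistic node linked to two local-region nodes), the features, and the identity weight matrix are hypothetical, and the cross-domain graph, per-class statistics, and adversarial loss are not reproduced.

```python
import math


def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python product of an (m x k) and a (k x n) list-of-lists matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, x) for x in row] for row in out]


# Hypothetical graph: node 0 is a holistic face feature, nodes 1-2 are local regions.
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the propagation effect is visible
hidden = gcn_layer(adj, features, weight)
```

After one step the holistic node's feature becomes a degree-normalized mix of itself and both local regions, which is the kind of holistic-local interaction the abstract refers to.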

Computer Vision and Pattern Recognition

Learning Cross-modal Contrastive Features for Video Domain Adaptation

Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang (2021)

Learning transferable and domain-adaptive feature representations from videos is important for video-related tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which is derived from the RGB image space. However, video data is usually associated with multimodal information, e.g., RGB and optical flow, so designing a method that accounts for cross-modal inputs in the cross-domain adaptation setting remains a challenge. To this end, we propose a unified framework for video domain adaptation that simultaneously regularizes cross-modal and cross-domain feature representations. Specifically, we treat each modality in a domain as a view and apply contrastive learning with properly designed sampling strategies. As a result, our objectives regularize feature spaces that originally lack connections across modalities or are poorly aligned across domains. We conduct experiments on domain-adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC-Kitchens, and demonstrate the effectiveness of our components against state-of-the-art algorithms.
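Treating two modalities of the same clip as two views is the standard setup for an InfoNCE-style contrastive loss. The sketch below shows only that cross-modal objective, assuming the RGB and optical-flow embeddings of the same clip form the positive pair and other clips in the batch act as negatives; the temperature of 0.07 and the toy embeddings are hypothetical, and the paper's sampling strategies and cross-domain terms are not reproduced.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss where anchors[i] and positives[i] form the positive pair.

    Every other positives[j] (j != i) in the batch acts as a negative for anchors[i].
    """
    anchors = [l2_normalize(a) for a in anchors]
    positives = [l2_normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


# Toy clip embeddings (hypothetical): RGB and optical-flow features of two clips.
rgb = [[1.0, 0.0], [0.0, 1.0]]
flow_matched = [[0.9, 0.1], [0.1, 0.9]]
flow_shuffled = [[0.1, 0.9], [0.9, 0.1]]
loss_matched = info_nce(rgb, flow_matched)
loss_shuffled = info_nce(rgb, flow_shuffled)
```

The loss is low when each clip's RGB and flow embeddings agree and high when they are mismatched, which is how such an objective links the two modality-specific feature spaces.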

Computer Vision and Pattern Recognition

Partial Video Domain Adaptation with Partial Adversarial Temporal Attentive Network

Yuecong Xu, Jianfei Yang, Haozhi Cao (2021)

Partial Domain Adaptation (PDA) is a practical and general domain adaptation scenario that relaxes the fully shared label space assumption: the source label space subsumes the target one. The key challenge of PDA is negative transfer caused by source-only classes. For videos, such negative transfer can be triggered by both spatial and temporal features, which leads to the more challenging Partial Video Domain Adaptation (PVDA) problem. In this paper, we propose a novel Partial Adversarial Temporal Attentive Network (PATAN) that addresses PVDA by using both spatial and temporal features to filter out source-only classes. In addition, PATAN constructs effective overall temporal features by attending to the local temporal features that contribute more to the class filtration process. We further introduce new benchmarks covering a wide range of PVDA scenarios to facilitate research on PVDA problems. Empirical results demonstrate the state-of-the-art performance of PATAN across multiple PVDA benchmarks.
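The abstract does not spell out how PATAN filters source-only classes, so the sketch below uses a swapped-in, well-known PDA technique instead: the class weighting popularized by PADA, which averages target-domain softmax predictions per class and rescales so the largest weight is 1. Classes that receive little target probability mass (likely source-only) are then down-weighted in the source loss. The spatial/temporal fusion by simple averaging and all helper names are hypothetical.

```python
def fuse_predictions(spatial_probs, temporal_probs):
    """Hypothetical fusion: average spatial and temporal class probabilities per clip."""
    return [[(s + t) / 2.0 for s, t in zip(sp, tp)]
            for sp, tp in zip(spatial_probs, temporal_probs)]


def class_weights(target_probs):
    """PADA-style weights: mean target prediction per class, scaled so the max is 1."""
    num_classes = len(target_probs[0])
    mean = [sum(p[c] for p in target_probs) / len(target_probs) for c in range(num_classes)]
    peak = max(mean)
    return [m / peak for m in mean]


def weighted_source_loss(per_sample_losses, source_labels, weights):
    """Down-weight source samples whose class rarely appears in target predictions."""
    weighted = [weights[y] * loss for loss, y in zip(per_sample_losses, source_labels)]
    return sum(weighted) / len(weighted)


# Toy softmax outputs for two unlabeled target clips over three source classes.
spatial = [[0.8, 0.15, 0.05], [0.5, 0.45, 0.05]]
temporal = [[0.6, 0.35, 0.05], [0.7, 0.25, 0.05]]
weights = class_weights(fuse_predictions(spatial, temporal))
loss = weighted_source_loss([1.0, 1.0, 1.0], [0, 1, 2], weights)
```

Here class 2 is rarely predicted on target clips, so its weight is small and its source samples contribute little to the loss, which is the intended effect of filtering source-only classes.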

Computer Vision and Pattern Recognition; Artificial Intelligence

Active Adversarial Domain Adaptation

Jong-Chyi Su, Yi-Hsuan Tsai, Kihyuk Sohn (2019)

We propose an active learning approach for transferring representations across domains. Our approach, active adversarial domain adaptation (AADA), explores a duality between two related problems: adversarial domain alignment and importance sampling for adapting models across domains. The former uses a domain discriminative model to align domains, while the latter uses the same model to weight samples to account for distribution shifts. Specifically, our importance weight promotes samples with large classification uncertainty and high diversity from the labeled examples, and thus serves as a sample selection scheme for active learning. We show that these two views can be unified in one framework for domain adaptation and transfer learning when the source domain has many labeled examples and the target domain does not. AADA provides significant improvements over fine-tuning-based approaches and other sampling methods when the two domains are closely related. Results on challenging domain adaptation tasks, e.g., object detection, demonstrate that the advantage over baseline approaches is retained even after hundreds of examples have been actively annotated.
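The selection criterion described above combines a diversity term from the domain discriminator with a classification-uncertainty term. A minimal sketch under an assumed form: if `d_source` is the discriminator's probability that a sample comes from the source domain, then `(1 - d) / d` estimates the target-to-source density ratio (large for samples unlike the labeled data), and prediction entropy measures uncertainty; the score is their product. Treat this exact formula, the clipping constant, and the helper names as assumptions rather than the paper's definitive rule.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def selection_score(d_source, class_probs, eps=1e-6):
    """Assumed scoring rule: density-ratio (diversity) term times entropy (uncertainty)."""
    d = min(max(d_source, eps), 1.0 - eps)  # clip to avoid division by zero
    return (1.0 - d) / d * entropy(class_probs)


def select_for_labeling(pool, budget):
    """Return indices of the `budget` highest-scoring (d_source, class_probs) samples."""
    scores = [selection_score(d, probs) for d, probs in pool]
    return sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:budget]


# Toy unlabeled target pool: (discriminator output, classifier probabilities).
pool = [
    (0.9, [0.5, 0.5]),    # looks like source data, uncertain
    (0.2, [0.5, 0.5]),    # unlike source data, uncertain  -> highest score
    (0.2, [0.99, 0.01]),  # unlike source data, confident
]
chosen = select_for_labeling(pool, 2)
```

Samples that are both unlike the labeled source data and uncertain rank first, matching the abstract's description of an importance weight that favors uncertainty and diversity.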

Computer Vision and Pattern Recognition; Machine Learning

Federated Adversarial Domain Adaptation

Xingchao Peng, Zijun Huang, Yizhe Zhu (2019)

Federated learning improves data privacy and efficiency in machine learning performed over networks of distributed devices, such as mobile phones, IoT devices, and wearables. Yet models trained with federated learning can still fail to generalize to new devices because of domain shift. Domain shift occurs when the labeled data collected by source nodes statistically differs from the unlabeled data of the target nodes. In this work, we present a principled approach to federated domain adaptation, which aims to align the representations learned among the different nodes with the data distribution of the target node. Our approach extends adversarial adaptation techniques to the constraints of the federated setting. In addition, we devise a dynamic attention mechanism and leverage feature disentanglement to enhance knowledge transfer. Empirically, we perform extensive experiments on several image and text classification tasks and show promising results in the unsupervised federated domain adaptation setting.
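The abstract mentions a dynamic attention mechanism for weighting knowledge from different source nodes but does not give its form. The sketch below shows only the generic server-side pattern of attention-weighted aggregation: per-node parameter updates are combined with weights from a softmax over per-node relevance scores. The scores are hypothetical inputs and the plain softmax is an assumption, not the paper's attention rule; the adversarial alignment and feature disentanglement components are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_updates(node_updates, relevance_scores):
    """Attention-weighted average of per-node parameter updates.

    node_updates[k] is a flat list of parameter deltas from source node k;
    relevance_scores[k] is a hypothetical score of how useful node k is to the target.
    """
    weights = softmax(relevance_scores)
    dim = len(node_updates[0])
    return [sum(w * update[d] for w, update in zip(weights, node_updates))
            for d in range(dim)]


updates = [[1.0, 2.0], [3.0, 4.0]]
uniform = aggregate_updates(updates, [0.0, 0.0])       # equal scores: plain average
favor_second = aggregate_updates(updates, [0.0, 2.0])  # node 1 weighted more heavily
```

With equal scores this reduces to ordinary averaging; raising one node's score shifts the aggregate toward that node's update, which is the basic lever any attention-based weighting of source nodes provides.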

Computer Vision and Pattern Recognition; Machine Learning