ﻻ يوجد ملخص باللغة العربية
Unsupervised Domain Adaptation (UDA) can transfer knowledge from labeled source data to unlabeled target data of the same categories. However, UDA for first-person action recognition is an under-explored problem, with lack of datasets and limited consideration of first-person video characteristics. This paper focuses on addressing this problem. Firstly, we propose two small-scale first-person video domain adaptation datasets: ADL$_{small}$ and GTEA-KITCHEN. Secondly, we introduce channel-temporal attention blocks to capture the channel-wise and temporal-wise relationships and model their inter-dependencies important to first-person vision. Finally, we propose a Channel-Temporal Attention Network (CTAN) to integrate these blocks into existing architectures. CTAN outperforms baselines on the two proposed datasets and one existing dataset EPIC$_{cvpr20}$.
Although various image-based domain adaptation (DA) techniques have been proposed in recent years, domain shift in videos is still not well-explored. Most previous works only evaluate performance on small-scale datasets which are saturated. Therefore
Partial Domain Adaptation (PDA) is a practical and general domain adaptation scenario, which relaxes the fully shared label space assumption such that the source label space subsumes the target one. The key challenge of PDA is the issue of negative t
Although various image-based domain adaptation (DA) techniques have been proposed in recent years, domain shift in videos is still not well-explored. Most previous works only evaluate performance on small-scale datasets which are saturated. Therefore
We consider the problem of video-based person re-identification. The goal is to identify a person from videos captured under different cameras. In this paper, we propose an efficient spatial-temporal attention based model for person re-identification
The main progress for action segmentation comes from densely-annotated data for fully-supervised learning. Since manual annotation for frame-level actions is time-consuming and challenging, we propose to exploit auxiliary unlabeled videos, which are