ﻻ يوجد ملخص باللغة العربية
In person re-identification, extracting part-level features from person images has been verified to be crucial. Most of existing CNN-based methods only locate the human parts coarsely, or rely on pre-trained human parsing models and fail in locating the identifiable non-human parts (e.g., knapsack). In this paper, we introduce an alignment scheme in Transformer architecture for the first time and propose the Auto-Aligned Transformer (AAformer) to automatically locate both the human parts and non-human ones at patch-level. We introduce the part tokens, which are learnable vectors, to extract part features in Transformer. A part token only interacts with a local subset of patches in self-attention and learns to be the part representation. To adaptively group the image patches into different subsets, we design the Auto-Alignment. Auto-Alignment employs a fast variant of Optimal Transport algorithm to online cluster the patch embeddings into several groups with the part tokens as their prototypes. We harmoniously integrate the part alignment into the self-attention and the output part tokens can be directly used for retrieval. Extensive experiments validate the effectiveness of part tokens and the superiority of AAformer over various state-of-the-art methods.
We propose a densely semantically aligned person re-identification framework. It fundamentally addresses the body misalignment problem caused by pose/viewpoint variations, imperfect person detection, occlusion, etc. By leveraging the estimation of th
Recently, the Transformer module has been transplanted from natural language processing to computer vision. This paper applies the Transformer to video-based person re-identification, where the key issue is to extract the discriminative information f
Recently, video-based person re-identification (re-ID) has drawn increasing attention in compute vision community because of its practical application prospects. Due to the inaccurate person detections and pose changes, pedestrian misalignment signif
Person re-identification (re-ID) under various occlusions has been a long-standing challenge as person images with different types of occlusions often suffer from misalignment in image matching and ranking. Most existing methods tackle this challenge
In this work, we present a deep convolutional pyramid person matching network (PPMN) with specially designed Pyramid Matching Module to address the problem of person re-identification. The architecture takes a pair of RGB images as input, and outputs