No Arabic abstract
RGB-Infrared (IR) cross-modality person re-identification (re-ID), which aims to search an IR image in RGB gallery or vice versa, is a challenging task due to the large discrepancy between IR and RGB modalities. Existing methods address this challenge typically by aligning feature distributions or image styles across modalities, whereas the very useful similarities among gallery samples of the same modality (i.e. intra-modality sample similarities) is largely neglected. This paper presents a novel similarity inference metric (SIM) that exploits the intra-modality sample similarities to circumvent the cross-modality discrepancy targeting optimal cross-modality image matching. SIM works by successive similarity graph reasoning and mutual nearest-neighbor reasoning that mine cross-modality sample similarities by leveraging intra-modality sample similarities from two different perspectives. Extensive experiments over two cross-modality re-ID datasets (SYSU-MM01 and RegDB) show that SIM achieves significant accuracy improvement but with little extra training as compared with the state-of-the-art.
RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key solution is to learn aligned features to the bridge RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. However, this set-level alignment may lead to misalignment of some instances, which limits the performance for RGB-IR Re-ID. Different from existing methods, in this paper, we propose to generate cross-modality paired-images and perform both global set-level and fine-grained instance-level alignments. Our proposed method enjoys several merits. First, our method can perform set-level alignment by disentangling modality-specific and modality-invariant features. Compared with conventional methods, ours can explicitly remove the modality-specific features and the modality variation can be better reduced. Second, given cross-modality unpaired-images of a person, our method can generate cross-modality paired images from exchanged images. With them, we can directly perform instance-level alignment by minimizing distances of every pair of images. Extensive experimental results on two standard benchmarks demonstrate that the proposed model favourably against state-of-the-art methods. Especially, on SYSU-MM01 dataset, our model can achieve a gain of 9.2% and 7.7% in terms of Rank-1 and mAP. Code is available at https://github.com/wangguanan/JSIA-ReID.
RGB-infrared person re-identification is a challenging task due to the intra-class variations and cross-modality discrepancy. Existing works mainly focus on learning modality-shared global representations by aligning image styles or feature distributions across modalities, while local feature from body part and relationships between person images are largely neglected. In this paper, we propose a Dual-level (i.e., local and global) Feature Fusion (DF^2) module by learning attention for discriminative feature from local to global manner. In particular, the attention for a local feature is determined locally, i.e., applying a learned transformation function on itself. Meanwhile, to further mining the relationships between global features from person images, we propose an Affinities Modeling (AM) module to obtain the optimal intra- and inter-modality image matching. Specifically, AM employes intra-class compactness and inter-class separability in the sample similarities as supervised information to model the affinities between intra- and inter-modality samples. Experimental results show that our proposed method outperforms state-of-the-arts by large margins on two widely used cross-modality re-ID datasets SYSU-MM01 and RegDB, respectively.
This paper pays close attention to the cross-modality visible-infrared person re-identification (VI Re-ID) task, which aims to match human samples between visible and infrared modes. In order to reduce the discrepancy between features of different modalities, most existing works usually use constraints based on Euclidean metric. Since the Euclidean based distance metric cannot effectively measure the internal angles between the embedded vectors, the above methods cannot learn the angularly discriminative feature embedding. Because the most important factor affecting the classification task based on embedding vector is whether there is an angularly discriminativ feature space, in this paper, we propose a new loss function called Enumerate Angular Triplet (EAT) loss. Also, motivated by the knowledge distillation, to narrow down the features between different modalities before feature embedding, we further present a new Cross-Modality Knowledge Distillation (CMKD) loss. The experimental results on RegDB and SYSU-MM01 datasets have shown that the proposed method is superior to the other most advanced methods in terms of impressive performance.
RGB-Infrared (IR) person re-identification aims to retrieve person-of-interest from heterogeneous cameras, easily suffering from large image modality discrepancy caused by different sensing wavelength ranges. Existing work usually minimizes such discrepancy by aligning domain distribution of global features, while neglecting the intra-modality structural relations between semantic parts. This could result in the network overly focusing on local cues, without considering long-range body part dependencies, leading to meaningless region representations. In this paper, we propose a graph-enabled distribution matching solution, dubbed Geometry-Guided Dual-Alignment (G2DA) learning, for RGB-IR ReID. It can jointly encourage the cross-modal consistency between part semantics and structural relations for fine-grained modality alignment by solving a graph matching task within a multi-scale skeleton graph that embeds human topology information. Specifically, we propose to build a semantic-aligned complete graph into which all cross-modality images can be mapped via a pose-adaptive graph construction mechanism. This graph represents extracted whole-part features by nodes and expresses the node-wise similarities with associated edges. To achieve the graph-based dual-alignment learning, an Optimal Transport (OT) based structured metric is further introduced to simultaneously measure point-wise relations and group-wise structural similarities across modalities. By minimizing the cost of an inter-modality transport plan, G2DA can learn a consistent and discriminative feature subspace for cross-modality image retrieval. Furthermore, we advance a Message Fusion Attention (MFA) mechanism to adaptively reweight the information flow of semantic propagation, effectively strengthening the discriminability of extracted semantic features.
Person re-identification (Re-ID) aims to match the image frames which contain the same person in the surveillance videos. Most of the Re-ID algorithms conduct supervised training in some small labeled datasets, so directly deploying these trained models to the real-world large camera networks may lead to a poor performance due to underfitting. The significant difference between the source training dataset and the target testing dataset makes it challenging to incrementally optimize the model. To address this challenge, we propose a novel solution by transforming the unlabeled images in the target domain to fit the original classifier by using our proposed similarity preserved generative adversarial networks model, SimPGAN. Specifically, SimPGAN adopts the generative adversarial networks with the cycle consistency constraint to transform the unlabeled images in the target domain to the style of the source domain. Meanwhile, SimPGAN uses the similarity consistency loss, which is measured by a siamese deep convolutional neural network, to preserve the similarity of the transformed images of the same person. Comprehensive experiments based on multiple real surveillance datasets are conducted, and the results show that our algorithm is better than the state-of-the-art cross-dataset unsupervised person Re-ID algorithms.