No Arabic abstract
Although Person Re-Identification has made impressive progress, difficult cases like occlusion, change of view-pointand similar clothing still bring great challenges. Besides overall visual features, matching and comparing detailed information is also essential for tackling these challenges. This paper proposes two key recognition patterns to better utilize the detail information of pedestrian images, that most of the existing methods are unable to satisfy. Firstly, Visual Clue Alignment requires the model to select and align decisive regions pairs from two images for pair-wise comparison, while existing methods only align regions with predefined rules like high feature similarity or same semantic labels. Secondly, the Conditional Feature Embedding requires the overall feature of a query image to be dynamically adjusted based on the gallery image it matches, while most of the existing methods ignore the reference images. By introducing novel techniques including correspondence attention module and discrepancy-based GCN, we propose an end-to-end ReID method that integrates both patterns into a unified framework, called CACE-Net((C)lue(A)lignment and (C)onditional (E)mbedding). The experiments show that CACE-Net achieves state-of-the-art performance on three public datasets.
Person reidentification (ReID) is a very hot research topic in machine learning and computer vision, and many person ReID approaches have been proposed; however, most of these methods assume that the same person has the same clothes within a short time interval, and thus their visual appearance must be similar. However, in an actual surveillance environment, a given person has a great probability of changing clothes after a long time span, and they also often take different personal belongings with them. When the existing person ReID methods are applied in this type of case, almost all of them fail. To date, only a few works have focused on the cloth-changing person ReID task, but since it is very difficult to extract generalized and robust features for representing people with different clothes, their performances need to be improved. Moreover, visual-semantic information is often ignored. To solve these issues, in this work, a novel multigranular visual-semantic embedding algorithm (MVSE) is proposed for cloth-changing person ReID, where visual semantic information and human attributes are embedded into the network, and the generalized features of human appearance can be well learned to effectively solve the problem of clothing changes. Specifically, to fully represent a person with clothing changes, a multigranular feature representation scheme (MGR) is employed to focus on the unchanged part of the human, and then a cloth desensitization network (CDN) is designed to improve the feature robustness of the approach for the person with different clothing, where different high-level human attributes are fully utilized. Moreover, to further solve the issue of pose changes and occlusion under different camera perspectives, a partially semantically aligned network (PSA) is proposed to obtain the visual-semantic information that is used to align the human attributes.
Nowadays, deep learning is widely applied to extract features for similarity computation in person re-identification (re-ID) and have achieved great success. However, due to the non-overlapping between training and testing IDs, the difference between the data used for model training and the testing data makes the performance of learned feature degraded during testing. Hence, re-ranking is proposed to mitigate this issue and various algorithms have been developed. However, most of existing re-ranking methods focus on replacing the Euclidean distance with sophisticated distance metrics, which are not friendly to downstream tasks and hard to be used for fast retrieval of massive data in real applications. In this work, we propose a graph-based re-ranking method to improve learned features while still keeping Euclidean distance as the similarity metric. Inspired by graph convolution networks, we develop an operator to propagate features over an appropriate graph. Since graph is the essential key for the propagation, two important criteria are considered for designing the graph, and three different graphs are explored accordingly. Furthermore, a simple yet effective method is proposed to generate a profile vector for each tracklet in videos, which helps extend our method to video re-ID. Extensive experiments on three benchmark data sets, e.g., Market-1501, Duke, and MARS, demonstrate the effectiveness of our proposed approach.
Learning the distance metric between pairs of examples is of great importance for visual recognition, especially for person re-identification (Re-Id). Recently, the contrastive and triplet loss are proposed to enhance the discriminative power of the deeply learned features, and have achieved remarkable success. As can be seen, either the contrastive or triplet loss is just one special case of the Euclidean distance relationships among these training samples. Therefore, we propose a structured graph Laplacian embedding algorithm, which can formulate all these structured distance relationships into the graph Laplacian form. The proposed method can take full advantages of the structured distance relationships among these training samples, with the constructed complete graph. Besides, this formulation makes our method easy-to-implement and super-effective. When embedding the proposed algorithm with the softmax loss for the CNN training, our method can obtain much more robust and discriminative deep features with inter-personal dispersion and intra-personal compactness, which is essential to person Re-Id. We illustrate the effectiveness of our proposed method on top of three popular networks, namely AlexNet, DGDNet and ResNet50, on recent four widely used Re-Id benchmark datasets. Our proposed method achieves state-of-the-art performances.
In this work, we present a deep convolutional pyramid person matching network (PPMN) with specially designed Pyramid Matching Module to address the problem of person re-identification. The architecture takes a pair of RGB images as input, and outputs a similiarity value indicating whether the two input images represent the same person or not. Based on deep convolutional neural networks, our approach first learns the discriminative semantic representation with the semantic-component-aware features for persons and then employs the Pyramid Matching Module to match the common semantic-components of persons, which is robust to the variation of spatial scales and misalignment of locations posed by viewpoint changes. The above two processes are jointly optimized via a unified end-to-end deep learning scheme. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art approaches, especially on the rank-1 recognition rate.
Most state-of-the-art person re-identification (re-id) methods depend on supervised model learning with a large set of cross-view identity labelled training data. Even worse, such trained models are limited to only the same-domain deployment with significantly degraded cross-domain generalization capability, i.e. domain specific. To solve this limitation, there are a number of recent unsupervised domain adaptation and unsupervised learning methods that leverage unlabelled target domain training data. However, these methods need to train a separate model for each target domain as supervised learning methods. This conventional {em train once, run once} pattern is unscalable to a large number of target domains typically encountered in real-world deployments. We address this problem by presenting a train once, run everywhere pattern industry-scale systems are desperate for. We formulate a universal model learning approach enabling domain-generic person re-id using only limited training data of a {em single} seed domain. Specifically, we train a universal re-id deep model to discriminate between a set of transformed person identity classes. Each of such classes is formed by applying a variety of random appearance transformations to the images of that class, where the transformations simulate the camera viewing conditions of any domains for making the model training domain generic. Extensive evaluations show the superiority of our method for universal person re-id over a wide variety of state-of-the-art unsupervised domain adaptation and unsupervised learning re-id methods on five standard benchmarks: Market-1501, DukeMTMC, CUHK03, MSMT17, and VIPeR.