
FTN: Foreground-Guided Texture-Focused Person Re-Identification

Posted by Shoudong Han
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Person re-identification (Re-ID) is a challenging task, as persons often appear against different backgrounds. Most recent Re-ID methods treat foreground and background information equally for person discriminative learning, which can easily lead to false alarms when different persons appear against similar backgrounds or the same person appears against different backgrounds. In this paper, we propose a Foreground-Guided Texture-Focused Network (FTN) for Re-ID, which weakens the representation of unrelated background and highlights person-related attributes in an end-to-end manner. FTN consists of a semantic encoder (S-Enc) and a compact foreground attention module (CFA) for the Re-ID task, and a texture-focused decoder (TF-Dec) for the reconstruction task. In particular, we build a foreground-guided semi-supervised learning strategy for TF-Dec, in which the reconstruction ground truths are simply the inputs of FTN weighted by a Gaussian mask and the attention mask generated by CFA. Moreover, a new gradient loss is introduced to encourage the network to mine the texture consistency between the inputs and the reconstructed outputs. Our FTN is computationally efficient, and extensive experiments on three commonly used datasets (Market1501, CUHK03 and MSMT17) demonstrate that the proposed method performs favorably against state-of-the-art methods.
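A minimal sketch of the two ideas the abstract describes for TF-Dec: a reconstruction target formed by weighting the input with a Gaussian prior and the CFA attention mask, plus an image-gradient loss for texture consistency. The mask shapes, the L1 form of both terms, and all function names are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def gaussian_mask(h, w, sigma=0.5, device="cpu"):
    """Centered 2-D Gaussian prior over the image plane (the person is usually centered)."""
    ys = torch.linspace(-1, 1, h, device=device)
    xs = torch.linspace(-1, 1, w, device=device)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))

def image_gradients(x):
    """Finite-difference gradients along height and width for an NCHW tensor."""
    dy = x[:, :, 1:, :] - x[:, :, :-1, :]
    dx = x[:, :, :, 1:] - x[:, :, :, :-1]
    return dy, dx

def reconstruction_and_gradient_loss(inputs, recon, attn_mask, sigma=0.5):
    """Semi-supervised target: the input weighted by the Gaussian prior and the attention mask."""
    n, c, h, w = inputs.shape
    g = gaussian_mask(h, w, sigma, inputs.device).view(1, 1, h, w)
    a = F.interpolate(attn_mask, size=(h, w), mode="bilinear", align_corners=False)
    target = inputs * g * a                                  # foreground-guided pseudo ground truth
    l_rec = F.l1_loss(recon, target)                         # reconstruction term
    dy_r, dx_r = image_gradients(recon)
    dy_t, dx_t = image_gradients(target)
    l_grad = F.l1_loss(dy_r, dy_t) + F.l1_loss(dx_r, dx_t)   # texture-consistency (gradient) term
    return l_rec + l_grad
```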



Read also

The performance of person re-identification (Re-ID) has been seriously affected by the large cross-view appearance variations caused by mutual occlusions and background clutter. Learning a feature representation that can adaptively emphasize the foreground persons is therefore critical to solving the person Re-ID problem. In this paper, we propose a simple yet effective foreground attentive neural network (FANN) to learn a discriminative feature representation for person Re-ID, which adaptively enhances the positive side of the foreground and weakens the negative side of the background. Specifically, a novel foreground attentive subnetwork is designed to drive the network's attention, in which a decoder network reconstructs the binary mask using a novel local regression loss function, and an encoder network is regularized by the decoder network to focus its attention on the foreground persons. The resulting feature maps of the encoder network are further fed into the body part subnetwork and feature fusion subnetwork to learn discriminative features. Besides, a novel symmetric triplet loss function is introduced to supervise feature learning, in which the intra-class distance is minimized and the inter-class distance is maximized in each triplet unit simultaneously. By training FANN in a multi-task learning framework, a discriminative feature representation can be learned that finds the matched reference for each probe among the various candidates in the gallery. Extensive experiments on several public benchmark datasets show clear improvements of our method over state-of-the-art approaches.
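A hedged sketch of one plausible "symmetric" triplet objective in the spirit described above: the negative is pushed away from the anchor and the positive at the same time, while the anchor-positive pair is pulled together. The exact FANN formulation, margin value and distance choice are assumptions here.

```python
import torch
import torch.nn.functional as F

def symmetric_triplet_loss(anchor, positive, negative, margin=0.3):
    """anchor, positive, negative: (N, D) embeddings of a triplet batch."""
    d_ap = F.pairwise_distance(anchor, positive)       # intra-class distance (minimize)
    d_an = F.pairwise_distance(anchor, negative)       # inter-class distance w.r.t. anchor
    d_pn = F.pairwise_distance(positive, negative)     # inter-class distance w.r.t. positive
    # Symmetric hinge: both inter-class distances should exceed d_ap by the margin.
    loss = F.relu(d_ap - d_an + margin) + F.relu(d_ap - d_pn + margin)
    return loss.mean()
```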
Despite the great progress of person re-identification (ReID) with the adoption of Convolutional Neural Networks, current ReID models are opaque and only output a scalar distance between two persons. Few methods provide users with semantically understandable explanations for why two persons are the same one or not. In this paper, we propose a post-hoc method, named Attribute-guided Metric Distillation (AMD), to explain existing ReID models. This is the first method to explore attributes to answer: 1) what and where the attributes make two persons different, and 2) how much each attribute contributes to the difference. In AMD, we design a pluggable interpreter network for target models to generate quantitative contributions of attributes and visualize accurate attention maps of the most discriminative attributes. To achieve this goal, we propose a metric distillation loss by which the interpreter learns to decompose the distance between two persons into components of attributes, with knowledge distilled from the target model. Moreover, we propose an attribute prior loss to make the interpreter generate attribute-guided attention maps and to eliminate biases caused by the imbalanced distribution of attributes. This loss guides the interpreter to focus on the exclusive and discriminative attributes rather than the large-area but common attributes of two persons. Comprehensive experiments show that the interpreter can generate effective and intuitive explanations for varied models and generalize well under cross-domain settings. As a by-product, the accuracy of target models can be further improved with our interpreter.
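A minimal sketch of the metric-distillation idea: an interpreter predicts non-negative per-attribute contributions whose sum should reproduce the distance computed by the frozen target ReID model. The network layout, the Softplus non-negativity, and the squared-error form are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AttributeInterpreter(nn.Module):
    """Maps a pair of person features to per-attribute contributions to their distance."""
    def __init__(self, feat_dim, num_attributes):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_attributes), nn.Softplus(),  # keep contributions non-negative
        )

    def forward(self, feat_a, feat_b):
        return self.head(torch.cat([feat_a, feat_b], dim=1))  # (N, num_attributes)

def metric_distillation_loss(contributions, target_distance):
    """The attribute contributions should sum to the target model's pairwise distance."""
    return ((contributions.sum(dim=1) - target_distance) ** 2).mean()
```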
Kuan Zhu, Haiyun Guo, Zhiwei Liu (2020)
Existing alignment-based methods have to employ pretrained human parsing models to achieve pixel-level alignment, and cannot identify the personal belongings (e.g., backpacks and reticule) which are crucial to person re-ID. In this paper, we propose the identity-guided human semantic parsing approach (ISP) to locate both the human body parts and personal belongings at pixel level for aligned person re-ID, using only person identity labels. We design cascaded clustering on feature maps to generate pseudo-labels of human parts. Specifically, for the pixels of all images of a person, we first group them into foreground or background, and then group the foreground pixels into human parts. The cluster assignments are subsequently used as pseudo-labels of human parts to supervise part estimation, and ISP iteratively learns the feature maps and groups them. Finally, local features of both human body parts and personal belongings are obtained according to the self-learned part estimation, and only features of visible parts are utilized for the retrieval. Extensive experiments on three widely used datasets validate the superiority of ISP over many state-of-the-art methods. Our code is available at https://github.com/CASIA-IVA-Lab/ISP-reID.
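A rough sketch of the cascaded clustering step described above: pixel features pooled over all images of one identity are first split into foreground and background, and the foreground pixels are then clustered into K parts; the assignments serve as pseudo-labels. Using plain k-means and picking the higher-norm cluster as foreground are assumptions for illustration, not ISP's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cascaded_pseudo_labels(pixel_feats, num_parts=3):
    """pixel_feats: (P, D) features for all pixels of all images of one person."""
    fg_bg = KMeans(n_clusters=2, n_init=10).fit(pixel_feats)
    # Assumption: the cluster with the larger mean feature norm corresponds to the foreground.
    norms = np.linalg.norm(pixel_feats, axis=1)
    fg_id = int(norms[fg_bg.labels_ == 1].mean() > norms[fg_bg.labels_ == 0].mean())
    labels = np.zeros(len(pixel_feats), dtype=np.int64)        # 0 = background
    fg_idx = np.where(fg_bg.labels_ == fg_id)[0]
    parts = KMeans(n_clusters=num_parts, n_init=10).fit(pixel_feats[fg_idx])
    labels[fg_idx] = parts.labels_ + 1                          # 1..num_parts = human parts
    return labels
```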
Re-identifying a person across multiple disjoint camera views is important for intelligent video surveillance, smart retailing and many other applications. However, existing person re-identification (ReID) methods are challenged by the ubiquitous occlusion over persons and suffer from performance degradation. This paper proposes a novel occlusion-robust and alignment-free model for occluded person ReID and extends its application to realistic and crowded scenarios. The proposed model first leverages a fully convolutional network (FCN) and pyramid pooling to extract spatial pyramid features. Then an alignment-free matching approach, namely Foreground-aware Pyramid Reconstruction (FPR), is developed to accurately compute matching scores between occluded persons, despite their different scales and sizes. FPR uses the error from robust reconstruction over spatial pyramid features to measure similarities between two persons. More importantly, we design an occlusion-sensitive foreground probability generator that focuses more on clean human body parts to refine the similarity computation with less contamination from occlusion. FPR is easily embedded into any end-to-end person ReID model. The effectiveness of the proposed method is clearly demonstrated by the experimental results (Rank-1 accuracy) on three occluded person datasets: Partial REID (78.30%), Partial iLIDS (68.08%) and Occluded REID (81.00%); and three benchmark person datasets: Market1501 (95.42%), DukeMTMC (88.64%) and CUHK03 (76.08%).
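A hedged sketch of the reconstruction-error matching idea: probe spatial features are reconstructed from the gallery feature dictionary via least squares, and the foreground-weighted residual serves as the dissimilarity. The ridge regularization and the weighting scheme below are assumptions, not FPR's exact robust solver.

```python
import torch

def reconstruction_distance(probe_feats, gallery_feats, fg_prob, lam=1e-2):
    """
    probe_feats:   (Np, D) spatial-pyramid features of the probe image
    gallery_feats: (Ng, D) spatial-pyramid features of a gallery image (dictionary)
    fg_prob:       (Np,)   foreground probability of each probe location
    """
    G = gallery_feats.t()                                                   # (D, Ng)
    # Ridge-regularized least squares: X = argmin ||G X - P^T||^2 + lam ||X||^2
    A = G.t() @ G + lam * torch.eye(G.shape[1], dtype=G.dtype, device=G.device)
    B = G.t() @ probe_feats.t()                                             # (Ng, Np)
    X = torch.linalg.solve(A, B)
    residual = probe_feats.t() - G @ X                                      # (D, Np)
    per_location_err = residual.norm(dim=0)                                 # (Np,)
    # Foreground-weighted average reconstruction error = dissimilarity score.
    return (fg_prob * per_location_err).sum() / fg_prob.sum()
```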
Most state-of-the-art person re-identification (re-id) methods depend on supervised model learning with a large set of cross-view identity-labelled training data. Worse, such trained models are limited to same-domain deployment, with significantly degraded cross-domain generalization capability, i.e. they are domain-specific. To address this limitation, a number of recent unsupervised domain adaptation and unsupervised learning methods leverage unlabelled target-domain training data. However, these methods need to train a separate model for each target domain, just as supervised learning methods do. This conventional "train once, run once" pattern does not scale to the large number of target domains typically encountered in real-world deployments. We address this problem by presenting a "train once, run everywhere" pattern that industry-scale systems demand. We formulate a universal model learning approach enabling domain-generic person re-id using only limited training data from a single seed domain. Specifically, we train a universal re-id deep model to discriminate between a set of transformed person identity classes. Each such class is formed by applying a variety of random appearance transformations to the images of that class, where the transformations simulate the camera viewing conditions of arbitrary domains, making the model training domain-generic. Extensive evaluations show the superiority of our method for universal person re-id over a wide variety of state-of-the-art unsupervised domain adaptation and unsupervised learning re-id methods on five standard benchmarks: Market-1501, DukeMTMC, CUHK03, MSMT17, and VIPeR.
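A minimal sketch of the "transformed identity classes" idea: each identity is augmented with random appearance transformations intended to simulate varied camera viewing conditions (color, illumination, blur, occlusion). The specific transform set and parameters below are assumptions, not the paper's exact recipe.

```python
import torchvision.transforms as T

# Randomly perturbs appearance to mimic cross-camera conditions; each identity plus
# one sampled transformation chain defines a "transformed identity class" that the
# universal model must discriminate during training.
camera_condition_transform = T.Compose([
    T.Resize((256, 128)),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.1),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.RandomErasing(p=0.5, scale=(0.02, 0.2)),   # crude stand-in for partial occlusion
])
```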