
FlipReID: Closing the Gap between Training and Inference in Person Re-Identification

Published by: Xingyang Ni
Publication date: 2021
Research field: Informatics engineering
Language: English





Since neural networks are data-hungry, incorporating data augmentation in training is a widely adopted technique that enlarges datasets and improves generalization. On the other hand, aggregating the predictions of multiple augmented samples (i.e., test-time augmentation) can boost performance even further. In the context of person re-identification models, it is common practice to extract embeddings for both the original images and their horizontally flipped variants, and to take the mean of these two feature vectors as the final representation. However, this scheme results in a gap between training and inference: the mean feature vectors calculated at inference time are never part of the training pipeline. In this study, we devise the FlipReID structure with the flipping loss to address this issue. More specifically, models using the FlipReID structure are trained on the original images and the flipped images simultaneously, and the flipping loss minimizes the mean squared error between the feature vectors of corresponding image pairs. Extensive experiments show that our method brings consistent improvements. In particular, we set a new record on MSMT17, which is the largest person re-identification dataset. The source code is available at https://github.com/nixingyang/FlipReID.
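To make the training/inference gap concrete, here is a minimal sketch in plain Python. It assumes images are lists of pixel rows and replaces the ReID backbone with a hypothetical `toy_embed` function (per-column mean intensity) that is deliberately not flip-invariant. `tta_embedding` is the common inference scheme described above (average of the original and flipped embeddings), and `flipping_loss` is an illustrative mean-squared-error term between each image's embedding and its flip's embedding. All names are hypothetical; this is not the code from the FlipReID repository.

```python
def hflip(image):
    """Horizontally flip an image stored as a list of rows (H x W)."""
    return [list(reversed(row)) for row in image]


def toy_embed(image):
    """Hypothetical stand-in for a ReID backbone: the mean intensity of each
    column. It is deliberately not flip-invariant."""
    height = len(image)
    return [sum(row[c] for row in image) / height for c in range(len(image[0]))]


def mean_vector(a, b):
    """Element-wise mean of two feature vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def mse(a, b):
    """Mean squared error between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def tta_embedding(image, embed):
    """Inference-time scheme: average the embeddings of an image and its flip."""
    return mean_vector(embed(image), embed(hflip(image)))


def flipping_loss(batch, embed):
    """MSE between the embedding of each image and that of its flipped copy,
    averaged over the batch (a sketch of the flipping-loss idea)."""
    losses = [mse(embed(img), embed(hflip(img))) for img in batch]
    return sum(losses) / len(losses)


image = [[1, 2, 3], [3, 4, 5]]
print(tta_embedding(image, toy_embed))    # [3.0, 3.0, 3.0]
print(flipping_loss([image], toy_embed))  # 8/3, roughly 2.667
```

If the flipping loss reaches zero, an image and its flip map to the same embedding, so the averaged inference-time vector coincides with the single-image embedding the model saw during training. In practice this term would presumably be combined with the model's other training losses, which the abstract does not specify and the sketch omits.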




Read also

Person re-identification (ReID) aims at finding the same person across different cameras. Training such systems usually requires annotating a large number of cross-camera pedestrians from surveillance videos, which is labor-intensive, especially when the number of cameras is large. In contrast, this paper investigates ReID in an unexplored single-camera-training (SCT) setting, where each person in the training set appears in only one camera. To the best of our knowledge, this setting has never been studied before. SCT enjoys the advantage of low-cost data collection and annotation, and thus makes it easier to train ReID systems in a brand-new environment. However, it raises major challenges due to the lack of cross-camera person occurrences, on which conventional approaches heavily rely to extract discriminative features. The key to dealing with the challenges of the SCT setting lies in designing an effective mechanism to compensate for the lack of cross-camera annotation. We start with a regular deep network for feature extraction, on top of which we propose a novel loss function named multi-camera negative loss (MCNL). This is a metric learning loss motivated by probability, suggesting that in a multi-camera system, an image is more likely to be closer to the most similar negative sample in other cameras than to the most similar negative sample in the same camera. In experiments, MCNL significantly boosts ReID accuracy in the SCT setting, which paves the way for fast deployment of ReID systems with good performance in new target scenes.
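The abstract states the key inequality behind MCNL only in words. A minimal sketch of that single inequality as a hinge penalty might look like the following, assuming Euclidean distances, plain Python lists as embeddings, and a hypothetical `margin` of 0.1. It is not the published MCNL formulation (which is not given here), and the names `euclidean` and `camera_ranking_hinge` are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def camera_ranking_hinge(anchor, anchor_cam, negatives, margin=0.1):
    """Hypothetical hinge term encoding the stated intuition.

    negatives: list of (embedding, camera_id) pairs whose identity differs
    from the anchor's. The term is zero once the hardest (closest)
    cross-camera negative is closer to the anchor than the hardest
    same-camera negative by at least `margin`.
    """
    same = [euclidean(anchor, e) for e, cam in negatives if cam == anchor_cam]
    cross = [euclidean(anchor, e) for e, cam in negatives if cam != anchor_cam]
    if not same or not cross:
        return 0.0  # the comparison is undefined without both kinds of negatives
    return max(0.0, min(cross) - min(same) + margin)


negatives = [([1.0, 0.0], 0), ([0.0, 2.0], 1)]
print(camera_ranking_hinge([0.0, 0.0], 0, negatives))  # about 1.1: violated
```

Taking the minimum distance within each camera group mirrors the abstract's "most similar negative sample" wording; a full loss would also need terms involving positives, which this sketch leaves out.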
In this paper, we present a large-scale unlabeled person re-identification (Re-ID) dataset, LUPerson, and make the first attempt at performing unsupervised pre-training to improve the generalization ability of the learned person Re-ID feature representation. This addresses the problem that all existing person Re-ID datasets are of limited scale due to the costly effort required for data annotation. Previous research has tried to leverage models pre-trained on ImageNet to mitigate the shortage of person Re-ID data, but suffers from the large domain gap between ImageNet and person Re-ID data. LUPerson is an unlabeled dataset of 4M images of over 200K identities, which is 30X larger than the largest existing Re-ID dataset. It also covers a much more diverse range of capturing environments (e.g., camera settings and scenes). Based on this dataset, we systematically study the key factors for learning Re-ID features from two perspectives: data augmentation and contrastive loss. Unsupervised pre-training performed on this large-scale dataset effectively leads to a generic Re-ID feature that can benefit all existing person Re-ID methods. Using our pre-trained model in some basic frameworks, our methods achieve state-of-the-art results without bells and whistles on four widely used Re-ID datasets: CUHK03, Market1501, DukeMTMC, and MSMT17. Our results also show that the performance improvement is more significant on small-scale target datasets or under the few-shot setting.
Inspired by the effectiveness of adversarial training in the area of Generative Adversarial Networks, we present a new approach for learning feature representations in person re-identification. We investigate different types of bias that typically occur in re-ID scenarios, i.e., pose, body part, and camera view, and propose a general approach to address them. We introduce an adversarial strategy for controlling bias, named the Bias-controlled Adversarial framework (BCA), with two complementary branches that reduce or enhance bias-related features. The results and comparisons with the state of the art on different benchmarks show that our framework is an effective strategy for person re-identification. The performance improves for both full and partial views of persons.
Tianyang Liu, Yutian Lin, Bo Du (2021)
Unsupervised person re-identification (re-ID) has attracted increasing research interest because of its scalability and potential for real-world applications. State-of-the-art unsupervised re-ID methods usually follow a clustering-based strategy, which generates pseudo labels by clustering and maintains a memory that stores instance features and represents the cluster centroids for contrastive learning. This approach suffers from two problems. First, the centroid generated by unsupervised learning may not be a perfect prototype. Forcing images to move closer to the centroid emphasizes the result of clustering, which could accumulate clustering errors over iterations. Second, previous methods use features obtained at different training iterations to represent one centroid, which is not consistent with the current training sample, since these features are not directly comparable. To this end, we propose an unsupervised re-ID approach with a stochastic learning strategy. Specifically, we adopt a stochastically updated memory, where a random instance from a cluster is used to update the cluster-level memory for contrastive learning. In this way, the relationship between randomly selected pairs of images is learned, avoiding the training bias caused by unreliable pseudo labels. The stochastic memory is also always up to date for classification, which keeps it consistent with the current training sample. In addition, to relieve the issue of camera variance, a unified distance matrix is proposed during clustering, in which the distance bias from different camera domains is reduced and the variance among identities is emphasized.
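The stochastic memory update described above can be sketched in plain Python. This is a minimal illustration under several assumptions: pseudo labels already come from a clustering step (not shown), the memory is a list of L2-normalized vectors indexed by pseudo label, the hypothetical `momentum` parameter controls blending (0.0 simply overwrites a slot with the chosen instance), and the contrastive objective is taken to be an InfoNCE-style cross-entropy with a hypothetical `temperature`. The function names are illustrative, not the authors' API.

```python
import math
import random


def normalize(v):
    """L2-normalize a vector (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def stochastic_memory_update(memory, features, labels, momentum=0.0, rng=random):
    """Update each cluster slot from ONE randomly chosen member of that cluster.

    memory: list of L2-normalized vectors, indexed by pseudo label.
    momentum: hypothetical blending factor; 0.0 overwrites the slot with the
    chosen instance's feature (an assumption, not the paper's stated rule).
    """
    members = {}
    for feat, label in zip(features, labels):
        members.setdefault(label, []).append(feat)
    for label, feats in members.items():
        chosen = rng.choice(feats)  # one random instance per cluster
        blended = [momentum * old + (1.0 - momentum) * new
                   for old, new in zip(memory[label], chosen)]
        memory[label] = normalize(blended)
    return memory


def cluster_contrastive_loss(feature, label, memory, temperature=0.05):
    """Cross-entropy of one feature against all cluster slots (InfoNCE-style)."""
    logits = [sum(f * m for f, m in zip(feature, slot)) / temperature
              for slot in memory]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


memory = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
stochastic_memory_update(memory, features, labels=[0, 1], rng=random.Random(0))
print(memory)  # [[1.0, 0.0], [0.0, 1.0]]: each slot now holds its only member
print(cluster_contrastive_loss([1.0, 0.0], 0, memory, temperature=1.0))  # about 0.313
```

Because each slot is refreshed from a feature computed in the current iteration rather than from features accumulated over many past iterations, the slot stays comparable with the current training sample, which is the consistency the abstract describes.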
Person re-identification (re-ID) is a very active area of research in computer vision, due to the role it plays in video surveillance. Currently, most methods only address the task of matching between colour images. However, in poorly lit environments CCTV cameras switch to infrared imaging, hence developing a system that can correctly match infrared and colour images is a necessity. In this paper, we propose a part-feature extraction network to better focus on subtle, unique signatures on the person that are visible across both infrared and colour modalities. To train the model, we propose a novel variant of the domain adversarial feature-learning framework. Through extensive experimentation, we show that our approach outperforms state-of-the-art methods.