ﻻ يوجد ملخص باللغة العربية
Recent years have witnessed a substantial increase in the deep learning (DL)architectures proposed for visual recognition tasks like person re-identification,where individuals must be recognized over multiple distributed cameras. Althoughthese architectures have greatly improved the state-of-the-art accuracy, thecomputational complexity of the CNNs commonly used for feature extractionremains an issue, hindering their deployment on platforms with limited resources,or in applications with real-time constraints. There is an obvious advantage toaccelerating and compressing DL models without significantly decreasing theiraccuracy. However, the source (pruning) domain differs from operational (target)domains, and the domain shift between image data captured with differentnon-overlapping camera viewpoints leads to lower recognition accuracy. In thispaper, we investigate the prunability of these architectures under different designscenarios. This paper first revisits pruning techniques that are suitable forreducing the computational complexity of deep CNN networks applied to personre-identification. Then, these techniques are analysed according to their pruningcriteria and strategy, and according to different scenarios for exploiting pruningmethods to fine-tuning networks to target domains. Experimental resultsobtained using DL models with ResNet feature extractors, and multiplebenchmarks re-identification datasets, indicate that pruning can considerablyreduce network complexity while maintaining a high level of accuracy. Inscenarios where pruning is performed with large pre-training or fine-tuningdatasets, the number of FLOPS required by ResNet architectures is reduced byhalf, while maintaining a comparable rank-1 accuracy (within 1% of the originalmodel). Pruning while training a larger CNNs can also provide a significantlybetter performance than fine-tuning smaller ones.
Style variation has been a major challenge for person re-identification, which aims to match the same pedestrians across different cameras. Existing works attempted to address this problem with camera-invariant descriptor subspace learning. However,
In real-world video surveillance applications, person re-identification (ReID) suffers from the effects of occlusions and detection errors. Despite recent advances, occlusions continue to corrupt the features extracted by state-of-art CNN backbones,
Person re-identification (ReID) aims to match people across multiple non-overlapping video cameras deployed at different locations. To address this challenging problem, many metric learning approaches have been proposed, among which triplet loss is o
Video-based person re-identification has drawn massive attention in recent years due to its extensive applications in video surveillance. While deep learning-based methods have led to significant progress, these methods are limited by ineffectively u
This paper studies the problem of Person Re-Identification (ReID)for large-scale applications. Recent research efforts have been devoted to building complicated part models, which introduce considerably high computational cost and memory consumption,