ﻻ يوجد ملخص باللغة العربية
Pedestrian detection methods have been significantly improved with the development of deep convolutional neural networks. Nevertheless, detecting small-scaled pedestrians and occluded pedestrians remains a challenging problem. In this paper, we propose a pedestrian detection method with a couple-network to simultaneously address these two issues. One of the sub-networks, the gated multi-layer feature extraction sub-network, aims to adaptively generate discriminative features for pedestrian candidates in order to robustly detect pedestrians with large variations on scales. The second sub-network targets in handling the occlusion problem of pedestrian detection by using deformable regional RoI-pooling. We investigate two different gate units for the gated sub-network, namely, the channel-wise gate unit and the spatio-wise gate unit, which can enhance the representation ability of the regional convolutional features among the channel dimensions or across the spatial domain, repetitively. Ablation studies have validated the effectiveness of both the proposed gated multi-layer feature extraction sub-network and the deformable occlusion handling sub-network. With the coupled framework, our proposed pedestrian detector achieves state-of-the-art results on the Caltech and the CityPersons pedestrian detection benchmarks.
Pedestrian detection methods have been significantly improved with the development of deep convolutional neural networks. Nevertheless, robustly detecting pedestrians with a large variant on sizes and with occlusions remains a challenging problem. In
Panoptic segmentation aims to perform instance segmentation for foreground instances and semantic segmentation for background stuff simultaneously. The typical top-down pipeline concentrates on two key issues: 1) how to effectively model the intrinsi
In this article, we present a very lightweight neural network architecture, trained on stereo data pairs, which performs view synthesis from one single image. With the growing success of multi-view formats, this problem is indeed increasingly relevan
3D human pose estimation (HPE) is crucial in many fields, such as human behavior analysis, augmented reality/virtual reality (AR/VR) applications, and self-driving industry. Videos that contain multiple potentially occluded people captured from freel
The tracking-by-detection framework receives growing attentions through the integration with the Convolutional Neural Networks (CNNs). Existing tracking-by-detection based methods, however, fail to track objects with severe appearance variations. Thi