ترغب بنشر مسار تعليمي؟ اضغط هنا

Mask-Guided Attention Network for Occluded Pedestrian Detection

111   0   0.0 ( 0 )
 نشر من قبل Yanwei Pang
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Pedestrian detection relying on deep convolution neural networks has made significant progress. Though promising results have been achieved on standard pedestrians, the performance on heavily occluded pedestrians remains far from satisfactory. The main culprits are intra-class occlusions involving other pedestrians and inter-class occlusions caused by other objects, such as cars and bicycles. These result in a multitude of occlusion patterns. We propose an approach for occluded pedestrian detection with the following contributions. First, we introduce a novel mask-guided attention network that fits naturally into popular pedestrian detection pipelines. Our attention network emphasizes on visible pedestrian regions while suppressing the occluded ones by modulating full body features. Second, we empirically demonstrate that coarse-level segmentation annotations provide reasonable approximation to their dense pixel-wise counterparts. Experiments are performed on CityPersons and Caltech datasets. Our approach sets a new state-of-the-art on both datasets. Our approach obtains an absolute gain of 9.5% in log-average miss rate, compared to the best reported results on the heavily occluded (HO) pedestrian set of CityPersons test set. Further, on the HO pedestrian set of Caltech dataset, our method achieves an absolute gain of 5.0% in log-average miss rate, compared to the best reported results. Code and models are available at: https://github.com/Leotju/MGAN.



قيم البحث

اقرأ أيضاً

Occlusion is very challenging in pedestrian detection. In this paper, we propose a simple yet effective method named V2F-Net, which explicitly decomposes occluded pedestrian detection into visible region detection and full body estimation. V2F-Net co nsists of two sub-networks: Visible region Detection Network (VDN) and Full body Estimation Network (FEN). VDN tries to localize visible regions and FEN estimates full-body box on the basis of the visible box. Moreover, to further improve the estimation of full body, we propose a novel Embedding-based Part-aware Module (EPM). By supervising the visibility for each part, the network is encouraged to extract features with essential part information. We experimentally show the effectiveness of V2F-Net by conducting several experiments on two challenging datasets. V2F-Net achieves 5.85% AP gains on CrowdHuman and 2.24% MR-2 improvements on CityPersons compared to FPN baseline. Besides, the consistent gain on both one-stage and two-stage detector validates the generalizability of our method.
In this work, we present a novel mask guided attention (MGA) method for fine-grained patchy image classification. The key challenge of fine-grained patchy image classification lies in two folds, ultra-fine-grained inter-category variances among objec ts and very few data available for training. This motivates us to consider employing more useful supervision signal to train a discriminative model within limited training samples. Specifically, the proposed MGA integrates a pre-trained semantic segmentation model that produces auxiliary supervision signal, i.e., patchy attention mask, enabling a discriminative representation learning. The patchy attention mask drives the classifier to filter out the insignificant parts of images (e.g., common features between different categories), which enhances the robustness of MGA for the fine-grained patchy image classification. We verify the effectiveness of our method on three publicly available patchy image datasets. Experimental results demonstrate that our MGA method achieves superior performance on three datasets compared with the state-of-the-art methods. In addition, our ablation study shows that MGA improves the accuracy by 2.25% and 2% on the SoyCultivarVein and BtfPIS datasets, indicating its practicality towards solving the fine-grained patchy image classification.
Detecting pedestrians, especially under heavy occlusions, is a challenging computer vision problem with numerous real-world applications. This paper introduces a novel approach, termed as PSC-Net, for occluded pedestrian detection. The proposed PSC-N et contains a dedicated module that is designed to explicitly capture both inter and intra-part co-occurrence information of different pedestrian body parts through a Graph Convolutional Network (GCN). Both inter and intra-part co-occurrence information contribute towards improving the feature representation for handling varying level of occlusions, ranging from partial to severe occlusions. Our PSC-Net exploits the topological structure of pedestrian and does not require part-based annotations or additional visible bounding-box (VBB) information to learn part spatial co-occurrence. Comprehensive experiments are performed on two challenging datasets: CityPersons and Caltech datasets. The proposed PSC-Net achieves state-of-the-art detection performance on both. On the heavy occluded (textbf{HO}) set of CityPerosns test set, our PSC-Net obtains an absolute gain of 4.0% in terms of log-average miss rate over the state-of-the-art with same backbone, input scale and without using additional VBB supervision. Further, PSC-Net improves the state-of-the-art from 37.9 to 34.8 in terms of log-average miss rate on Caltech (textbf{HO}) test set.
Object detection and counting are related but challenging problems, especially for drone based scenes with small objects and cluttered background. In this paper, we propose a new Guided Attention Network (GANet) to deal with both object detection and counting tasks based on the feature pyramid. Different from the previous methods relying on unsupervised attention modules, we fuse different scales of feature maps by using the proposed weakly-supervised Background Attention (BA) between the background and objects for more semantic feature representation. Then, the Foreground Attention (FA) module is developed to consider both global and local appearance of the object to facilitate accurate localization. Moreover, the new data argumentation strategy is designed to train a robust model in various complex scenes. Extensive experiments on three challenging benchmarks (i.e., UAVDT, CARPK and PUCPR+) show the state-of-the-art detection and counting performance of the proposed method compared with existing methods.
We propose Mask Guided (MG) Matting, a robust matting framework that takes a general coarse mask as guidance. MG Matting leverages a network (PRN) design which encourages the matting model to provide self-guidance to progressively refine the uncertai n regions through the decoding process. A series of guidance mask perturbation operations are also introduced in the training to further enhance its robustness to external guidance. We show that PRN can generalize to unseen types of guidance masks such as trimap and low-quality alpha matte, making it suitable for various application pipelines. In addition, we revisit the foreground color prediction problem for matting and propose a surprisingly simple improvement to address the dataset issue. Evaluation on real and synthetic benchmarks shows that MG Matting achieves state-of-the-art performance using various types of guidance inputs. Code and models are available at https://github.com/yucornetto/MGMatting.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا