
Crowd Counting with Deep Structured Scale Integration Network

Published by Lingbo Liu
Publication date: 2019
Research field: Informatics Engineering
Research language: English





Automatically estimating the number of people in unconstrained crowded scenes is challenging, and one major difficulty stems from the huge variation in people's scale. In this paper, we propose a novel Deep Structured Scale Integration Network (DSSINet) for crowd counting, which addresses this scale variation through structured feature representation learning and hierarchically structured loss function optimization. Unlike conventional methods, which directly fuse multiple features by weighted averaging or concatenation, we first introduce a Structured Feature Enhancement Module based on conditional random fields (CRFs) to refine multiscale features mutually with a message passing mechanism. In this module, each scale-specific feature is treated as a continuous random variable and passes complementary information to refine the features at other scales. Second, we use a Dilated Multiscale Structural Similarity loss to force DSSINet to learn the local correlation of people's scales within regions of various sizes, thus yielding high-quality density maps. Extensive experiments on four challenging benchmarks demonstrate the effectiveness of our method. Specifically, DSSINet reduces error by 9.5% on the ShanghaiTech dataset and by 24.9% on the UCF-QNRF dataset relative to state-of-the-art methods.
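To make the idea of a structural similarity loss on density maps more concrete, here is a minimal sketch in plain Python. It is not the Dilated Multiscale Structural Similarity loss from the paper: it computes the standard SSIM formula from whole-map statistics, whereas the abstract describes local correlations over regions of various sizes. The function names `global_ssim` and `ssim_loss` and the constants `c1` and `c2` are assumptions chosen for illustration.

```python
from statistics import fmean


def global_ssim(pred, target, c1=1e-4, c2=9e-4):
    """Standard SSIM formula using whole-map statistics.

    Simplification (assumption): one global window, no Gaussian
    weighting, no dilation, no multiple scales.
    """
    x = [v for row in pred for v in row]
    y = [v for row in target for v in row]
    if not x or len(x) != len(y):
        raise ValueError("maps must be non-empty and the same size")

    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])

    # Mean (luminance) term times contrast/structure term.
    mean_term = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    structure_term = (2 * cov + c2) / (var_x + var_y + c2)
    return mean_term * structure_term


def ssim_loss(pred, target):
    """Loss that is 0 for identical maps and grows as structure diverges."""
    return 1.0 - global_ssim(pred, target)


# Two toy density maps with the same total count but different layout.
a = [[0.0, 1.0], [0.0, 0.0]]
b = [[0.0, 0.0], [0.0, 1.0]]
print(ssim_loss(a, a), ssim_loss(a, b))
```

In this toy case both maps sum to the same count, yet the loss between them is large, which illustrates why a structural term can complement a purely count-based or pixel-wise error.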




Read also

In this paper, we address the challenging problem of crowd counting in congested scenes. Specifically, we present the Inverse Attention Guided Deep Crowd Counting Network (IA-DCCN), which efficiently infuses segmentation information into the counting network through an inverse attention mechanism, resulting in significant improvements. The proposed method, which is based on VGG-16, is a single-step training framework and is simple to implement. The use of segmentation information results in minimal computational overhead and does not require any additional annotations. We demonstrate the significance of segmentation-guided inverse attention through a detailed analysis and ablation study. Furthermore, the proposed method is evaluated on three challenging crowd counting datasets and is shown to achieve significant improvements over several recent methods.
Xiaowen Shi, Xin Li, Caili Wu (2020)
Automatic analysis of highly crowded scenes has attracted extensive attention in computer vision research. Previous approaches to crowd counting have already achieved promising performance across various benchmarks. However, in real-world deployments, the model should run as fast as possible while maintaining accuracy. In this paper, we propose a compact convolutional neural network for crowd counting that learns a more efficient model with a small number of parameters. With three parallel filters applying convolution to the input image simultaneously at the front of the network, our model achieves nearly real-time speed and saves computing resources. Experiments on two benchmarks show that our proposed method not only strikes a balance between performance and efficiency that suits real-world scenes, but is also faster than existing lightweight models.
Crowd counting from unconstrained scene images is a crucial task in many real-world applications such as urban surveillance and management, but it is greatly challenged by the camera's perspective, which causes huge appearance variations in people's scales and rotations. Conventional methods address such challenges by resorting to fixed multi-scale architectures that are often unable to cover the widely varying scales and that ignore rotation variations. In this paper, we propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network, which adaptively addresses the two issues in a learnable spatial transform module with a region-wise refinement process. Specifically, our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module that iteratively applies two components: i) a Spatial Transformer Network that dynamically locates an attentional region from the crowd density map and transforms it to a suitable scale and rotation for optimal crowd estimation; ii) a Local Refinement Network that refines the density map of the attended region with residual learning. Extensive experiments on four challenging benchmarks show the effectiveness of our approach. Specifically, compared with the existing best-performing methods, we achieve an improvement of 12% on the largest dataset, WorldExpo10, and 22.8% on the most challenging dataset, UCF_CC_50.
Crowd counting is an application-oriented task, and its inference efficiency is crucial for real-world applications. However, most previous works rely on heavy backbone networks with prohibitive run-time cost, which seriously restricts their deployment scope and causes poor scalability. To liberate these crowd counting models, we propose a novel Structured Knowledge Transfer (SKT) framework, which fully exploits the structured knowledge of a well-trained teacher network to generate a lightweight but still highly effective student network. Specifically, it integrates two complementary transfer modules: an Intra-Layer Pattern Transfer, which sequentially distills the knowledge embedded in layer-wise features of the teacher network to guide feature learning of the student network, and an Inter-Layer Relation Transfer, which densely distills the cross-layer correlation knowledge of the teacher to regularize the student's feature evolution. Consequently, our student network can derive the layer-wise and cross-layer knowledge from the teacher network to learn compact yet effective features. Extensive evaluations on three benchmarks demonstrate the effectiveness of SKT across a wide range of crowd counting models. In particular, using only around $6\%$ of the parameters and computation cost of the original models, our distilled VGG-based models obtain at least a 6.5$\times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance. Our code and models are available at https://github.com/HCPLab-SYSU/SKT.
The crowd counting task aims at estimating the number of people in an image or a video frame. Existing methods widely adopt density maps as the training targets and optimize a point-to-point loss. In the testing phase, however, only the difference between the ground-truth crowd count and the global sum of the density map matters, which reveals an inconsistency between the training targets and the evaluation criteria. To solve this problem, we introduce a new target, named local counting map (LCM), to obtain more accurate results than density map based approaches. Moreover, we propose an adaptive mixture regression framework with three modules, arranged in a coarse-to-fine manner, to further improve the precision of crowd estimation: a scale-aware module (SAM), a mixture regression module (MRM), and an adaptive soft interval module (ASIM). Specifically, SAM fully utilizes the context and multi-scale information from different convolutional features; MRM and ASIM perform more precise counting regression on local patches of images. Compared with current methods, the proposed method reports better performance on the typical datasets. The source code is available at https://github.com/xiyang1012/Local-Crowd-Counting.
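The relationship between a density map, the global count, and a local counting map can be illustrated with a short Python sketch. This is an assumption-laden toy and not the implementation from the linked repository: the helper names `total_count` and `local_counting_map` are hypothetical, and the patches are taken to be non-overlapping squares, which the abstract does not specify.

```python
def total_count(density_map):
    """Crowd count implied by a density map: the sum over all pixels."""
    return sum(sum(row) for row in density_map)


def local_counting_map(density_map, patch):
    """Sum the density inside each patch x patch block.

    Assumption: non-overlapping blocks whose size divides the map size.
    """
    height, width = len(density_map), len(density_map[0])
    if height % patch or width % patch:
        raise ValueError("patch size must divide both map dimensions")
    return [
        [
            sum(
                density_map[r][c]
                for r in range(i, i + patch)
                for c in range(j, j + patch)
            )
            for j in range(0, width, patch)
        ]
        for i in range(0, height, patch)
    ]


# Toy 4x4 density map containing two people.
density = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.25, 0.25],
    [0.0, 0.0, 0.25, 0.25],
]
local_counts = local_counting_map(density, 2)
print(total_count(density), local_counts)
```

Summing the entries of the local counting map gives the same global count as summing the density map, so a target defined on patch counts stays directly tied to the quantity that is evaluated at test time.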