
Attention to Head Locations for Crowd Counting

Published by: Youmei Zhang
Publication date: 2018
Research field: Informatics Engineering
Paper language: English





Occlusions, complex backgrounds, scale variations and non-uniform distributions present great challenges for crowd counting in practical applications. In this paper, we propose a novel method that uses an attention model to exploit head locations, which are the most important cue for crowd counting. The attention model estimates a probability map in which high probabilities indicate locations where heads are likely to be present. The estimated probability map is used to suppress non-head regions in the feature maps produced by several multi-scale feature extraction branches of a convolutional neural network for crowd density estimation, which makes our method robust to complex backgrounds, scale variations and non-uniform distributions. In addition, we introduce a relative deviation loss to complement a commonly used training loss, the Euclidean distance, and thereby improve the accuracy of sparse crowd density estimation. Experiments on the Shanghai-Tech, UCF_CC_50 and World-Expo10 data sets demonstrate the effectiveness of our method.
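
The two ideas in the abstract, suppressing non-head regions with a probability map and pairing the Euclidean loss with a relative deviation term, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: suppression is modelled as element-wise multiplication, the relative deviation is taken as the absolute count error divided by the ground-truth count plus a smoothing constant `eps`, and the function names and the `weight` parameter are hypothetical.

```python
# Illustrative sketch only: the formulas below are assumptions,
# not the definitions used in the paper.

def apply_attention(feature_maps, prob_map):
    """Scale every channel of `feature_maps` by the head-probability map.

    feature_maps: list of channels, each a 2D list (rows of floats).
    prob_map:     2D list with values in [0, 1], same height and width.
    Positions with probability near 0 (assumed non-head) are suppressed.
    """
    return [
        [[value * p for value, p in zip(feat_row, prob_row)]
         for feat_row, prob_row in zip(channel, prob_map)]
        for channel in feature_maps
    ]


def euclidean_loss(pred, target):
    """Sum of squared pixel-wise differences between two density maps."""
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )


def relative_deviation_loss(pred, target, eps=1.0):
    """Assumed form: |predicted count - true count| / (true count + eps).

    The count of a density map is the sum of all its values.
    """
    pred_count = sum(map(sum, pred))
    target_count = sum(map(sum, target))
    return abs(pred_count - target_count) / (target_count + eps)


def combined_loss(pred, target, weight=0.1):
    """Euclidean loss plus a weighted relative deviation term (hypothetical weight)."""
    return euclidean_loss(pred, target) + weight * relative_deviation_loss(pred, target)


# One 2x2 feature channel and a probability map that keeps the left column.
features = [[[1.0, 2.0], [3.0, 4.0]]]
prob = [[1.0, 0.0], [0.5, 0.0]]
masked = apply_attention(features, prob)  # [[[1.0, 0.0], [1.5, 0.0]]]

# The same absolute count error of 1 person in a sparse and a dense scene.
sparse_rel = relative_deviation_loss([[1.0]], [[2.0]])     # 1 / 3
dense_rel = relative_deviation_loss([[99.0]], [[100.0]])   # 1 / 101
```

Under these assumptions, missing one person costs far more in the sparse scene than in the dense one, which is one plausible reading of why a relative term would help sparse crowds when combined with a Euclidean loss.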


Read also

In this paper, we address the challenging problem of crowd counting in congested scenes. Specifically, we present the Inverse Attention Guided Deep Crowd Counting Network (IA-DCCN), which efficiently infuses segmentation information through an inverse attention mechanism into the counting network, resulting in significant improvements. The proposed method, which is based on VGG-16, is a single-step training framework and is simple to implement. The use of segmentation information results in minimal computational overhead and does not require any additional annotations. We demonstrate the significance of segmentation guided inverse attention through a detailed analysis and ablation study. Furthermore, the proposed method is evaluated on three challenging crowd counting datasets and is shown to achieve significant improvements over several recent methods.
While the performance of crowd counting via deep learning has improved dramatically in recent years, it remains an ingrained problem due to cluttered backgrounds and varying scales of people within an image. In this paper, we propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images, which diminishes the impact of backgrounds by using a shallow feature based attention model and, meanwhile, captures multi-scale information by densely connecting hierarchical image features. Specifically, inspired by the observation that backgrounds and human crowds generally have noticeably different responses in shallow features, we build our attention model upon shallow-feature maps, which results in accurate background-pixel detection. Moreover, considering that the most representative features of people across different scales can appear in different layers of a feature extraction network, we propose to densely connect hierarchical image features of different layers, so as to keep them all, and subsequently encode them for estimating crowd density. Experimental results on three benchmark datasets clearly demonstrate the superiority of SDANet when dealing with different scenarios. In particular, on the challenging UCF CC 50 dataset, our method outperforms other existing methods by a large margin, as is evident from the remarkable 11.9% drop in Mean Absolute Error (MAE) achieved by SDANet.
Crowd counting, i.e., estimating the number of people in a crowded area, has attracted much interest in the research community. Although many attempts have been reported, crowd counting remains an open real-world problem due to the vast scale variations in crowd density within the area of interest and severe occlusion among the crowd. In this paper, we propose a novel Pyramid Density-Aware Attention-based network, abbreviated as PDANet, that leverages attention, pyramid scale feature and two-branch decoder modules for density-aware crowd counting. PDANet uses these modules to extract features at different scales, focus on the relevant information, and suppress the misleading information. We also address the variation of crowdedness levels among different images with an exclusive Density-Aware Decoder (DAD). For this purpose, a classifier evaluates the density level of the input features and then passes them to the corresponding high-crowded or low-crowded DAD module. Finally, we generate an overall density map by considering the summation of the low-crowded and high-crowded density maps as spatial attention. Meanwhile, we employ two losses to create a precise density map for the input scene. Extensive evaluations conducted on challenging benchmark datasets demonstrate the superior performance of the proposed PDANet, in terms of the accuracy of counting and of the generated density maps, over well-known state-of-the-art methods.
Existing crowd counting methods usually adopt an attention mechanism to tackle background noise, or apply multi-level features or multi-scale context fusion to tackle scale variation. However, these approaches deal with the two problems separately. In this paper, we propose a Hybrid Attention Network (HAN) that employs Progressive Embedding Scale-context (PES) information, which enables the network to simultaneously suppress noise and adapt to head scale variation. We build the hybrid attention mechanism by running spatial attention and channel attention modules in parallel, which makes the network focus more on the human head area and reduces the interference of background objects. Besides, we embed certain scale-context into the hybrid attention along the spatial and channel dimensions to alleviate the counting errors caused by variations in perspective and head scale. Finally, we propose a progressive learning strategy that cascades multiple hybrid attention modules with different embedded scale-context, which can gradually integrate different scale-context information into the current feature map from global to local. Ablation experiments show that the network architecture can gradually learn multi-scale features and suppress background noise. Extensive experiments demonstrate that HANet obtains state-of-the-art counting performance on four mainstream datasets.
Automated crowd counting from images and videos has attracted more attention in recent years because of its wide application in smart cities. However, modelling dense crowd heads is challenging, and most existing works become less reliable in such scenes. To obtain an appropriate crowd representation, in this work we propose SOFA-Net (Second-Order and First-order Attention Network): second-order statistics are extracted to retain the selectivity of the channel-wise spatial information for dense heads, while first-order statistics, which can enhance the feature discrimination for the head areas, are used as complementary information. Via a multi-stream architecture, the proposed second/first-order statistics are learned and transformed into attention for robust representation refinement. We evaluated our method on four public datasets, and its performance reached the state of the art on most of them. Extensive experiments were also conducted to study the components of the proposed SOFA-Net, and the results suggest the high capability of second/first-order statistics for modelling crowds in challenging scenarios. To the best of our knowledge, this is the first work to explore second/first-order statistics for crowd counting.