Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Dense Crowds Detection and Counting with a Lightweight Architecture

95 0 0.0 ( 0 )

Download Cite

Added by Javier Gonzalez-Trejo

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Javier Antonio Gonzalez-Trejo - Diego Alberto Mercado-Ravell

Computer Vision and Pattern Recognition Neural and Evolutionary Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In the context of crowd counting, most of the works have focused on improving the accuracy without regard to the performance leading to algorithms that are not suitable for embedded applications. In this paper, we propose a lightweight convolutional neural network architecture to perform crowd detection and counting using fewer computer resources without a significant loss on count accuracy. The architecture was trained using the Bayes loss function to further improve its accuracy and then pruned to further reduce the computational resources used. The proposed architecture was tested over the USF-QNRF achieving a competitive Mean Average Error of 154.07 and a superior Mean Square Error of 241.77 while maintaining a competitive number of parameters of 0.067 Million. The obtained results suggest that the Bayes loss can be used with other architectures to further improve them and also the last convolutional layer provides no significant information and even encourage over-fitting at training.

rate research

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

222 - Longyin Wen , Dawei Du , Pengfei Zhu 2021

To promote the developments of object detection, tracking and counting algorithms in drone-captured videos, we construct a benchmark with a new drone-captured largescale dataset, named as DroneCrowd, formed by 112 video clips with 33,600 HD frames in various scenarios. Notably, we annotate 20,800 people trajectories with 4.8 million heads and several video-level attributes. Meanwhile, we design the Space-Time Neighbor-Aware Network (STNNet) as a strong baseline to solve object detection, tracking and counting jointly in dense crowds. STNNet is formed by the feature extraction module, followed by the density map estimation heads, and localization and association subnets. To exploit the context information of neighboring objects, we design the neighboring context loss to guide the association subnet training, which enforces consistent relative position of nearby objects in temporal domain. Extensive experiments on our DroneCrowd dataset demonstrate that STNNet performs favorably against the state-of-the-arts.

Computer Vision and Pattern Recognition

DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation

169 - Jiang Liu , Chenqiang Gao , Deyu Meng 2017

In real-world crowd counting applications, the crowd densities vary greatly in spatial and temporal domains. A detection based counting method will estimate crowds accurately in low density scenes, while its reliability in congested areas is downgraded. A regression based approach, on the other hand, captures the general density information in crowded regions. Without knowing the location of each person, it tends to overestimate the count in low density areas. Thus, exclusively using either one of them is not sufficient to handle all kinds of scenes with varying densities. To address this issue, a novel end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density Estimation Network) is proposed. It can adaptively decide the appropriate counting mode for different locations on the image based on its real density conditions. DecideNet starts with estimating the crowd density by generating detection and regression based density maps separately. To capture inevitable variation in densities, it incorporates an attention module, meant to adaptively assess the reliability of the two types of estimations. The final crowd counts are obtained with the guidance of the attention module to adopt suitable estimations from the two kinds of density maps. Experimental results show that our method achieves state-of-the-art performance on three challenging crowd counting datasets.

Computer Vision and Pattern Recognition

CarNet: A Lightweight and Efficient Encoder-Decoder Architecture for High-quality Road Crack Detection

73 - Kai Li , Yingjie Tian , 2021

Pixel-wise crack detection is a challenging task because of poor continuity and low contrast in cracks. The existing frameworks usually employ complex models leading to good accuracy and yet low inference efficiency. In this paper, we present a lightweight encoder-decoder architecture, CarNet, for efficient and high-quality crack detection. To this end, we first propose that the ideal encoder should present an olive-type distribution about the number of convolutional layers at different stages. Specifically, as the network stages deepen in the encoder, the number of convolutional layers shows a downward trend after the model input is compressed in the initial network stage. Meanwhile, in the decoder, we introduce a lightweight up-sampling feature pyramid module to learn rich hierarchical features for crack detection. In particular, we compress the feature maps of the last three network stages to the same channels and then employ up-sampling with different multiples to resize them to the same resolutions for information fusion. Finally, extensive experiments on four public databases, i.e., Sun520, Rain365, BJN260, and Crack360, demonstrate that our CarNet gains a good trade-off between inference efficiency and test accuracy over the existing state-of-the-art methods.

Computer Vision and Pattern Recognition

Lightweight Monocular Depth with a Novel Neural Architecture Search Method

124 - Lam Huynh , Phong Nguyen , Jiri Matas 2021

This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models. Unlike previous neural architecture search (NAS) approaches, where finding optimized networks are computationally highly demanding, the introduced novel Assisted Tabu Search leads to efficient architecture exploration. Moreover, we construct the search space on a pre-defined backbone network to balance layer diversity and search space size. The LiDNAS method outperforms the state-of-the-art NAS approach, proposed for disparity and depth estimation, in terms of search efficiency and output model performance. The LiDNAS optimized models achieve results superior to compact depth estimation state-of-the-art on NYU-Depth-v2, KITTI, and ScanNet, while being 7%-500% more compact in size, i.e the number of model parameters.

Computer Vision and Pattern Recognition

Efficient Pig Counting in Crowds with Keypoints Tracking and Spatial-aware Temporal Response Filtering

324 - Guang Chen , Shiwen Shen , Longyin Wen 2020

Pig counting is a crucial task for large-scale pig farming, which is usually completed by human visually. But this process is very time-consuming and error-prone. Few studies in literature developed automated pig counting method. Existing methods only focused on pig counting using single image, and its accuracy is challenged by several factors, including pig movements, occlusion and overlapping. Especially, the field of view of a single image is very limited, and could not meet the requirements of pig counting for large pig grouping houses. To that end, we presented a real-time automated pig counting system in crowds using only one monocular fisheye camera with an inspection robot. Our system showed that it produces accurate results surpassing human. Our pipeline began with a novel bottom-up pig detection algorithm to avoid false negatives due to overlapping, occlusion and deformation of pigs. A deep convolution neural network (CNN) is designed to detect keypoints of pig body part and associate the keypoints to identify individual pigs. After that, an efficient on-line tracking method is used to associate pigs across video frames. Finally, a novel spatial-aware temporal response filtering (STRF) method is proposed to predict the counts of pigs, which is effective to suppress false positives caused by pig or camera movements or tracking failures. The whole pipeline has been deployed in an edge computing device, and demonstrated the effectiveness.

Computer Vision and Pattern Recognition Robotics Image and Video Processing

comments

Fetching comments

Tishreen University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Dense Crowds Detection and Counting with a Lightweight Architecture

Ask ChatGPT about the research

No Arabic abstract

Read More