Point in, Box out: Beyond Counting Persons in Crowds

225 0 0.0 ( 0 )

Download Cite

Added by Yuting Liu

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Yuting Liu - Miaojing Shi - Qijun Zhao

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Modern crowd counting methods usually employ deep neural networks (DNN) to estimate crowd counts via density regression. Despite their significant improvements, the regression-based methods are incapable of providing the detection of individuals in crowds. The detection-based methods, on the other hand, have not been largely explored in recent trends of crowd counting due to the needs for expensive bounding box annotations. In this work, we instead propose a new deep detection network with only point supervision required. It can simultaneously detect the size and location of human heads and count them in crowds. We first mine useful person size information from point-level annotations and initialize the pseudo ground truth bounding boxes. An online updating scheme is introduced to refine the pseudo ground truth during training; while a locally-constrained regression loss is designed to provide additional constraints on the size of the predicted boxes in a local neighborhood. In the end, we propose a curriculum learning strategy to train the network from images of relatively accurate and easy pseudo ground truth first. Extensive experiments are conducted in both detection and counting tasks on several standard benchmarks, e.g. ShanghaiTech, UCF_CC_50, WiderFace, and TRANCOS datasets, and the results show the superiority of our method over the state-of-the-art.

rate research

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

222 - Longyin Wen , Dawei Du , Pengfei Zhu 2021

To promote the developments of object detection, tracking and counting algorithms in drone-captured videos, we construct a benchmark with a new drone-captured largescale dataset, named as DroneCrowd, formed by 112 video clips with 33,600 HD frames in various scenarios. Notably, we annotate 20,800 people trajectories with 4.8 million heads and several video-level attributes. Meanwhile, we design the Space-Time Neighbor-Aware Network (STNNet) as a strong baseline to solve object detection, tracking and counting jointly in dense crowds. STNNet is formed by the feature extraction module, followed by the density map estimation heads, and localization and association subnets. To exploit the context information of neighboring objects, we design the neighboring context loss to guide the association subnet training, which enforces consistent relative position of nearby objects in temporal domain. Extensive experiments on our DroneCrowd dataset demonstrate that STNNet performs favorably against the state-of-the-arts.

Computer Vision and Pattern Recognition

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild

186 - Debidatta Dwibedi , Yusuf Aytar , Jonathan Tompson 2020

We present an approach for estimating the period with which an action is repeated in a video. The crux of the approach lies in constraining the period prediction module to use temporal self-similarity as an intermediate representation bottleneck that allows generalization to unseen repetitions in videos in the wild. We train this model, called Repnet, with a synthetic dataset that is generated from a large unlabeled video collection by sampling short clips of varying lengths and repeating them with different periods and counts. This combination of synthetic data and a powerful yet constrained model, allows us to predict periods in a class-agnostic fashion. Our model substantially exceeds the state of the art performance on existing periodicity (PERTUBE) and repetition counting (QUVA) benchmarks. We also collect a new challenging dataset called Countix (~90 times larger than existing datasets) which captures the challenges of repetition counting in real-world videos. Project webpage: https://sites.google.com/view/repnet .

Computer Vision and Pattern Recognition

Dense Crowds Detection and Counting with a Lightweight Architecture

94 - Javier Antonio Gonzalez-Trejo , Diego Alberto Mercado-Ravell 2020

In the context of crowd counting, most of the works have focused on improving the accuracy without regard to the performance leading to algorithms that are not suitable for embedded applications. In this paper, we propose a lightweight convolutional neural network architecture to perform crowd detection and counting using fewer computer resources without a significant loss on count accuracy. The architecture was trained using the Bayes loss function to further improve its accuracy and then pruned to further reduce the computational resources used. The proposed architecture was tested over the USF-QNRF achieving a competitive Mean Average Error of 154.07 and a superior Mean Square Error of 241.77 while maintaining a competitive number of parameters of 0.067 Million. The obtained results suggest that the Bayes loss can be used with other architectures to further improve them and also the last convolutional layer provides no significant information and even encourage over-fitting at training.

Computer Vision and Pattern Recognition Neural and Evolutionary Computing

Efficient Pig Counting in Crowds with Keypoints Tracking and Spatial-aware Temporal Response Filtering

324 - Guang Chen , Shiwen Shen , Longyin Wen 2020

Pig counting is a crucial task for large-scale pig farming, which is usually completed by human visually. But this process is very time-consuming and error-prone. Few studies in literature developed automated pig counting method. Existing methods only focused on pig counting using single image, and its accuracy is challenged by several factors, including pig movements, occlusion and overlapping. Especially, the field of view of a single image is very limited, and could not meet the requirements of pig counting for large pig grouping houses. To that end, we presented a real-time automated pig counting system in crowds using only one monocular fisheye camera with an inspection robot. Our system showed that it produces accurate results surpassing human. Our pipeline began with a novel bottom-up pig detection algorithm to avoid false negatives due to overlapping, occlusion and deformation of pigs. A deep convolution neural network (CNN) is designed to detect keypoints of pig body part and associate the keypoints to identify individual pigs. After that, an efficient on-line tracking method is used to associate pigs across video frames. Finally, a novel spatial-aware temporal response filtering (STRF) method is proposed to predict the counts of pigs, which is effective to suppress false positives caused by pig or camera movements or tracking failures. The whole pipeline has been deployed in an edge computing device, and demonstrated the effectiveness.

Computer Vision and Pattern Recognition Robotics Image and Video Processing

Beyond Farthest Point Sampling in Point-Wise Analysis

98 - Yiqun Lin , Lichang Chen , Haibin Huang 2021

Sampling, grouping, and aggregation are three important components in the multi-scale analysis of point clouds. In this paper, we present a novel data-driven sampler learning strategy for point-wise analysis tasks. Unlike the widely used sampling technique, Farthest Point Sampling (FPS), we propose to learn sampling and downstream applications jointly. Our key insight is that uniform sampling methods like FPS are not always optimal for different tasks: sampling more points around boundary areas can make the point-wise classification easier for segmentation. Towards the end, we propose a novel sampler learning strategy that learns sampling point displacement supervised by task-related ground truth information and can be trained jointly with the underlying tasks. We further demonstrate our methods in various point-wise analysis architectures, including semantic part segmentation, point cloud completion, and keypoint detection. Our experiments show that jointly learning of the sampler and task brings remarkable improvement over previous baseline methods.

Computer Vision and Pattern Recognition