ﻻ يوجد ملخص باللغة العربية
Traditional crowd counting approaches usually use Gaussian assumption to generate pseudo density ground truth, which suffers from problems like inaccurate estimation of the Gaussian kernel sizes. In this paper, we propose a new measure-based counting approach to regress the predicted density maps to the scattered point-annotated ground truth directly. First, crowd counting is formulated as a measure matching problem. Second, we derive a semi-balanced form of Sinkhorn divergence, based on which a Sinkhorn counting loss is designed for measure matching. Third, we propose a self-supervised mechanism by devising a Sinkhorn scale consistency loss to resist scale changes. Finally, an efficient optimization method is provided to minimize the overall loss function. Extensive experiments on four challenging crowd counting datasets namely ShanghaiTech, UCF-QNRF, JHU++, and NWPU have validated the proposed method.
In crowd counting, each training image contains multiple people, where each person is annotated by a dot. Existing crowd counting methods need to use a Gaussian to smooth each annotated dot or to estimate the likelihood of every pixel given the annot
State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density. They typically use the same filters over the whole image or over large image patches. Only then do they estimate local scale to compensate
In this paper, we propose a novel perspective-guided convolution (PGC) for convolutional neural network (CNN) based crowd counting (i.e. PGCNet), which aims to overcome the dramatic intra-scene scale variations of people due to the perspective effect
Occlusions, complex backgrounds, scale variations and non-uniform distributions present great challenges for crowd counting in practical applications. In this paper, we propose a novel method using an attention model to exploit head locations which a
Crowd counting is a challenging task due to the issues such as scale variation and perspective variation in real crowd scenes. In this paper, we propose a novel Cascaded Residual Density Network (CRDNet) in a coarse-to-fine approach to generate the h