Which to Match? Selecting Consistent GT-Proposal Assignment for Pedestrian Detection

Added by Yan Luo

Publication date: 2021

Field: Informatics Engineering

Language: English

Authors: Yan Luo, Chongyang Zhang, Muming Zhao

Computer Vision and Pattern Recognition

Abstract

Accurate pedestrian classification and localization have received considerable attention due to their wide applications, such as security monitoring and autonomous driving. Although pedestrian detectors have made great progress in recent years, the fixed Intersection over Union (IoU) based assignment-regression scheme still limits their performance. Two main factors are responsible for this: 1) the IoU threshold faces a dilemma: a lower threshold results in more false positives, while a higher one filters out matched positives; 2) IoU-based GT-Proposal assignment suffers from an inconsistent supervision problem: spatially adjacent proposals with similar features can be assigned to different ground-truth boxes, so very similar proposals may be forced to regress towards different targets, which confuses bounding-box regression when predicting locations. In this paper, we first raise the question of whether the Regression Direction affects pedestrian detection performance. Accordingly, we address the weakness of IoU by introducing a geometry-sensitive search algorithm as a new assignment and regression metric. Unlike the previous IoU-based one-to-one assignment, which matches one proposal to one ground-truth box, the proposed method seeks a reasonable matching between the set of proposals and the set of ground-truth boxes. Specifically, we improve MR-FPPI under R$_{75}$ by 8.8% on the CityPersons dataset. Furthermore, incorporating this method as a metric into state-of-the-art pedestrian detectors yields consistent improvements.
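The inconsistent-supervision problem can be reproduced with a plain max-IoU assignment. The Python sketch below illustrates that baseline only; it is not the authors' geometry-sensitive search algorithm. The box format `(x1, y1, x2, y2)`, the helper names `iou` and `assign_max_iou`, and the example coordinates are hypothetical. Two proposals that differ by a single pixel end up regressing toward different ground-truth boxes, and raising the threshold turns both into background.

```python
from typing import List, Optional, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_max_iou(proposals: List[Box], gts: List[Box], thr: float = 0.5) -> List[Optional[int]]:
    """Baseline IoU assignment: each proposal regresses toward its highest-IoU GT,
    or is labeled background (None) when that IoU is below `thr`."""
    if not gts:
        return [None] * len(proposals)
    targets: List[Optional[int]] = []
    for p in proposals:
        scores = [iou(p, g) for g in gts]
        best = max(range(len(gts)), key=lambda j: scores[j])
        targets.append(best if scores[best] >= thr else None)
    return targets


# Two overlapping pedestrians and two proposals shifted by one pixel.
GT0, GT1 = (0.0, 0.0, 10.0, 20.0), (6.0, 0.0, 16.0, 20.0)
P0, P1 = (2.5, 0.0, 12.5, 20.0), (3.5, 0.0, 13.5, 20.0)

print(assign_max_iou([P0, P1], [GT0, GT1]))           # [0, 1]
print(assign_max_iou([P0, P1], [GT0, GT1], thr=0.7))  # [None, None]
```

The two proposals overlap each other with an IoU of about 0.82, yet at `thr=0.5` they receive targets `0` and `1`; at `thr=0.7` both are discarded, which mirrors the threshold dilemma.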

LLA: Loss-aware Label Assignment for Dense Pedestrian Detection

Zheng Ge, Jianfeng Wang, Xin Huang (2021)

Label assignment has been widely studied in general object detection because of its great impact on detector performance. However, none of these works focuses on label assignment in dense pedestrian detection. In this paper, we propose a simple yet effective assignment strategy called Loss-aware Label Assignment (LLA) to boost the performance of pedestrian detectors in crowded scenarios. LLA first calculates the classification (cls) and regression (reg) losses between each anchor and ground-truth (GT) pair. A joint loss, defined as the weighted sum of the cls and reg losses, then serves as the assignment indicator. Finally, the K anchors with the smallest joint losses for a given GT box are assigned as its positive anchors. Anchors that are not assigned to any GT box are considered negative. Loss-aware label assignment is based on the observation that anchors with lower joint loss usually contain richer semantic information and thus better represent their corresponding GT boxes. Experiments on CrowdHuman and CityPersons show that this simple label assignment strategy improves MR by 9.53% and 5.47% on two well-known one-stage detectors, RetinaNet and FCOS, respectively, demonstrating the effectiveness of LLA.
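The assignment rule described above (a joint loss as a weighted sum, then the K smallest per GT box) can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' code: the loss matrices are assumed to be precomputed, the weight `lam` and the value of `k` are hypothetical, and the rule for an anchor that falls in the top K of several GT boxes (keep the GT with the lower joint loss) is an assumption of this sketch, since the abstract does not specify it.

```python
from typing import List, Optional


def loss_aware_assign(cls_loss: List[List[float]], reg_loss: List[List[float]],
                      k: int = 2, lam: float = 1.0) -> List[Optional[int]]:
    """Return, for each anchor, the index of its assigned GT box or None (negative).

    cls_loss[i][j] and reg_loss[i][j] are the losses of anchor i w.r.t. GT j.
    """
    num_anchors = len(cls_loss)
    num_gts = len(cls_loss[0]) if num_anchors else 0
    # Joint loss: weighted sum of classification and regression losses.
    joint = [[cls_loss[i][j] + lam * reg_loss[i][j] for j in range(num_gts)]
             for i in range(num_anchors)]
    assigned: List[Optional[int]] = [None] * num_anchors
    for j in range(num_gts):
        # Rank anchors by ascending joint loss for GT j and keep the K smallest.
        ranked = sorted(range(num_anchors), key=lambda i: joint[i][j])
        for i in ranked[:k]:
            prev = assigned[i]
            # Assumption: an anchor claimed by several GTs keeps the lower-loss one.
            if prev is None or joint[i][j] < joint[i][prev]:
                assigned[i] = j
    # Anchors still set to None were not selected by any GT and are negatives.
    return assigned


cls = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.3], [0.7, 0.7]]
reg = [[0.1, 0.9], [0.3, 0.9], [0.8, 0.1], [0.6, 0.6]]
print(loss_aware_assign(cls, reg, k=2))  # [0, 0, 1, 1]
print(loss_aware_assign(cls, reg, k=1))  # [0, None, 1, None]
```

Because selection is driven by the combined loss rather than by an IoU threshold, an anchor is positive only when it is relatively easy to both classify and regress for that GT box.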

Computer Vision and Pattern Recognition

NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing

Xin Huang, Zheng Ge, Zequn Jie (2020)

Although significant progress has been made in pedestrian detection recently, pedestrian detection in crowded scenes is still challenging. Heavy occlusion between pedestrians poses great challenges to standard Non-Maximum Suppression (NMS). A relatively low intersection over union (IoU) threshold leads to missing highly overlapped pedestrians, while a higher one brings in plenty of false positives. To avoid this dilemma, this paper proposes a novel Representative Region NMS approach that leverages the less occluded visible parts, effectively removing redundant boxes without bringing in many false positives. To acquire the visible parts, a novel Paired-Box Model (PBM) is proposed to simultaneously predict the full and visible boxes of a pedestrian. The full and visible boxes constitute a pair that serves as the sample unit of the model, guaranteeing a strong correspondence between the two boxes throughout the detection pipeline. Moreover, the pairing allows convenient feature integration of the two boxes, which improves performance on both full and visible pedestrian detection tasks. Experiments on the challenging CrowdHuman and CityPersons benchmarks validate the effectiveness of the proposed approach for pedestrian detection in crowded scenes.
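The NMS dilemma and the visible-region remedy can be illustrated with ordinary greedy NMS run on two different box sets. The sketch below is a simplified reading of the abstract, not the paper's Representative Region NMS: it assumes each detection already carries a full box and a visible box (as a paired-box model would produce), measures overlap on the visible boxes, and returns indices whose full boxes are kept. The coordinates, scores, and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], thr: float) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if its IoU
    with every already-kept box is at most thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thr for k in keep):
            keep.append(i)
    return keep


# Pedestrian A, an occluded neighbour B, and a duplicate detection of A.
full = [(0.0, 0.0, 10.0, 30.0), (3.0, 0.0, 13.0, 30.0), (0.5, 0.0, 10.5, 30.0)]
visible = [(0.0, 0.0, 6.0, 30.0), (7.0, 0.0, 13.0, 30.0), (0.5, 0.0, 6.5, 30.0)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(full, scores, 0.5))     # [0]     B suppressed (full-box IoU ~0.54)
print(greedy_nms(visible, scores, 0.5))  # [0, 1]  B survives, duplicate still removed
```

The kept indices refer back to the full boxes, so the visible-region run outputs `full[0]` and `full[1]`: both pedestrians survive while the duplicate of A, whose visible box still overlaps A's by about 0.85, is removed.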

Computer Vision and Pattern Recognition

Selecting a Match: Exploration vs Decision

Ishan Agarwal, Richard Cole, Yixin Tao (2021)

In a dynamic matching market, such as a marriage or job market, how should agents balance accepting a proposed match with the cost of continuing their search? We consider this problem in a discrete setting, in which agents have cardinal values and finite lifetimes, and proposed matches are random. We seek to quantify how well the agents can do. We provide upper and lower bounds on the collective losses of the agents, with a polynomially small failure probability, where the notion of loss is with respect to a plausible baseline we define. These bounds are tight up to constant factors. We highlight two aspects of this work. First, in our model, agents have a finite time in which to enjoy their matches, namely the minimum of their remaining lifetime and that of their partner; this implies that unmatched agents become less desirable over time, and suggests that their decision rules should change over time. Second, we use a discrete rather than a continuum model for the population. The discreteness causes variance which induces localized imbalances in the two sides of the market. One of the main technical challenges we face is to bound these imbalances. In addition, we present the results of simulations on moderate-sized problems for both the discrete and continu…

Computer Science and Game Theory

DETR for Crowd Pedestrian Detection

Matthieu Lin, Chuming Li, Xingyuan Bu (2020)

Pedestrian detection in crowd scenes is challenging due to the heuristically defined mapping from anchors to pedestrians and the conflict between NMS and highly overlapped pedestrians. The recently proposed end-to-end detectors (EDs), DETR and Deformable DETR, replace hand-designed components such as NMS and anchors with a transformer architecture, which removes duplicate predictions by computing all pairwise interactions between queries. Inspired by these works, we explore their performance on crowd pedestrian detection. Surprisingly, when compared with Faster R-CNN with FPN, the results are the opposite of those obtained on COCO. Furthermore, the bipartite matching of EDs harms training efficiency because of the large number of ground-truth boxes in crowd scenes. In this work, we identify the underlying causes of EDs' poor performance and propose a new decoder to address them. Moreover, we design a mechanism to leverage the less occluded visible parts of pedestrians specifically for EDs, achieving further improvements. A faster bipartite matching algorithm is also introduced to make ED training on crowd datasets more practical. The proposed detector, PED (Pedestrian End-to-end Detector), outperforms both previous EDs and the baseline Faster R-CNN on CityPersons and CrowdHuman. It also achieves performance comparable to state-of-the-art pedestrian detection methods. Code will be released soon.
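To make the bipartite-matching step concrete, here is a deliberately naive Python sketch of one-to-one matching between predictions and ground-truth boxes. It enumerates every assignment, so its cost grows factorially with the number of boxes; practical implementations typically use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`), and the paper's faster matching algorithm is not reproduced here. The L1 box cost, the helper names, and the example boxes are hypothetical; DETR-style matchers also include classification and GIoU terms in the cost.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def l1_cost(p: Box, g: Box) -> float:
    """Hypothetical matching cost: L1 distance between box coordinates."""
    return sum(abs(a - b) for a, b in zip(p, g))


def bipartite_match(preds: Sequence[Box], gts: Sequence[Box]) -> List[Tuple[int, int]]:
    """Exhaustive minimum-cost one-to-one matching.

    Returns (pred_index, gt_index) pairs; predictions left unmatched would be
    supervised as "no object". Requires len(preds) >= len(gts).
    """
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground-truth boxes")
    best_perm: Optional[Tuple[int, ...]] = None
    best_cost = float("inf")
    # Each permutation assigns a distinct prediction to every GT box.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    assert best_perm is not None
    return [(p, g) for g, p in enumerate(best_perm)]


preds = [(0, 0, 10, 20), (5, 0, 15, 20), (50, 50, 60, 70)]
gts = [(6, 0, 16, 20), (1, 0, 11, 20)]
print(bipartite_match(preds, gts))  # [(1, 0), (0, 1)]
```

Prediction 2 is left unmatched and would be trained as background. With many pedestrians per image, the search space of such a matching grows quickly, which hints at why matching cost matters in crowded scenes.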

Computer Vision and Pattern Recognition

Variational Pedestrian Detection

Yuang Zhang, Huanyu He, Jianguo Li (2021)

Pedestrian detection in a crowd is a challenging task due to the large number of mutually occluding human instances, which brings ambiguity and optimization difficulties to the IoU-based ground-truth assignment procedure used in classical object detection methods. In this paper, we develop a new perspective that treats pedestrian detection as a variational inference problem. We formulate a novel and efficient algorithm for pedestrian detection by modeling the dense proposals as a latent variable and proposing a customized Auto-Encoding Variational Bayes (AEVB) algorithm. By optimizing with the proposed algorithm, a classical detector can be turned into a variational pedestrian detector. Experiments on the CrowdHuman and CityPersons datasets show that the proposed algorithm is an efficient solution to the dense pedestrian detection problem for single-stage detectors. Our method can also be flexibly applied to two-stage detectors, achieving notable performance gains.

Computer Vision and Pattern Recognition Machine Learning Image and Video Processing