Precise Detection in Densely Packed Scenes

Added by Eran Goldman

Publication date: 2019

Fields: Informatics Engineering

Language: English

Authors: Eran Goldman, Roei Herzig, Aviv Eisenschtat

Computer Vision and Pattern Recognition

Man-made scenes can be densely packed, containing numerous objects, often identical, positioned in close proximity. We show that precise object detection in such scenes remains a challenging frontier even for state-of-the-art object detectors. We propose a novel, deep-learning-based method for precise object detection, designed for such challenging settings. Our contributions include: (1) a layer for estimating the Jaccard index as a detection quality score; (2) a novel EM merging unit, which uses our quality scores to resolve detection overlap ambiguities; and (3) an extensive, annotated data set, SKU-110K, representing packed retail environments, released for training and testing under such extreme settings. Detection tests on SKU-110K and counting tests on the CARPK and PUCPR+ data sets show that our method outperforms the existing state of the art by substantial margins. The code and data will be made available at www.github.com/eg4000/SKU110K_CVPR19.
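
For readers unfamiliar with the quantity named in contribution (1), the Jaccard index of two boxes is their intersection area divided by their union area (often called IoU). The following is only a minimal sketch of that definition for axis-aligned boxes; the function name `jaccard_index` and the `(x1, y1, x2, y2)` box format are assumptions made for illustration, and the abstract does not say how the proposed layer estimates this value.

```python
def jaccard_index(box_a, box_b):
    """Jaccard index (IoU) of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    This box format is an assumption for illustration only.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) inputs.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1)
    print(jaccard_index((0, 0, 2, 2), (1, 1, 3, 3)))
```

In densely packed scenes, neighbouring detections frequently overlap, which is why a per-detection estimate of this score can be useful for deciding which overlapping boxes to keep.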

Dynamical properties of densely packed confined hard-sphere fluids

Gerhard Jung, Michele Caraglio, Lukas Schrack (2020)

Numerical solutions of the mode-coupling theory (MCT) equations for a hard-sphere fluid confined between two parallel hard walls are elaborated. The governing equations feature multiple parallel relaxation channels, which significantly complicate their numerical integration. We investigate the intermediate scattering functions and the susceptibility spectra close to structural arrest and compare them to an asymptotic analysis of the MCT equations. We corroborate that the data converge in the $\beta$-scaling regime to two asymptotic power laws, viz. the critical decay and the von Schweidler law. The numerical results reveal a non-monotonic dependence of the power-law exponents on the slab width and a non-trivial kink in the low-frequency susceptibility spectra. We also find qualitative agreement of these theoretical results with event-driven molecular-dynamics simulations of a polydisperse hard-sphere system. In particular, the non-trivial dependence of the dynamical properties on the slab width is well reproduced.
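
As a reminder of what the two power laws look like, here is their standard textbook form for a scalar correlator $\phi(t)$ in bulk MCT. This is a simplified sketch: the symbols $f^c$, $a$, $b$ and $\lambda$ are the conventional ones and are not taken from the abstract, and the confined system described above involves matrix-valued correlators rather than a single scalar function.

```latex
% Critical decay (early beta regime), with 0 < a < 1/2:
\phi(t) - f^{c} \;\propto\; t^{-a}

% von Schweidler law (late beta regime), with 0 < b \le 1:
\phi(t) - f^{c} \;\propto\; -\,t^{\,b}

% Both exponents are fixed by a single exponent parameter \lambda:
\lambda \;=\; \frac{\Gamma(1-a)^{2}}{\Gamma(1-2a)} \;=\; \frac{\Gamma(1+b)^{2}}{\Gamma(1+2b)}
```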

Soft Condensed Matter

Scalable gate architecture for densely packed semiconductor spin qubits

D. M. Zajac, T. M. Hazard, X. Mi (2016)

We demonstrate a 12 quantum dot device fabricated on an undoped Si/SiGe heterostructure as a proof-of-concept for a scalable, linear gate architecture for semiconductor quantum dots. The device consists of 9 quantum dots in a linear array and 3 single quantum dot charge sensors. We show reproducible single quantum dot charging and orbital energies, with standard deviations less than 20% relative to the mean across the 9 dot array. The single quantum dot charge sensors have a charge sensitivity of $8.2 \times 10^{-4}\,e/\sqrt{\mathrm{Hz}}$ and allow the investigation of real-time charge dynamics. As a demonstration of the versatility of this device, we use single-shot readout to measure a spin relaxation time $T_1 = 170$ ms at a magnetic field $B = 1$ T. By reconfiguring the device, we form two capacitively coupled double quantum dots and extract a mutual charging energy of 200 $\mu$eV, which indicates that 50 GHz two-qubit gate operation speeds are feasible.
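
The step from a 200 $\mu$eV mutual charging energy to a gate speed of roughly 50 GHz can be checked by converting the energy to a frequency. The sketch below assumes the quoted speed is simply $E_m/h$; the symbol $E_m$ is introduced here for illustration, and the abstract does not state the exact relation used.

```latex
f \;\approx\; \frac{E_m}{h}
  \;=\; \frac{200 \times 10^{-6}\ \mathrm{eV}}{4.136 \times 10^{-15}\ \mathrm{eV\,s}}
  \;\approx\; 4.8 \times 10^{10}\ \mathrm{Hz}
  \;\approx\; 48\ \mathrm{GHz}
```

Under that assumption the result is consistent with the order of magnitude quoted in the abstract.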

Mesoscale and Nanoscale Physics, Quantum Physics

End-to-end people detection in crowded scenes

Russell Stewart, Mykhaylo Andriluka (2015)

Current people detectors operate either by scanning an image in a sliding window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.

Computer Vision and Pattern Recognition

Pixel-wise Anomaly Detection in Complex Driving Scenes

Giancarlo Di Biase, Hermann Blum, Roland Siegwart (2021)

The inability of state-of-the-art semantic segmentation methods to detect anomaly instances hinders them from being deployed in safety-critical and complex applications, such as autonomous driving. Recent approaches have focused on either leveraging segmentation uncertainty to identify anomalous areas or re-synthesizing the image from the semantic label map to find dissimilarities with the input image. In this work, we demonstrate that these two methodologies contain complementary information and can be combined to produce robust predictions for anomaly segmentation. We present a pixel-wise anomaly detection framework that uses uncertainty maps to improve over existing re-synthesis methods in finding dissimilarities between the input and generated images. Our approach works as a general framework around already trained segmentation networks, which ensures anomaly detection without compromising segmentation accuracy, while significantly outperforming all similar methods. Top-2 performance across a range of different anomaly datasets shows the robustness of our approach to handling different anomaly instances.

Computer Vision and Pattern Recognition

Spirit Distillation: Precise Real-time Semantic Segmentation of Road Scenes with Insufficient Data

Zhiyuan Wu, Yu Jiang, Chupeng Cui (2021)

Semantic segmentation of road scenes is one of the key technologies for realizing autonomous driving scene perception, and the effectiveness of deep Convolutional Neural Networks (CNNs) for this task has been demonstrated. State-of-the-art CNNs for semantic segmentation suffer from excessive computation as well as a requirement for large-scale training data. Inspired by the ideas of Fine-tuning-based Transfer Learning (FTT) and feature-based knowledge distillation, we propose a new knowledge distillation method for cross-domain knowledge transfer and efficient training with insufficient data, named Spirit Distillation (SD), which allows the student network to mimic the teacher network in extracting general features, so that a compact and accurate student network can be trained for real-time semantic segmentation of road scenes. Then, to further alleviate the problem of insufficient data and improve the robustness of the student, an Enhanced Spirit Distillation (ESD) method is proposed, which aims to exploit a more comprehensive general feature extraction capability by considering images from both the target and the proximity domains as input. To our knowledge, this paper is a pioneering work on the application of knowledge distillation to few-shot learning. Experiments conducted on Cityscapes semantic segmentation with the prior knowledge transferred from COCO2017 and KITTI demonstrate that our methods can train a better student network (mIoU and high-precision accuracy boosted by 1.4% and 8.2% respectively, with 78.2% segmentation variance) with only 41.8% of the FLOPs (see Fig. 1).

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning