Traffic Danger Recognition With Surveillance Cameras Without Training Data

Submitted by Lijun Yu

Publication date: 2018

Research field: Informatics Engineering

Paper language: English

Authors: Lijun Yu, Dawei Zhang, Xiangqun Chen

Categories: Computer Vision and Pattern Recognition; Multimedia

Abstract

We propose a traffic danger recognition model that works with arbitrary traffic surveillance cameras to identify and predict car crashes. Because there are far too many cameras to monitor manually, the model predicts and identifies crashes automatically, based on a 3D reconstruction of the road plane and the prediction of vehicle trajectories. For normal traffic, it supports real-time proactive safety checks of vehicle speeds and inter-vehicle distances, providing insight into possible high-risk areas. The model achieves good crash prediction and recognition without using any labeled crash training data. Experiments on the BrnoCompSpeed dataset show that it monitors the road accurately, with mean errors of 1.80% for distance measurement, 2.77 km/h for speed measurement, 0.24 m for car position prediction, and 2.53 km/h for speed prediction.
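
The abstract does not give the geometry in detail. As a minimal sketch of the general idea only, assuming a known 3x3 homography `H` from image pixels to metric road-plane coordinates (how such a calibration is obtained is not shown), a fixed frame rate, and constant-velocity extrapolation, the snippet below maps detections onto the road plane, measures speed and gap, predicts a future position, and flags a short time headway. The homography values, the function names, and the 2-second headway threshold are hypothetical; this is not the authors' implementation.

```python
import math

# Hypothetical types for the sketch: a 3x3 homography and a road-plane point (metres).
Matrix3 = list[list[float]]
Point = tuple[float, float]


def to_road_plane(H: Matrix3, u: float, v: float) -> Point:
    """Map pixel (u, v) to road-plane metres with a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w  # dehomogenise


def speed_kmh(p0: Point, p1: Point, dt: float) -> float:
    """Speed in km/h from two road-plane positions (metres) taken dt seconds apart."""
    return math.dist(p0, p1) / dt * 3.6


def predict(p: Point, velocity: Point, horizon: float) -> Point:
    """Constant-velocity extrapolation `horizon` seconds ahead (assumed motion model)."""
    return p[0] + velocity[0] * horizon, p[1] + velocity[1] * horizon


def headway_s(gap_m: float, follower_kmh: float) -> float:
    """Time headway: distance to the lead vehicle divided by the follower's speed."""
    v = follower_kmh / 3.6
    return math.inf if v == 0 else gap_m / v


if __name__ == "__main__":
    # Hypothetical calibration: 5 cm per pixel, no perspective (illustration only).
    H = [[0.05, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 1.0]]
    fps = 25.0
    a0 = to_road_plane(H, 100, 400)    # following car, frame t
    a1 = to_road_plane(H, 100, 388)    # following car, frame t + 1
    lead = to_road_plane(H, 100, 100)  # lead car, frame t + 1
    v = speed_kmh(a0, a1, 1 / fps)
    gap = math.dist(a1, lead)
    vel = ((a1[0] - a0[0]) * fps, (a1[1] - a0[1]) * fps)
    print(f"speed {v:.1f} km/h, gap {gap:.1f} m, headway {headway_s(gap, v):.2f} s")
    print("predicted position in 0.5 s:", predict(a1, vel, 0.5))
    if headway_s(gap, v) < 2.0:  # hypothetical 2-second rule
        print("warning: short following distance")
```

A real traffic camera views the road in perspective, so its homography has a non-trivial bottom row; the division by `w` in `to_road_plane` covers that case as well.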

Related research

Lijun Yu, Peng Chen, Wenhe Liu (2020)

We focus on the problem of detecting traffic events in a surveillance scenario, including the detection of both vehicle actions and traffic collisions. Existing event detection systems are mostly learning-based and achieve convincing performance when a large amount of training data is available. In real-world scenarios, however, collecting sufficient labeled training data is expensive and sometimes impossible (e.g., for traffic collision detection). Moreover, the conventional 2D representation of surveillance views is inherently sensitive to occlusions and to differences between camera views. To address these problems, we propose a training-free monocular 3D event detection system for traffic surveillance. Our system first projects the vehicles into 3D Euclidean space and estimates their kinematic states. We then develop several simple yet effective ways to identify events from the kinematic patterns, which require no further training. Consequently, our system is robust to occlusions and viewpoint changes. Extensive experiments on large-scale real-world surveillance datasets show superior results, validating the effectiveness of the proposed system.

Computer Vision and Pattern Recognition
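
The abstract above (Lijun Yu, Peng Chen, Wenhe Liu, 2020) says events are identified from kinematic patterns without training but does not list the rules. A hypothetical example of such a rule, assuming road-plane tracks in a right-handed x-y frame (so counter-clockwise heading changes are left turns), classifies a vehicle's manoeuvre from the change in heading between its first and last track segments. The function names and the 30-degree threshold are illustrative assumptions, not the paper's method.

```python
import math


def heading_deg(p, q) -> float:
    """Heading of the displacement p -> q in degrees, counter-clockwise from +x."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def classify_turn(track, threshold_deg: float = 30.0) -> str:
    """Label a road-plane track as 'left', 'right' or 'straight' (hypothetical rule).

    Compares the heading of the first and last segments. The difference is
    wrapped into [-180, 180) so that, e.g., 175 -> -175 counts as a 10-degree change.
    """
    if len(track) < 3:
        raise ValueError("need at least three points")
    start = heading_deg(track[0], track[1])
    end = heading_deg(track[-2], track[-1])
    delta = (end - start + 180.0) % 360.0 - 180.0
    if delta > threshold_deg:
        return "left"
    if delta < -threshold_deg:
        return "right"
    return "straight"
```

Working on the road plane rather than in image coordinates is what makes a fixed angular threshold meaningful across camera viewpoints.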

Surveilling Surveillance: Estimating the Prevalence of Surveillance Cameras with Street View Data

Hao Sheng, Keniel Yao, Sharad Goel (2021)

The use of video surveillance in public spaces -- both by government agencies and by private citizens -- has attracted considerable attention in recent years, particularly in light of rapid advances in face-recognition technology. But it has been difficult to systematically measure the prevalence and placement of cameras, hampering efforts to assess the implications of surveillance on privacy and public safety. Here, we combine computer vision, human verification, and statistical analysis to estimate the spatial distribution of surveillance cameras. Specifically, we build a camera detection model and apply it to 1.6 million street view images sampled from 10 large U.S. cities and 6 other major cities around the world, with positive model detections verified by human experts. After adjusting for the estimated recall of our model, and accounting for the spatial coverage of our sampled images, we are able to estimate the density of surveillance cameras visible from the road. Across the 16 cities we consider, the estimated number of surveillance cameras per linear kilometer ranges from 0.2 (in Los Angeles) to 0.9 (in Seoul). In a detailed analysis of the 10 U.S. cities, we find that cameras are concentrated in commercial, industrial, and mixed zones, and in neighborhoods with higher shares of non-white residents -- a pattern that persists even after adjusting for land use. These results help inform ongoing discussions on the use of surveillance technology, including its potential disparate impacts on communities of color.

Computers and Society; Computer Vision and Pattern Recognition
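
The recall adjustment in the surveillance-camera abstract above can be illustrated with simple arithmetic: if human experts verify `v` positive detections and the detector's estimated recall is `r`, roughly `v / r` cameras are visible in the sampled imagery, and dividing by the road length those images cover gives a density per linear kilometre. The numbers and the function name below are made up for illustration; the paper's actual estimator, including how spatial coverage is computed, may be more involved.

```python
def camera_density_per_km(verified_detections: int, recall: float, km_covered: float) -> float:
    """Recall-adjusted camera count divided by road length covered (hypothetical helper)."""
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    if km_covered <= 0:
        raise ValueError("km_covered must be positive")
    return verified_detections / recall / km_covered


# Illustrative numbers only: 126 verified cameras, 70% recall, 300 km of road imaged.
print(round(camera_density_per_km(126, 0.7, 300.0), 2))  # 0.6
```

Dividing by recall corrects for cameras the detector missed; without it, densities would be biased downward by a factor of `r`.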

PennSyn2Real: Training Object Recognition Models without Human Labeling

Ty Nguyen, Ian D. Miller, Avi Cohen (2020)

Scalable training data generation is a critical problem in deep learning. We propose PennSyn2Real, a photo-realistic synthetic dataset consisting of more than 100,000 4K images of more than 20 types of micro aerial vehicles (MAVs). The dataset can be used to generate arbitrary numbers of training images for high-level computer vision tasks such as MAV detection and classification. Our data generation framework bootstraps chroma-keying, a mature cinematography technique, with a motion tracking system, providing artifact-free, curated annotated images in which object orientations and lighting are controlled. This framework is easy to set up and can be applied to a broad range of objects, reducing the gap between synthetic and real-world data. We show that synthetic data generated with this framework can be used directly to train CNN models for common object recognition tasks such as detection and segmentation, with performance competitive with training on real images only. Furthermore, bootstrapping the generated synthetic data in few-shot learning can significantly improve overall performance, reducing the number of training samples required to achieve the desired accuracy.

Computer Vision and Pattern Recognition
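
Chroma-keying, the technique the PennSyn2Real abstract above builds on, can be sketched in a few lines: pixels close to the key colour are treated as backdrop and replaced with a new scene, and the remaining foreground pixels yield an object bounding box for free. This pure-Python sketch assumes tiny list-based RGB images and a hypothetical Euclidean RGB distance threshold; it illustrates the technique, not the PennSyn2Real pipeline, which additionally uses a motion tracking system.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def chroma_key(fg: Image, bg: Image, key: Pixel = (0, 255, 0), tol: float = 100.0):
    """Composite fg over bg wherever fg is far from the key colour.

    Returns the composite image and the foreground bounding box
    (x_min, y_min, x_max, y_max), or None if no foreground pixel is found.
    """
    if len(fg) != len(bg) or any(len(a) != len(b) for a, b in zip(fg, bg)):
        raise ValueError("fg and bg must have the same size")
    out: Image = []
    xs, ys = [], []
    for y, (frow, brow) in enumerate(zip(fg, bg)):
        row = []
        for x, (f, b) in enumerate(zip(frow, brow)):
            dist = sum((c - k) ** 2 for c, k in zip(f, key)) ** 0.5
            if dist > tol:  # foreground: keep the object pixel
                row.append(f)
                xs.append(x)
                ys.append(y)
            else:  # key colour: show the new background instead
                row.append(b)
        out.append(row)
    bbox = (min(xs), min(ys), max(xs), max(ys)) if xs else None
    return out, bbox
```

Because the mask comes from the keying step itself, each composited image arrives with its annotation, which is what makes this style of generation label-free.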

Deep Traffic Sign Detection and Recognition Without Target Domain Real Images

Lucas Tabelini, Rodrigo Berriel, Thiago M. Paixão (2020)

Deep learning has been successfully applied to several problems related to autonomous driving, often relying on large databases of real target-domain images for proper training. Acquiring such real-world data is not always possible in the self-driving context, and sometimes annotating it is not feasible. Moreover, many tasks have an intrinsic data imbalance that most learning-based methods struggle to cope with. Traffic sign detection in particular is a challenging problem in which all three issues appear together. To address these challenges, we propose a novel database generation method that requires only (i) arbitrary natural images, i.e., no real images from the target domain, and (ii) templates of the traffic signs. The method is not intended to replace training with real data but to serve as a compatible alternative when real data are unavailable. The effortlessly generated database is shown to be effective for training a deep detector on traffic signs from multiple countries. On large datasets, training with a fully synthetic dataset almost matches the performance of training with a real one. Compared with training on a smaller dataset of real images, training with synthetic images increased accuracy by 12.25%. The proposed method also improves the detector's performance when target-domain data are available.

Computer Vision and Pattern Recognition
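
The traffic-sign abstract above builds its training set from arbitrary natural images plus sign templates. A minimal sketch of the core compositing step, assuming RGBA templates whose alpha channel marks the sign and uniform random placement (the names are hypothetical, and the paper's method applies further transformations not shown here), pastes a template onto a background and emits its bounding box as the detection label.

```python
import random

RGB = tuple[int, int, int]
RGBA = tuple[int, int, int, int]


def paste_template(background: list[list[RGB]], template: list[list[RGBA]], rng: random.Random):
    """Alpha-blend template onto a copy of background at a random position.

    Returns (image, bbox), where bbox = (x_min, y_min, x_max, y_max) covers
    the full template rectangle.
    """
    bh, bw = len(background), len(background[0])
    th, tw = len(template), len(template[0])
    if th > bh or tw > bw:
        raise ValueError("template larger than background")
    x0 = rng.randint(0, bw - tw)  # randint is inclusive at both ends
    y0 = rng.randint(0, bh - th)
    out = [list(row) for row in background]  # leave the input untouched
    for dy, trow in enumerate(template):
        for dx, (r, g, b, a) in enumerate(trow):
            alpha = a / 255
            br, bg_, bb = out[y0 + dy][x0 + dx]
            out[y0 + dy][x0 + dx] = (
                round(alpha * r + (1 - alpha) * br),
                round(alpha * g + (1 - alpha) * bg_),
                round(alpha * b + (1 - alpha) * bb),
            )
    return out, (x0, y0, x0 + tw - 1, y0 + th - 1)
```

Passing an explicit `random.Random` instance keeps generated datasets reproducible from a seed.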

Real-World Super-Resolution of Face-Images from Surveillance Cameras

Andreas Aakerberg, Kamal Nasrollahi, Thomas B. Moeslund (2021)

Most existing face image Super-Resolution (SR) methods assume that the Low-Resolution (LR) images were artificially downsampled from High-Resolution (HR) images with bicubic interpolation. This operation changes the natural image characteristics and reduces noise. Hence, SR methods trained on such data often fail to produce good results when applied to real LR images. To solve this problem, we propose a novel framework for generating realistic LR/HR training pairs. Our framework estimates realistic blur kernels, noise distributions, and JPEG compression artifacts to generate LR images whose characteristics are similar to those in the source domain. This allows us to train an SR model using high-quality face images as Ground-Truth (GT). For better perceptual quality, we use a Generative Adversarial Network (GAN) based SR model in which the commonly used VGG loss [24] is replaced with the LPIPS loss [52]. Experimental results on both real and artificially corrupted face images show that our method produces more detailed reconstructions with less noise than existing State-of-the-Art (SoTA) methods. In addition, we show that traditional no-reference Image Quality Assessment (IQA) methods fail to capture this improvement, and demonstrate that the more recent NIMA metric [16] correlates better with human perception via Mean Opinion Rank (MOR).

Computer Vision and Pattern Recognition
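
The face super-resolution abstract above generates LR training inputs by degrading HR images. A simplified stand-in for that idea follows the classical degradation order (blur, downsample, add noise) on a grayscale list-of-lists image, assuming a fixed 3x3 box blur and additive Gaussian noise with a hypothetical sigma. The paper instead estimates realistic kernels and noise from the source domain and also models JPEG artifacts, which this sketch omits.

```python
import random


def box_blur(img: list[list[float]]) -> list[list[float]]:
    """3x3 mean filter with edge clamping (stand-in for an estimated blur kernel)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(sum(vals) / 9)
        out.append(row)
    return out


def degrade(hr, scale: int = 2, noise_sigma: float = 2.0, rng=None):
    """Blur, downsample by `scale`, then add Gaussian noise clipped to [0, 255]."""
    rng = rng or random.Random()
    blurred = box_blur(hr)
    lr = [row[::scale] for row in blurred[::scale]]
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, noise_sigma))) for v in row]
            for row in lr]
```

Pairing each HR image with `degrade(hr)` yields an LR/HR training pair; swapping the box blur and Gaussian noise for kernels and noise estimated from real LR images is the step the abstract describes.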