Visual Depth Mapping from Monocular Images using Recurrent Convolutional Neural Networks

346 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل John Mern

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف John Mern - Kyle Julian - Rachael E. Tompa

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

A reliable sense-and-avoid system is critical to enabling safe autonomous operation of unmanned aircraft. Existing sense-and-avoid methods often require specialized sensors that are too large or power intensive for use on small unmanned vehicles. This paper presents a method to estimate object distances based on visual image sequences, allowing for the use of low-cost, on-board monocular cameras as simple collision avoidance sensors. We present a deep recurrent convolutional neural network and training method to generate depth maps from video sequences. Our network is trained using simulated camera and depth data generated with Microsofts AirSim simulator. Empirically, we show that our model achieves superior performance compared to models generated using prior methods.We further demonstrate that the method can be used for sense-and-avoid of obstacles in simulation.

قيم البحث

65 - S. Bazrafkan 2017

Deep neural networks are applied to a wide range of problems in recent years. In this work, Convolutional Neural Network (CNN) is applied to the problem of determining the depth from a single camera image (monocular depth). Eight different networks a re designed to perform depth estimation, each of them suitable for a feature level. Networks with different pooling sizes determine different feature levels. After designing a set of networks, these models may be combined into a single network topology using graph optimization techniques. This Semi Parallel Deep Neural Network (SPDNN) eliminates duplicated common network layers, and can be further optimized by retraining to achieve an improved model compared to the individual topologies. In this study, four SPDNN models are trained and have been evaluated at 2 stages on the KITTI dataset. The ground truth images in the first part of the experiment are provided by the benchmark, and for the second part, the ground truth images are the depth map results from applying a state-of-the-art stereo matching method. The results of this evaluation demonstrate that using post-processing techniques to refine the target of the network increases the accuracy of depth estimation on individual mono images. The second evaluation shows that using segmentation data alongside the original data as the input can improve the depth estimation results to a point where performance is comparable with stereo depth estimation. The computational time is also discussed in this study.

الرؤية الحاسوبية وتمييز الأنماط

Band Selection from Hyperspectral Images Using Attention-based Convolutional Neural Networks

138 - Pablo Ribalta Lorenzo , Lukasz Tulczyjew , Michal Marcinkiewicz 2018

This paper introduces new attention-based convolutional neural networks for selecting bands from hyperspectral images. The proposed approach re-uses convolutional activations at different depths, identifying the most informative regions of the spectr um with the help of gating mechanisms. Our attention techniques are modular and easy to implement, and they can be seamlessly trained end-to-end using gradient descent. Our rigorous experiments showed that deep models equipped with the attention mechanism deliver high-quality classification, and repeatedly identify significant bands in the training data, permitting the creation of refined and extremely compact sets that retain the most meaningful features.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التعلم الالي

Visualizing Deep Convolutional Neural Networks Using Natural Pre-Images

152 - Aravindh Mahendran , Andrea Vedaldi 2015

Image representations, from SIFT and bag of visual words to Convolutional Neural Networks (CNNs) are a crucial component of almost all computer vision systems. However, our understanding of them remains limited. In this paper we study several landmar k representations, both shallow and deep, by a number of complementary visualization techniques. These visualizations are based on the concept of natural pre-image, namely a natural-looking image whose representation has some notable property. We study in particular three such visualizations: inversion, in which the aim is to reconstruct an image from its representation, activation maximization, in which we search for patterns that maximally stimulate a representation component, and caricaturization, in which the visual patterns that a representation detects in an image are exaggerated. We pose these as a regularized energy-minimization framework and demonstrate its generality and effectiveness. In particular, we show that this method can invert representations such as HOG more accurately than recent alternatives while being applicable to CNNs too. Among our findings, we show that several layers in CNNs retain photographically accurate information about the image, with different degrees of geometric and photometric invariance.

الرؤية الحاسوبية وتمييز الأنماط

Anomaly Recognition from surveillance videos using 3D Convolutional Neural Networks

238 - R. Maqsood , UI. Bajwa , G. Saleem 2021

Anomalous activity recognition deals with identifying the patterns and events that vary from the normal stream. In a surveillance paradigm, these events range from abuse to fighting and road accidents to snatching, etc. Due to the sparse occurrence o f anomalous events, anomalous activity recognition from surveillance videos is a challenging research task. The approaches reported can be generally categorized as handcrafted and deep learning-based. Most of the reported studies address binary classification i.e. anomaly detection from surveillance videos. But these reported approaches did not address other anomalous events e.g. abuse, fight, road accidents, shooting, stealing, vandalism, and robbery, etc. from surveillance videos. Therefore, this paper aims to provide an effective framework for the recognition of different real-world anomalies from videos. This study provides a simple, yet effective approach for learning spatiotemporal features using deep 3-dimensional convolutional networks (3D ConvNets) trained on the University of Central Florida (UCF) Crime video dataset. Firstly, the frame-level labels of the UCF Crime dataset are provided, and then to extract anomalous spatiotemporal features more efficiently a fine-tuned 3D ConvNets is proposed. Findings of the proposed study are twofold 1)There exist specific, detectable, and quantifiable features in UCF Crime video feed that associate with each other 2) Multiclass learning can improve generalizing competencies of the 3D ConvNets by effectively learning frame-level information of dataset and can be leveraged in terms of better results by applying spatial augmentation.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Segmentation of the Proximal Femur from MR Images using Deep Convolutional Neural Networks

268 - Cem M. Deniz , Siyuan Xiang , Spencer Hallyburton 2017

Magnetic resonance imaging (MRI) has been proposed as a complimentary method to measure bone quality and assess fracture risk. However, manual segmentation of MR images of bone is time-consuming, limiting the use of MRI measurements in the clinical p ractice. The purpose of this paper is to present an automatic proximal femur segmentation method that is based on deep convolutional neural networks (CNNs). This study had institutional review board approval and written informed consent was obtained from all subjects. A dataset of volumetric structural MR images of the proximal femur from 86 subject were manually-segmented by an expert. We performed experiments by training two different CNN architectures with multiple number of initial feature maps and layers, and tested their segmentation performance against the gold standard of manual segmentations using four-fold cross-validation. Automatic segmentation of the proximal femur achieved a high dice similarity score of 0.94$pm$0.05 with precision = 0.95$pm$0.02, and recall = 0.94$pm$0.08 using a CNN architecture based on 3D convolution exceeding the performance of 2D CNNs. The high segmentation accuracy provided by CNNs has the potential to help bring the use of structural MRI measurements of bone quality into clinical practice for management of osteoporosis.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التعلم الالي