Self-Supervised Deep Depth Denoising

93 0 0.0 ( 0 )

Download Cite

Added by Spyridon Thermos

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Vladimiros Sterzentsenko - Leonidas Saroglou - Anargyros Chatzitofis

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Depth perception is considered an invaluable source of information for various vision tasks. However, depth maps acquired using consumer-level sensors still suffer from non-negligible noise. This fact has recently motivated researchers to exploit traditional filters, as well as the deep learning paradigm, in order to suppress the aforementioned non-uniform noise, while preserving geometric details. Despite the effort, deep depth denoising is still an open challenge mainly due to the lack of clean data that could be used as ground truth. In this paper, we propose a fully convolutional deep autoencoder that learns to denoise depth maps, surpassing the lack of ground truth data. Specifically, the proposed autoencoder exploits multiple views of the same scene from different points of view in order to learn to suppress noise in a self-supervised end-to-end manner using depth and color information during training, yet only depth during inference. To enforce selfsupervision, we leverage a differentiable rendering technique to exploit photometric supervision, which is further regularized using geometric and surface priors. As the proposed approach relies on raw data acquisition, a large RGB-D corpus is collected using Intel RealSense sensors. Complementary to a quantitative evaluation, we demonstrate the effectiveness of the proposed self-supervised denoising approach on established 3D reconstruction applications. Code is avalable at https://github.com/VCL3D/DeepDepthDenoising

rate research

Self-supervised Depth Denoising Using Lower- and Higher-quality RGB-D sensors

237 - Akhmedkhan Shabanov , Ilya Krotov , Nikolay Chinaev 2020

Consumer-level depth cameras and depth sensors embedded in mobile devices enable numerous applications, such as AR games and face identification. However, the quality of the captured depth is sometimes insufficient for 3D reconstruction, tracking and other computer vision tasks. In this paper, we propose a self-supervised depth denoising approach to denoise and refine depth coming from a low quality sensor. We record simultaneous RGB-D sequences with unzynchronized lower- and higher-quality cameras and solve a challenging problem of aligning sequences both temporally and spatially. We then learn a deep neural network to denoise the lower-quality depth using the matched higher-quality data as a source of supervision signal. We experimentally validate our method against state-of-the-art filtering-based and deep denoising techniques and show its application for 3D object reconstruction tasks where our approach leads to more detailed fused surfaces and better tracking.

Computer Vision and Pattern Recognition

SelfVIO: Self-Supervised Deep Monocular Visual-Inertial Odometry and Depth Estimation

112 - Yasin Almalioglu , Mehmet Turan , Alp Eren Sari 2019

In the last decade, numerous supervised deep learning approaches requiring large amounts of labeled data have been proposed for visual-inertial odometry (VIO) and depth map estimation. To overcome the data limitation, self-supervised learning has emerged as a promising alternative, exploiting constraints such as geometric and photometric consistency in the scene. In this study, we introduce a novel self-supervised deep learning-based VIO and depth map recovery approach (SelfVIO) using adversarial training and self-adaptive visual-inertial sensor fusion. SelfVIO learns to jointly estimate 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabeled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach is able to perform VIO without the need for IMU intrinsic parameters and/or the extrinsic calibration between the IMU and the camera. estimation and single-view depth recovery network. We provide comprehensive quantitative and qualitative evaluations of the proposed framework comparing its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC and Cityscapes datasets. Detailed comparisons prove that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising approach among existing methods in the literature.

Computer Vision and Pattern Recognition Machine Learning Image and Video Processing

Noise2Inverse: Self-supervised deep convolutional denoising for tomography

96 - Allard A. Hendriksen , Daniel M. Pelt , K. Joost Batenburg 2020

Recovering a high-quality image from noisy indirect measurements is an important problem with many applications. For such inverse problems, supervised deep convolutional neural network (CNN)-based denoising methods have shown strong results, but the success of these supervised methods critically depends on the availability of a high-quality training dataset of similar measurements. For image denoising, methods are available that enable training without a separate training dataset by assuming that the noise in two different pixels is uncorrelated. However, this assumption does not hold for inverse problems, resulting in artifacts in the denoised images produced by existing methods. Here, we propose Noise2Inverse, a deep CNN-based denoising method for linear image reconstruction algorithms that does not require any additional clean or noisy data. Training a CNN-based denoiser is enabled by exploiting the noise model to compute multiple statistically independent reconstructions. We develop a theoretical framework which shows that such training indeed obtains a denoising CNN, assuming the measured noise is element-wise independent and zero-mean. On simulated CT datasets, Noise2Inverse demonstrates an improvement in peak signal-to-noise ratio and structural similarity index compared to state-of-the-art image denoising methods and conventional reconstruction methods, such as Total-Variation Minimization. We also demonstrate that the method is able to significantly reduce noise in challenging real-world experimental datasets.

Image and Video Processing Computer Vision and Pattern Recognition Machine Learning

Calibrating Self-supervised Monocular Depth Estimation

139 - Robert McCraith , Lukas Neumann , Andrea Vedaldi 2020

In the recent years, many methods demonstrated the ability of neural networks tolearn depth and pose changes in a sequence of images, using only self-supervision as thetraining signal. Whilst the networks achieve good performance, the often over-lookeddetail is that due to the inherent ambiguity of monocular vision they predict depth up to aunknown scaling factor. The scaling factor is then typically obtained from the LiDARground truth at test time, which severely limits practical applications of these methods.In this paper, we show that incorporating prior information about the camera configu-ration and the environment, we can remove the scale ambiguity and predict depth directly,still using the self-supervised formulation and not relying on any additional sensors.

Computer Vision and Pattern Recognition

Self-Supervised Human Depth Estimation from Monocular Videos

367 - Feitong Tan , Hao Zhu , Zhaopeng Cui 2020

Previous methods on estimating detailed human depth often require supervised training with `ground truth depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data collection simple and improves the generalization of the learned network. The self-supervised learning is achieved by minimizing a photo-consistency loss, which is evaluated between a video frame and its neighboring frames warped according to the estimated depth and the 3D non-rigid motion of the human body. To solve this non-rigid motion, we first estimate a rough SMPL model at each video frame and compute the non-rigid body motion accordingly, which enables self-supervised learning on estimating the shape details. Experiments demonstrate that our method enjoys better generalization and performs much better on data in the wild.

Computer Vision and Pattern Recognition