
Fast and Accurate: Video Enhancement using Sparse Depth

Added by Yu Feng
Publication date: 2021
Language: English


This paper presents a general framework for building fast and accurate algorithms for video enhancement tasks such as super-resolution, deblurring, and denoising. Essential to our framework is the realization that the accuracy, rather than the density, of pixel flows is what is required for high-quality video enhancement. Most prior works take the opposite approach: they estimate dense (per-pixel), but generally less robust, flows, mostly using computationally costly algorithms. Instead, we propose a lightweight flow estimation algorithm; it fuses the sparse point cloud data and the (even sparser and less reliable) IMU data available in modern autonomous agents to estimate the flow information. Building on top of the flow estimation, we demonstrate a general framework that integrates the flows in a plug-and-play fashion with different task-specific layers. Algorithms built in our framework achieve a 1.78x-187.41x speedup while providing a 0.42 dB-6.70 dB quality improvement over competing methods.
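As a rough illustration of the core idea, a sparse flow field can be obtained by projecting each point-cloud point into the image at two consecutive timestamps, relating the two views with the ego-motion integrated from the IMU. The sketch below is a minimal illustration under assumed pinhole intrinsics K and an IMU-derived relative pose (R, t); the function and variable names are ours, not the paper's.

```python
import numpy as np

def sparse_flow_from_points(points_cam, K, R, t):
    """Estimate sparse optical flow for a set of 3D points.

    points_cam: (N, 3) 3D points in the camera frame at time t0.
    K:          (3, 3) pinhole camera intrinsics.
    R, t:       relative camera rotation (3, 3) and translation (3,)
                from t0 to t1, e.g. integrated from IMU measurements.
    Returns (pix0, flow): pixel locations at t0 and their 2D displacement.
    """
    def project(pts):
        uvw = pts @ K.T                      # perspective projection
        return uvw[:, :2] / uvw[:, 2:3]      # divide by depth

    pix0 = project(points_cam)               # where the points fall at t0
    points_t1 = points_cam @ R.T + t         # move points into the t1 camera frame
    pix1 = project(points_t1)                # where the same points fall at t1
    return pix0, pix1 - pix0                 # sparse flow vectors

# Example with a handful of synthetic points and a small lateral ego-motion.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
pts = np.array([[0.5, 0.2, 5.0], [-1.0, 0.1, 8.0], [2.0, -0.3, 12.0]])
pix, flow = sparse_flow_from_points(pts, K, R, t)
print(pix, flow)
```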



Related Research

Most of the deep-learning based depth and ego-motion networks have been designed for visible cameras. However, visible cameras heavily rely on the presence of an external light source, so it is challenging to use them under low-light conditions such as night scenes, tunnels, and other harsh conditions. A thermal camera is one solution to this problem because it detects Long Wave Infrared Radiation (LWIR) regardless of any external light source. Despite this advantage, however, depth and ego-motion estimation with thermal cameras has not been actively explored so far. In this paper, we propose an unsupervised learning method for all-day depth and ego-motion estimation. The proposed method exploits a multi-spectral consistency loss to give complementary supervision to the networks by reconstructing visible and thermal images with the depth and pose estimated from thermal images. The networks trained with the proposed method robustly estimate the depth and pose from monocular thermal video under low-light and even zero-light conditions. To the best of our knowledge, this is the first work to simultaneously estimate both depth and ego-motion from monocular thermal video in an unsupervised manner.
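The abstract does not spell out the loss in detail; below is a minimal sketch of the standard photometric reconstruction loss that a multi-spectral consistency loss of this kind builds on, assuming PyTorch, a pinhole camera model, and our own names and tensor shapes.

```python
import torch
import torch.nn.functional as F

def photometric_reconstruction_loss(tgt_img, src_img, depth, pose, K):
    """Warp src_img into the target view using predicted depth and pose, then
    compare with tgt_img (L1). Shapes: images (B, C, H, W), depth (B, 1, H, W),
    pose (B, 3, 4) as [R | t], K (B, 3, 3)."""
    B, _, H, W = tgt_img.shape
    device = tgt_img.device

    # Pixel grid in homogeneous coordinates: (B, 3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                            torch.arange(W, device=device, dtype=torch.float32),
                            indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).view(1, 3, -1).expand(B, -1, -1)

    # Back-project to 3D with the predicted depth, then move into the source frame.
    cam_points = torch.inverse(K) @ pix * depth.view(B, 1, -1)
    R, t = pose[:, :, :3], pose[:, :, 3:]
    src_points = K @ (R @ cam_points + t)

    # Re-project and normalise to [-1, 1] for grid_sample.
    uv = src_points[:, :2] / src_points[:, 2:].clamp(min=1e-6)
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)

    warped = F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)
    return (warped - tgt_img).abs().mean()
```

In the multi-spectral setting described above, the depth and pose would come from the thermal branch while the reconstruction target could be either the thermal or the visible image.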
Depth completion aims at inferring a dense depth image from sparse depth measurements, since glossy, transparent, or distant surfaces cannot be scanned properly by the sensor. Most existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values, which leads to blurred boundaries or inaccurate object structure. To address these problems, we propose a novel self-guided instance-aware network (SG-IANet) that: (1) utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration, (2) exploits geometric and context information in network learning to conform to the underlying constraints for edge clarity and structure consistency, (3) regularizes the depth estimation and mitigates the impact of noise through instance-aware learning, and (4) trains with synthetic data only, using domain randomization to bridge the reality gap. Extensive experiments on synthetic and real-world datasets demonstrate that our proposed method outperforms previous works. Further ablation studies give more insights into the proposed method and demonstrate the generalization capability of our model.
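Purely as an illustration of what instance-aware regularization could look like, the sketch below averages the depth error per instance mask before averaging over instances, so that small objects are not dominated by the background; this is our assumption, not the SG-IANet formulation.

```python
import torch

def instance_aware_depth_loss(pred, gt, instance_masks, valid=None):
    """Average the depth error per instance, then across instances.
    pred, gt: (B, 1, H, W) depth maps; instance_masks: (B, N, H, W) binary masks;
    valid: optional (B, 1, H, W) mask of pixels with ground-truth depth."""
    err = (pred - gt).abs()
    if valid is not None:
        err = err * valid
    # Per-instance mean error: sum over masked pixels / number of masked pixels.
    masked = err * instance_masks                       # broadcast over N instances
    per_inst = masked.sum(dim=(-2, -1)) / instance_masks.sum(dim=(-2, -1)).clamp(min=1.0)
    return per_inst.mean()
```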
For a robot deployed in the world, it is desirable to have the ability to learn autonomously and improve its initial pre-set knowledge. We formalize this as a bootstrapped self-supervised learning problem: a system is initially bootstrapped with supervised training on a labeled dataset, and we look for a self-supervised training method that can subsequently improve the system over the supervised baseline using only unlabeled data. In this work, we leverage temporal consistency between frames in monocular video to perform this bootstrapped self-supervised training. We show that a well-trained state-of-the-art semantic segmentation network can be further improved through our method. In addition, we show that the bootstrapped self-supervised training framework can help a network learn depth estimation better than pure supervised training or self-supervised training.
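The abstract does not give the exact consistency mechanism; one plausible form, sketched below under assumed names and a backward optical flow input, warps the predictions on frame t to frame t+1 and uses them as pseudo-labels.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(logits_t, logits_tp1, flow_tp1_to_t):
    """Encourage predictions on consecutive frames to agree after motion compensation.
    logits_*: (B, C, H, W) segmentation logits; flow_tp1_to_t: (B, 2, H, W) backward
    flow in pixels (for each pixel of frame t+1, where it was in frame t)."""
    B, C, H, W = logits_t.shape
    device = logits_t.device

    # Sampling grid that pulls frame-t predictions to frame t+1 pixel positions.
    ys, xs = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                            torch.arange(W, device=device, dtype=torch.float32),
                            indexing="ij")
    x_src = xs.unsqueeze(0) + flow_tp1_to_t[:, 0]
    y_src = ys.unsqueeze(0) + flow_tp1_to_t[:, 1]
    grid = torch.stack([2 * x_src / (W - 1) - 1, 2 * y_src / (H - 1) - 1], dim=-1)

    warped_t = F.grid_sample(logits_t, grid, align_corners=True)

    # Cross-entropy between frame t+1 predictions and warped frame-t pseudo-labels.
    pseudo = warped_t.argmax(dim=1)
    return F.cross_entropy(logits_tp1, pseudo)
```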
Online video object segmentation is a challenging task because it requires processing the image sequence both promptly and accurately. To segment a target object through the video, numerous CNN-based methods have been developed that rely on heavy finetuning on the object mask in the first frame, which is too time-consuming for online applications. In this paper, we propose a fast and accurate video object segmentation algorithm that can start the segmentation process immediately upon receiving the images. We first utilize a part-based tracking method to deal with challenging factors such as large deformation, occlusion, and cluttered background. Based on the tracked bounding boxes of parts, we construct a region-of-interest segmentation network to generate part masks. Finally, a similarity-based scoring function is adopted to refine these object parts by comparing them to the visual information in the first frame. Our method performs favorably against state-of-the-art algorithms in accuracy on the DAVIS benchmark dataset, while achieving much faster runtime performance.
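A hedged sketch of the kind of similarity-based scoring described above, assuming each part is represented by a pooled feature vector compared against a first-frame template with cosine similarity; the names and threshold are illustrative only.

```python
import torch
import torch.nn.functional as F

def score_parts(part_feats, template_feat):
    """Score candidate part masks by cosine similarity to the first-frame template.
    part_feats: (N, D) pooled features of N candidate parts; template_feat: (D,)."""
    part_feats = F.normalize(part_feats, dim=1)
    template = F.normalize(template_feat, dim=0)
    return part_feats @ template          # (N,) similarity scores in [-1, 1]

# Keep only parts that look similar enough to the annotated first-frame object.
# scores = score_parts(part_feats, template_feat)
# kept = part_feats[scores > 0.5]       # illustrative threshold
```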
Depth maps obtained by commercial depth sensors are typically low-resolution, making them difficult to use in various computer vision tasks. Thus, depth map super-resolution (SR), which upscales the depth map into high-resolution (HR) space, is a practical and valuable task. However, limited by the lack of real-world paired low-resolution (LR) and HR depth maps, most existing methods use downsampling to obtain paired training samples. To this end, we first construct a large-scale dataset named RGB-D-D, which can greatly promote the study of depth map SR and even more depth-related real-world tasks. The D-D in our dataset represents the paired LR and HR depth maps captured by a mobile phone and a Lucid Helios camera, respectively, ranging from indoor scenes to challenging outdoor scenes. Besides, we provide a fast depth map super-resolution (FDSR) baseline, in which the high-frequency component adaptively decomposed from the RGB image guides the depth map SR. Extensive experiments on existing public datasets demonstrate the effectiveness and efficiency of our network compared with state-of-the-art methods. Moreover, for real-world LR depth maps, our algorithm can produce more accurate HR depth maps with clearer boundaries and, to some extent, correct depth value errors.
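To make the guidance idea concrete, the sketch below extracts a high-frequency residual from the RGB image with a fixed box low-pass filter and concatenates it with the bilinearly upsampled LR depth; FDSR decomposes this component adaptively, so this is only a simplified stand-in with our own names.

```python
import torch
import torch.nn.functional as F

def high_frequency_guidance(rgb, lr_depth, scale=4, ksize=7):
    """Extract the high-frequency part of the RGB image (image minus a low-pass
    version) and stack it with the upsampled LR depth as network guidance input.
    rgb: (B, 3, H, W); lr_depth: (B, 1, H/scale, W/scale)."""
    # Simple low-pass filter via average pooling; the paper learns the
    # decomposition adaptively, here we just hard-code a box filter.
    low = F.avg_pool2d(rgb, ksize, stride=1, padding=ksize // 2)
    high_freq = rgb - low
    up_depth = F.interpolate(lr_depth, scale_factor=scale, mode="bilinear",
                             align_corners=False)
    return torch.cat([up_depth, high_freq], dim=1)    # (B, 4, H, W) network input
```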