Evaluation of CNN-based Single-Image Depth Estimation Methods

Added by Tobias Koch

Publication date 2018

fields Informatics Engineering

Language English

Authors Tobias Koch - Lukas Liebel - Friedrich Fraundorfer

Computer Vision and Pattern Recognition

While an increasing interest in deep models for single-image depth estimation methods can be observed, established schemes for their evaluation are still limited. We propose a set of novel quality criteria, allowing for a more detailed analysis by focusing on specific characteristics of depth maps. In particular, we address the preservation of edges and planar regions, depth consistency, and absolute distance accuracy. In order to employ these metrics to evaluate and compare state-of-the-art single-image depth estimation approaches, we provide a new high-quality RGB-D dataset. We used a DSLR camera together with a laser scanner to acquire high-resolution images and highly accurate depth maps. Experimental results show the validity of our proposed evaluation protocol.
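For context, the established evaluation schemes that the abstract calls limited are usually global error statistics over all valid pixels. The sketch below computes three widely used ones (absolute relative error, RMSE, and threshold accuracy with a 1.25 ratio). It is not the set of new criteria proposed in the paper, and the function name, argument layout, and validity rule (depths must be positive) are assumptions made for illustration.

```python
import math


def depth_error_metrics(pred, gt, threshold=1.25):
    """Global depth error metrics over valid pixels (illustrative sketch).

    pred, gt: flat sequences of predicted and reference depths in metres.
    Pixels with a non-positive depth in either map are treated as invalid.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    # Mean absolute error relative to the reference depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error in metres.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose depth ratio stays below the threshold.
    delta = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta": delta}
```

Because each value averages over the whole image, two depth maps with very different edge sharpness or planarity can score the same, which is the gap the abstract describes.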

Depth-based hand pose estimation: methods, data, and challenges

James Steven Supancic III, Gregory Rogez, Yi Yang 2015

Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state-of-the-art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and will release all software and evaluation code. We summarize important conclusions here: (1) Pose estimation appears roughly solved for scenes with isolated hands. However, methods still struggle to analyze cluttered scenes where hands may be interacting with nearby objects and surfaces. To spur further progress we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define consistent evaluation criteria, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. This also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.
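The nearest-neighbor baseline in point (3) can be pictured with a minimal sketch: return the pose of the training example whose feature vector is closest to the query. The abstract does not say which features or distance the authors use, so the flat feature vectors, the squared Euclidean distance, and the function name below are assumptions.

```python
def nearest_neighbor_pose(query, train_features, train_poses):
    """Return the pose stored with the training sample closest to the query.

    query: feature vector of the test depth frame (hypothetical representation).
    train_features: list of feature vectors, one per training frame.
    train_poses: list of pose annotations aligned with train_features.
    """
    if not train_features or len(train_features) != len(train_poses):
        raise ValueError("training features and poses must be non-empty and aligned")

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Index of the training sample with the smallest distance to the query.
    best = min(range(len(train_features)),
               key=lambda i: squared_distance(query, train_features[i]))
    return train_poses[best]
```

Such a baseline has no learned parameters, so its accuracy depends entirely on how well the training set covers the test poses.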

Computer Vision and Pattern Recognition

Subjective evaluation of traditional and learning-based image coding methods

Zhigao Fang, Jiaqi Zhang, Lu Yu 2021

We conduct a subjective experiment to compare the performance of traditional image coding methods and learning-based image coding methods. HEVC and VVC, the state-of-the-art traditional coding methods, are used as the representative traditional methods. The learning-based methods include not only CNN-based methods but also a GAN-based method, all of which are advanced or typical. Single Stimulus (SS), also called Absolute Category Rating (ACR), is adopted as the methodology of the experiment to obtain the perceptual quality of images. Additionally, we use some typical and frequently used objective quality metrics to evaluate the coding methods in the experiment for comparison. The experiment shows that CNN-based and GAN-based methods can perform better than traditional methods at low bit-rates. At high bit-rates, however, it is hard to verify whether CNN-based methods are superior to traditional methods. Because the GAN method does not provide models with high target bit-rates, we cannot tell exactly how the GAN method performs at high bit-rates. Furthermore, some popular objective quality metrics have not shown a good ability to measure the quality of images generated by learning-based coding methods, especially the GAN-based one.

Computer Vision and Pattern Recognition Image and Video Processing

A Neural Network for Detailed Human Depth Estimation from a Single Image

Sicong Tang, Feitong Tan, Kelvin Cheng 2019

This paper presents a neural network to estimate a detailed depth map of the foreground human in a single RGB image. The result captures geometry details such as cloth wrinkles, which are important in visualization applications. To achieve this goal, we separate the depth map into a smooth base shape and a residual detail shape and design a network with two branches to regress them respectively. We design a training strategy to ensure both base and detail shapes can be faithfully learned by the corresponding network branches. Furthermore, we introduce a novel network layer to fuse a rough depth map and surface normals to further improve the final result. Quantitative comparison with fused 'ground truth' captured by real depth cameras and qualitative examples on unconstrained Internet images demonstrate the strength of the proposed method. The code is available at https://github.com/sfu-gruvi-3dv/deep_human.

Computer Vision and Pattern Recognition

CNN-based Single Image Crowd Counting: Network Design, Loss Function and Supervisory Signal

Haoyue Bai, S.-H. Gary Chan 2020

Single image crowd counting is a challenging computer vision problem with wide applications in public safety, city planning, traffic management, etc. This survey provides a comprehensive summary of recent advanced crowd counting techniques based on Convolutional Neural Networks (CNNs) via density map estimation. Our goals are to provide an up-to-date review of recent approaches and to introduce new researchers in this field to the design principles and trade-offs. After presenting publicly available datasets and evaluation metrics, we review the recent advances with detailed comparisons on three major design modules for crowd counting: deep neural network designs, loss functions, and supervisory signals. We conclude the survey with some future directions.
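To make the density map formulation concrete, the sketch below builds a ground-truth density map by placing one unit-mass Gaussian at each annotated head position, so that summing the map recovers the crowd count. The fixed kernel width `sigma`, the `(row, column)` point format, and the function names are assumptions; the surveyed methods differ in how they choose the kernel.

```python
import math


def density_map(points, height, width, sigma=1.5):
    """Build a density map with one normalised Gaussian per annotated head.

    points: iterable of (row, column) head annotations inside the image.
    """
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        # Evaluate an unnormalised Gaussian centred on the annotation.
        weights = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                    for x in range(width)] for y in range(height)]
        total = sum(map(sum, weights))
        # Normalise so that each person contributes a total mass of 1,
        # even when the kernel is cut off by the image border.
        for y in range(height):
            for x in range(width):
                density[y][x] += weights[y][x] / total
    return density


def count_from_density(density):
    """The estimated crowd count is the integral (sum) of the density map."""
    return sum(map(sum, density))
```

A counting network trained in this setting regresses the density map from the image, and the predicted count is obtained with the same summation.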

Computer Vision and Pattern Recognition

Progressive Depth Learning for Single Image Dehazing

Yudong Liang, Bin Wang, Jiaying Liu 2021

The formation of a hazy image is mainly governed by reflected light and ambient airlight. Existing dehazing methods often ignore depth cues and fail in distant areas where heavier haze reduces visibility. However, we note that guiding transmission estimation with depth information could remedy the decreased visibility as distances increase. In turn, good transmission estimation could facilitate depth estimation for hazy images. In this paper, a deep end-to-end model that iteratively estimates image depths and transmission maps is proposed to perform an effective depth prediction for hazy images and improve the dehazing performance with the guidance of depth information. The image depth and transmission map are progressively refined to better restore the dehazed image. Our approach benefits from explicitly modeling the inner relationship of image depth and transmission map, which is especially effective for distant hazy areas. Extensive results on the benchmarks demonstrate that our proposed network performs favorably against the state-of-the-art dehazing methods in terms of depth estimation and haze removal.
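The link between depth and transmission that the abstract relies on is commonly written with the atmospheric scattering model: a hazy pixel is I = J·t + A·(1 − t), with transmission t = exp(−β·d) for scene depth d. The sketch below applies that standard model to a single pixel intensity. It is not the proposed network, and the scattering coefficient `beta`, the scalar `airlight`, and the clamp `t_min` are illustrative assumptions.

```python
import math


def transmission(depth, beta=1.0):
    """Transmission decays exponentially with scene depth: t = exp(-beta * d)."""
    return math.exp(-beta * depth)


def add_haze(radiance, depth, airlight=1.0, beta=1.0):
    """Synthesise a hazy intensity: I = J * t + A * (1 - t)."""
    t = transmission(depth, beta)
    return radiance * t + airlight * (1.0 - t)


def remove_haze(hazy, depth, airlight=1.0, beta=1.0, t_min=0.05):
    """Invert the model for J given depth; t is clamped to avoid dividing by ~0."""
    t = max(transmission(depth, beta), t_min)
    return (hazy - airlight * (1.0 - t)) / t
```

As depth grows, t approaches zero and the hazy intensity approaches the airlight, which is why distant regions are the hardest to restore and why an accurate depth estimate helps there.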

Computer Vision and Pattern Recognition