Fully Parallel Architecture for Semi-global Stereo Matching with Refined Rank Method

Published by: Yiwu Yao

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Yiwu Yao, Yuhua Cheng

Computer Vision and Pattern Recognition

Abstract

A fully parallel, disparity-level architecture for efficient semi-global matching (SGM) with a refined rank method is presented. The improved SGM algorithm uses a non-parametric unified rank model that combines Rank filter/AD and Rank SAD. Rank SAD is a novel definition that introduces constraints on local image structure into the rank method, so the unified rank model with Rank SAD can compensate for the defects of Rank filter/AD. Experimental results show that the refined SGM algorithm achieves both excellent subjective quality and objective performance. The fully parallel construction for the hardware implementation of SGM is designed with reasonable strategies at the disparity level. The parallelism of the data stream allows adequate throughput for specific applications at an acceptable maximum frequency. The results of RTL emulation and synthesis show that the proposed parallel architecture is suitable for VLSI implementation.
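To make the two building blocks named in the abstract more concrete, the following is a minimal sketch of the textbook rank transform and of the standard SGM cost aggregation along a single left-to-right path. It is not the authors' refined rank model (Rank SAD is not defined on this page) and not their hardware design. The function names `rank_transform` and `aggregate_path`, the window radius, and the penalties `p1` and `p2` are illustrative assumptions.

```python
def rank_transform(image, radius=1):
    """Textbook rank transform (assumed form, not the paper's refined model).

    Each output value is the number of in-bounds neighbours inside the
    (2*radius+1)^2 window that are strictly darker than the centre pixel.
    """
    height, width = len(image), len(image[0])
    ranks = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < height and 0 <= xx < width
                    if inside and image[yy][xx] < center:
                        count += 1
            ranks[y][x] = count
    return ranks


def aggregate_path(cost_row, p1, p2):
    """Standard SGM aggregation along one scanline, left to right.

    cost_row[x][d] is the matching cost of pixel x at disparity d.
    p1 penalises a disparity change of 1, p2 penalises larger jumps
    (both are hypothetical tuning parameters here).
    """
    num_disp = len(cost_row[0])
    aggregated = [list(cost_row[0])]  # first pixel has no predecessor
    for x in range(1, len(cost_row)):
        prev = aggregated[-1]
        prev_min = min(prev)
        current = []
        # Every disparity d depends only on the previous pixel's values,
        # so this loop is the part that can run in parallel per disparity.
        for d in range(num_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d < num_disp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            current.append(cost_row[x][d] + min(candidates) - prev_min)
        aggregated.append(current)
    return aggregated
```

The inner loop over `d` in `aggregate_path` has no dependency between disparities, which is one plausible reading of what "parallel at the disparity level" exploits; a full SGM implementation would sum such aggregations over several path directions before selecting the minimum-cost disparity per pixel.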

Xuelian Cheng, Yiran Zhong, Mehrtash Harandi (2020)

To reduce the human effort in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation. The underlying idea of the NAS algorithm is straightforward: by giving the network the ability to choose among a set of operations (e.g., convolution with different filter sizes), one can find an optimal architecture that is better adapted to the problem at hand. However, so far the success of NAS has not been enjoyed by low-level geometric vision tasks such as stereo matching. This is partly because state-of-the-art deep stereo matching networks, designed by humans, are already very large. Directly applying NAS to such massive structures is computationally prohibitive with currently available mainstream computing resources. In this paper, we propose the first end-to-end hierarchical NAS framework for deep stereo matching by incorporating task-specific human knowledge into the neural architecture search framework. Specifically, following the gold standard pipeline for deep stereo matching (i.e., feature extraction, feature volume construction and dense matching), we optimize the architectures of the entire pipeline jointly. Extensive experiments show that our searched network outperforms all state-of-the-art deep stereo matching architectures and ranks first in accuracy on the KITTI stereo 2012, 2015 and Middlebury benchmarks, as well as first on the SceneFlow dataset, with a substantial improvement in the size of the network and the speed of inference. The code is available at https://github.com/XuelianCheng/LEAStereo.

Computer Vision and Pattern Recognition

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

Lahav Lipson, Zachary Teed, Jia Deng (2021)

We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT. We introduce multi-level convolutional GRUs, which more efficiently propagate information across the image. A modified version of RAFT-Stereo can perform accurate real-time inference. RAFT-Stereo ranks first on the Middlebury leaderboard, outperforming the next best method on 1px error by 29%, and outperforms all published work on the ETH3D two-view stereo benchmark. Code is available at https://github.com/princeton-vl/RAFT-Stereo.

Computer Vision and Pattern Recognition

Expanding Sparse Guidance for Stereo Matching

Yu-Kai Huang, Yueh-Cheng Liu, Tsung-Han Wu (2020)

The performance of image-based stereo estimation suffers from lighting variations, repetitive patterns and homogeneous appearance. Moreover, to achieve good performance, stereo supervision requires sufficient densely-labeled data, which are hard to obtain. In this work, we leverage a small amount of data with very sparse but accurate disparity cues from LiDAR to bridge the gap. We propose a novel sparsity expansion technique to expand the sparse cues concerning RGB images for local feature enhancement. The feature enhancement method can be easily applied to any stereo estimation algorithm with a cost volume at the test stage. Extensive experiments on stereo datasets demonstrate the effectiveness and robustness across different backbones in domain adaptation and self-supervision scenarios. Our sparsity expansion method outperforms previous methods in terms of disparity by more than 2 pixel error on KITTI Stereo 2012 and 3 pixel error on KITTI Stereo 2015. Our approach significantly boosts the existing state-of-the-art stereo algorithms with extremely sparse cues.

Computer Vision and Pattern Recognition; Machine Learning

A Decomposition Model for Stereo Matching

Chengtang Yao, Yunde Jia, Huijun Di (2021)

In this paper, we present a decomposition model for stereo matching to solve the problem of excessive growth in computational cost (time and memory cost) as the resolution increases. In order to reduce the huge cost of stereo matching at the original resolution, our model only runs dense matching at a very low resolution and uses sparse matching at different higher resolutions to recover the disparity of lost details scale-by-scale. After the decomposition of stereo matching, our model iteratively fuses the sparse and dense disparity maps from adjacent scales with an occlusion-aware mask. A refinement network is also applied to improve the fusion result. Compared with high-performance methods like PSMNet and GANet, our method achieves a $10$-$100\times$ speed increase while obtaining comparable disparity estimation results.

Computer Vision and Pattern Recognition

Single View Stereo Matching

Yue Luo, Jimmy Ren, Mude Lin (2018)

Previous monocular depth estimation methods take a single view and directly regress the expected results. Though recent advances have been made by applying geometrically inspired loss functions during training, the inference procedure does not explicitly impose any geometrical constraint. Therefore these models rely purely on the quality of data and the effectiveness of learning to generalize. This leads either to suboptimal results or to a demand for huge amounts of expensive ground-truth labelled data to generate reasonable results. In this paper, we show for the first time that the monocular depth estimation problem can be reformulated as two sub-problems, a view synthesis procedure followed by stereo matching, with two intriguing properties: i) geometrical constraints can be explicitly imposed during inference; ii) the demand for labelled depth data can be greatly alleviated. We show that the whole pipeline can still be trained in an end-to-end fashion and that this new formulation plays a critical role in advancing the performance. The resulting model outperforms all previous monocular depth estimation methods, as well as the stereo block matching method, on the challenging KITTI dataset while using only a small amount of real training data. The model also generalizes well to other monocular depth estimation benchmarks. We also discuss the implications and advantages of solving monocular depth estimation using stereo methods.

Computer Vision and Pattern Recognition
