
An underwater binocular stereo matching algorithm based on the best search domain

Added by Yimin Peng
Publication date: 2021
Language: English





Binocular stereo vision is an important branch of machine vision. It imitates the human visual system: the left and right images captured by a camera pair are matched under epipolar constraints, the matched disparity map is converted to a depth map according to the camera imaging model, and the depth map is in turn converted to a point cloud to obtain spatial point coordinates, thereby enabling distance measurement. Under water, however, the influence of illumination means that the captured images no longer satisfy the epipolar constraints, and the change in the imaging model makes traditional calibration methods inapplicable. This paper therefore proposes a new underwater real-time calibration method, together with a matching method based on the best search domain, to improve the accuracy of underwater binocular distance measurement.
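As a point of reference for the pipeline described above, the sketch below converts a disparity map into 3D point coordinates using the standard in-air pinhole triangulation model (Z = f * B / d). It is a minimal illustration assuming a rectified image pair that does satisfy the epipolar constraints; the paper's point is precisely that this model must be re-calibrated under water, so treat it as the in-air baseline rather than the authors' method. The function name and parameters are illustrative.

    import numpy as np

    def disparity_to_points(disparity, f, baseline, cx, cy):
        # Triangulate 3D coordinates from a rectified disparity map.
        # disparity : (H, W) array of disparities in pixels (0 = invalid)
        # f         : focal length in pixels; baseline in metres
        # cx, cy    : principal point in pixels
        h, w = disparity.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = disparity > 0                  # skip unmatched pixels
        d = disparity[valid]
        z = f * baseline / d                   # depth from triangulation
        x = (u[valid] - cx) * z / f            # back-project to camera frame
        y = (v[valid] - cy) * z / f
        return np.stack([x, y, z], axis=1)     # (N, 3) point cloud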

Related research

To image large and occlusion-prone scenes in high resolution, a camera must move above and around them. Degradation of visibility due to geometric occlusions and distance is exacerbated by scattering when the scene is in a participating medium. Moreover, under water and in other media, artificial lighting is needed. Overall, data quality depends on the observed surface, the medium, and the time-varying poses of the camera and light source. This work proposes to optimize camera/light poses as they move, so that the surface is scanned efficiently and the descattered recovery has the highest quality. The work generalizes the next-best-view concept of robot vision to scattering media and cooperative movable lighting. It also extends descattering to platforms that move optimally. The optimization criterion is information gain, taken from information theory. We exploit the existence of a prior rough 3D model, since under water such a model is routinely obtained using sonar. We demonstrate this principle in a scaled-down setup.
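The greedy loop this abstract describes can be summarized as: for each candidate camera/light pose pair, predict how much a measurement from there would reduce uncertainty about the surface, then move to the highest-scoring pair. The following is a schematic sketch of that idea only; binary entropy over per-point recovery probabilities stands in for the paper's information-gain criterion, and predict_update is a hypothetical placeholder for their visibility, distance-falloff, and scattering model, which is not reproduced here.

    import numpy as np

    def entropy(p):
        # Binary entropy of per-point recovery probabilities.
        p = np.clip(p, 1e-6, 1.0 - 1e-6)
        return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

    def next_best_pose(candidates, belief, predict_update):
        # Greedy next-best-view: score each (camera_pose, light_pose)
        # pair by its predicted reduction in total entropy and return
        # the best one. predict_update(belief, cam, light) must return
        # the predicted post-measurement belief for that pose pair.
        h0 = entropy(belief).sum()
        gains = [h0 - entropy(predict_update(belief, cam, light)).sum()
                 for cam, light in candidates]
        return candidates[int(np.argmax(gains))]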
In underwater vision research, image matching between sonar sensors and optical cameras has always been a challenging problem. Because the two imaging mechanisms differ, the gray values, texture, and contrast of acoustic and optical images vary locally, which invalidates traditional matching methods designed for optical images. Coupled with the difficulty and high cost of underwater data acquisition, this further slows research on acousto-optic data fusion. To maximize the use of underwater sensor data and promote the development of multi-sensor information fusion (MSIF), this study applies a deep-learning-based image attribute transfer method to the acousto-optic image matching problem, the core idea being to eliminate the imaging differences between the two modalities as far as possible. An advanced local feature descriptor is then introduced to solve the challenging acousto-optic matching problem. Experimental results show that the proposed method preprocesses acousto-optic images effectively and obtains accurate matching results. Moreover, because the method operates on deep semantic layers of the images, it can indirectly reveal the local feature correspondences between the original image pair, providing a new solution to the underwater multi-sensor image matching problem.
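The two stages of this method read naturally as a pipeline: first suppress the modality gap by transferring the sonar image toward an optical appearance, then run ordinary local feature matching on the result. The sketch below assumes a pre-trained attribute-transfer network (transfer_net, hypothetical here) and uses OpenCV's ORB purely as a stand-in for the unnamed "advanced local feature descriptor"; it illustrates the structure of the pipeline, not the authors' exact components.

    import cv2

    def match_acousto_optic(sonar_img, optical_img, transfer_net):
        # Stage 1: shrink the acousto-optic appearance gap with a
        # (hypothetical) deep attribute-transfer network.
        transferred = transfer_net(sonar_img)
        # Stage 2: conventional local feature matching on the result.
        orb = cv2.ORB_create(nfeatures=2000)
        kp1, des1 = orb.detectAndCompute(transferred, None)
        kp2, des2 = orb.detectAndCompute(optical_img, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2),
                         key=lambda m: m.distance)
        return kp1, kp2, matches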
Localizing stereo boundaries and predicting nearby disparities are difficult because stereo boundaries induce occluded regions where matching cues are absent. Most modern computer vision algorithms treat occlusions secondarily (e.g., via left-right consistency checks after matching) or rely on high-level cues to improve nearby disparities (e.g., via deep networks and large training sets). They ignore the geometry of stereo occlusions, which dictates that the spatial extent of occlusion must equal the amplitude of the disparity jump that causes it. This paper introduces an energy and level-set optimizer that improves boundaries by encoding occlusion geometry. Our model applies to two-layer, figure-ground scenes, and it can be implemented cooperatively using messages that pass predominantly between parents and children in an undecimated hierarchy of multi-scale image patches. In a small collection of figure-ground scenes curated from Middlebury and Falling Things stereo datasets, our model provides more accurate boundaries than previous occlusion-handling stereo techniques. This suggests new directions for creating cooperative stereo systems that incorporate occlusion cues in a human-like manner.
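The geometric constraint at the heart of this model is easy to state concretely: in a rectified two-layer scene, a near (figure) surface at disparity d_f in front of a far (ground) surface at disparity d_g hides a band of the ground in one view whose width is exactly d_f - d_g pixels. A minimal illustration:

    def occlusion_width(d_figure, d_ground):
        # In a rectified two-layer scene, the half-occluded band next to
        # a stereo boundary spans exactly the disparity jump across it.
        return abs(d_figure - d_ground)

    # Example: a figure at 42 px disparity over ground at 30 px leaves a
    # 12 px strip of the ground visible in only one of the two views.
    assert occlusion_width(42, 30) == 12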
To reduce the human effort in neural network design, Neural Architecture Search (NAS) has been applied with remarkable success to various high-level vision tasks such as classification and semantic segmentation. The underlying idea of NAS is straightforward: by enabling the network to choose among a set of operations (e.g., convolutions with different filter sizes), one can find an architecture that is better adapted to the problem at hand. So far, however, the success of NAS has not been enjoyed by low-level geometric vision tasks such as stereo matching. This is partly because state-of-the-art deep stereo matching networks, designed by humans, are already enormous in size; directly applying NAS to such massive structures is computationally prohibitive with currently available mainstream computing resources. In this paper, we propose the first end-to-end hierarchical NAS framework for deep stereo matching, incorporating task-specific human knowledge into the neural architecture search framework. Specifically, following the gold-standard pipeline for deep stereo matching (i.e., feature extraction, feature volume construction, and dense matching), we optimize the architectures of the entire pipeline jointly. Extensive experiments show that our searched network outperforms all state-of-the-art deep stereo matching architectures, ranking first in accuracy on the KITTI stereo 2012, 2015, and Middlebury benchmarks, as well as first on the SceneFlow dataset, with a substantial improvement in network size and inference speed. The code is available at https://github.com/XuelianCheng/LEAStereo.
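Of the three pipeline stages the abstract names, feature volume construction is the most mechanical, and it is sketched below in the common concatenation form used by GC-Net/PSMNet-style networks. This is a generic NumPy illustration of that pipeline stage under assumed shapes and names, not LEAStereo's searched architecture.

    import numpy as np

    def build_cost_volume(feat_left, feat_right, max_disp):
        # Concatenation-style feature volume: for each disparity
        # hypothesis d, pair left features at column x with right
        # features at column x - d. A dense-matching network then
        # regresses disparity from the resulting 4D volume.
        c, h, w = feat_left.shape
        volume = np.zeros((2 * c, max_disp, h, w), dtype=feat_left.dtype)
        for d in range(max_disp):
            volume[:c, d, :, d:] = feat_left[:, :, d:]
            volume[c:, d, :, d:] = feat_right[:, :, :w - d]
        return volume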
Qi Ni, Fei Wang, Ziwei Zhao (2019)
Image feature extraction and matching is a fundamental but computation-intensive task in machine vision. This paper proposes a novel FPGA-based embedded system to accelerate feature extraction and matching. It implements SURF feature point detection and BRIEF feature descriptor construction and matching. For binocular stereo vision, feature matching includes both tracking matching and stereo matching, which simultaneously provide feature point correspondences and parallax information. Our system is evaluated on a ZYNQ XC7Z045 FPGA. The results demonstrate that it can process binocular video data at a high frame rate (640×480 @ 162 fps). Moreover, extensive tests show that our system is robust to image compression, blurring, and illumination changes.
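The matching kernel such a system accelerates is conceptually simple: BRIEF descriptors are bit strings, so comparison reduces to XOR plus popcount, which maps directly to FPGA logic. Below is a software reference of that kernel with a mutual nearest-neighbour check; the descriptor layout and the distance threshold are assumptions for illustration, not taken from the paper.

    import numpy as np

    def brief_hamming_match(desc_a, desc_b, max_dist=40):
        # desc_a: (N, 32) uint8, desc_b: (M, 32) uint8 -- 256-bit BRIEF
        # descriptors. Pairwise Hamming distance via XOR + popcount.
        xor = desc_a[:, None, :] ^ desc_b[None, :, :]      # (N, M, 32)
        dist = np.unpackbits(xor, axis=2).sum(axis=2)      # (N, M)
        best_ab = dist.argmin(axis=1)                      # a -> b
        best_ba = dist.argmin(axis=0)                      # b -> a
        # Keep mutual nearest neighbours under the distance threshold.
        return [(i, int(j)) for i, j in enumerate(best_ab)
                if best_ba[j] == i and dist[i, j] <= max_dist]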