ترغب بنشر مسار تعليمي؟ اضغط هنا

Co-Teaching: An Ark to Unsupervised Stereo Matching

114   0   0.0 ( 0 )
 نشر من قبل Hengli Wang
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Stereo matching is a key component of autonomous driving perception. Recent unsupervised stereo matching approaches have received adequate attention due to their advantage of not requiring disparity ground truth. These approaches, however, perform poorly near occlusions. To overcome this drawback, in this paper, we propose CoT-Stereo, a novel unsupervised stereo matching approach. Specifically, we adopt a co-teaching framework where two networks interactively teach each other about the occlusions in an unsupervised fashion, which greatly improves the robustness of unsupervised stereo matching. Extensive experiments on the KITTI Stereo benchmarks demonstrate the superior performance of CoT-Stereo over all other state-of-the-art unsupervised stereo matching approaches in terms of both accuracy and speed. Our project webpage is https://sites.google.com/view/cot-stereo.



قيم البحث

اقرأ أيضاً

75 - Hengli Wang , Rui Fan , Ming Liu 2020
The interpretation of ego motion and scene change is a fundamental task for mobile robots. Optical flow information can be employed to estimate motion in the surroundings. Recently, unsupervised optical flow estimation has become a research hotspot. However, unsupervised approaches are often easy to be unreliable on partially occluded or texture-less regions. To deal with this problem, we propose CoT-AMFlow in this paper, an unsupervised optical flow estimation approach. In terms of the network architecture, we develop an adaptive modulation network that employs two novel module types, flow modulation modules (FMMs) and cost volume modulation modules (CMMs), to remove outliers in challenging regions. As for the training paradigm, we adopt a co-teaching strategy, where two networks simultaneously teach each other about challenging regions to further improve accuracy. Experimental results on the MPI Sintel, KITTI Flow and Middlebury Flow benchmarks demonstrate that our CoT-AMFlow outperforms all other state-of-the-art unsupervised approaches, while still running in real time. Our project page is available at https://sites.google.com/view/cot-amflow.
127 - Hengli Wang , Rui Fan , Ming Liu 2021
Convolutional neural network (CNN)-based stereo matching approaches generally require a dense cost volume (DCV) for disparity estimation. However, generating such cost volumes is computationally-intensive and memory-consuming, hindering CNN training and inference efficiency. To address this problem, we propose SCV-Stereo, a novel CNN architecture, capable of learning dense stereo matching from sparse cost volume (SCV) representations. Our inspiration is derived from the fact that DCV representations are somewhat redundant and can be replaced with SCV representations. Benefiting from these SCV representations, our SCV-Stereo can update disparity estimations in an iterative fashion for accurate and efficient stereo matching. Extensive experiments carried out on the KITTI Stereo benchmarks demonstrate that our SCV-Stereo can significantly minimize the trade-off between accuracy and efficiency for stereo matching. Our project page is https://sites.google.com/view/scv-stereo.
89 - Hengli Wang , Rui Fan , Peide Cai 2021
Supervised learning with deep convolutional neural networks (DCNNs) has seen huge adoption in stereo matching. However, the acquisition of large-scale datasets with well-labeled ground truth is cumbersome and labor-intensive, making supervised learni ng-based approaches often hard to implement in practice. To overcome this drawback, we propose a robust and effective self-supervised stereo matching approach, consisting of a pyramid voting module (PVM) and a novel DCNN architecture, referred to as OptStereo. Specifically, our OptStereo first builds multi-scale cost volumes, and then adopts a recurrent unit to iteratively update disparity estimations at high resolution; while our PVM can generate reliable semi-dense disparity images, which can be employed to supervise OptStereo training. Furthermore, we publish the HKUST-Drive dataset, a large-scale synthetic stereo dataset, collected under different illumination and weather conditions for research purposes. Extensive experimental results demonstrate the effectiveness and efficiency of our self-supervised stereo matching approach on the KITTI Stereo benchmarks and our HKUST-Drive dataset. PVStereo, our best-performing implementation, greatly outperforms all other state-of-the-art self-supervised stereo matching approaches. Our project page is available at sites.google.com/view/pvstereo.
The cost aggregation strategy shows a crucial role in learning-based stereo matching tasks, where 3D convolutional filters obtain state of the art but require intensive computation resources, while 2D operations need less GPU memory but are sensitive to domain shift. In this paper, we decouple the 4D cubic cost volume used by 3D convolutional filters into sequential cost maps along the direction of disparity instead of dealing with it at once by exploiting a recurrent cost aggregation strategy. Furthermore, a novel recurrent module, Stacked Recurrent Hourglass (SRH), is proposed to process each cost map. Our hourglass network is constructed based on Gated Recurrent Units (GRUs) and down/upsampling layers, which provides GRUs larger receptive fields. Then two hourglass networks are stacked together, while multi-scale information is processed by skip connections to enhance the performance of the pipeline in textureless areas. The proposed architecture is implemented in an end-to-end pipeline and evaluated on public datasets, which reduces GPU memory consumption by up to 56.1% compared with PSMNet using stacked hourglass 3D CNNs without the degradation of accuracy. Then, we further demonstrate the scalability of the proposed method on several high-resolution pairs, while previously learned approaches often fail due to the memory constraint. The code is released at url{https://github.com/hongzhidu/SRHNet}.
75 - Yaoyu Hu , Wenshan Wang , Huai Yu 2021
Stereo reconstruction models trained on small images do not generalize well to high-resolution data. Training a model on high-resolution image size faces difficulties of data availability and is often infeasible due to limited computing resources. In this work, we present the Occlusion-aware Recurrent binocular Stereo matching (ORStereo), which deals with these issues by only training on available low disparity range stereo images. ORStereo generalizes to unseen high-resolution images with large disparity ranges by formulating the task as residual updates and refinements of an initial prediction. ORStereo is trained on images with disparity ranges limited to 256 pixels, yet it can operate 4K-resolution input with over 1000 disparities using limited GPU memory. We test the models capability on both synthetic and real-world high-resolution images. Experimental results demonstrate that ORStereo achieves comparable performance on 4K-resolution images compared to state-of-the-art methods trained on large disparity ranges. Compared to other methods that are only trained on low-resolution images, our method is 70% more accurate on 4K-resolution images.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا