Research papers, master's theses, and doctoral dissertations published by Zhelun Shen

CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching

84 - Zhelun Shen, Yuchao Dai, Zhibo Rao, 2021

Recently, the ever-increasing capacity of large-scale annotated datasets has led to profound progress in stereo matching. However, most of these successes are limited to a specific dataset and do not generalize well to other datasets. The main difficulties lie in the large domain differences and the unbalanced disparity distributions across datasets, which greatly limit the real-world applicability of current deep stereo matching models. In this paper, we propose CFNet, a Cascade and Fused cost volume based network that improves the robustness of stereo matching. First, we propose a fused cost volume representation to deal with large domain differences. By fusing multiple low-resolution dense cost volumes to enlarge the receptive field, we extract robust structural representations for initial disparity estimation. Second, we propose a cascade cost volume representation to alleviate the unbalanced disparity distribution. Specifically, we employ a variance-based uncertainty estimate to adaptively adjust the disparity search space of the next stage, driving the network to progressively prune out the space of unlikely correspondences. By iteratively narrowing the disparity search space and increasing the cost volume resolution, the disparity estimate is gradually refined in a coarse-to-fine manner. When trained on the same training images and evaluated on the KITTI, ETH3D, and Middlebury datasets with fixed model parameters and hyperparameters, our method achieves state-of-the-art overall performance and took 1st place in the stereo task of the Robust Vision Challenge 2020. The code will be available at https://github.com/gallenszl/CFNet.

Computer Vision and Pattern Recognition
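The CFNet abstract describes using a variance-based uncertainty estimate to set the disparity search range of the next cascade stage. The NumPy snippet below is a minimal sketch of that idea under stated assumptions, not the paper's implementation. It turns a cost volume into a per-pixel probability distribution via a softmax over negative costs (lower cost is assumed to mean a better match), takes the expected disparity and its variance, and widens or narrows the next range by a hypothetical scaling factor `k` times the standard deviation. The function name and `k` are illustrative. The paper's exact formulation (for example, any learned scaling terms) may differ.

```python
import numpy as np

def disparity_search_range(cost, disparities, k=1.0, d_min=0.0, d_max=None):
    """Hypothetical helper: per-pixel search range for the next cascade stage.

    cost:        (D, H, W) matching cost per candidate disparity (lower = better, assumed).
    disparities: (D,) candidate disparity values.
    k:           hypothetical scaling factor applied to the standard deviation.
    """
    # Softmax over negative cost gives a probability distribution over disparities.
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)

    d = disparities[:, None, None]
    # Expected disparity (soft argmin) and its variance under the distribution.
    mean = (prob * d).sum(axis=0)
    var = (prob * (d - mean) ** 2).sum(axis=0)
    std = np.sqrt(var)

    # Confident pixels (small std) get a narrow range; uncertain pixels a wide one.
    if d_max is None:
        d_max = float(disparities.max())
    lower = np.clip(mean - k * std, d_min, d_max)
    upper = np.clip(mean + k * std, d_min, d_max)
    return mean, lower, upper
```

A sharply peaked cost profile yields a range that is nearly a single value, while a flat, ambiguous profile keeps a wide range. This is what lets later stages spend their resolution on plausible correspondences only.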

MSMD-Net: Deep Stereo Matching with Multi-scale and Multi-dimension Cost Volume

105 - Zhelun Shen, Yuchao Dai, Zhibo Rao, 2020

Deep end-to-end learning based stereo matching methods have achieved great success, as witnessed by the leaderboards of various benchmark datasets (KITTI, Middlebury, ETH3D, etc.). However, real scenarios require not only state-of-the-art accuracy but also real-time speed and cross-domain generalization, which existing methods cannot satisfy. In this paper, we propose MSMD-Net (Multi-Scale and Multi-Dimension), which constructs multi-scale and multi-dimension cost volumes. At the multi-scale level, we generate four 4D combination volumes at different scales and integrate them with an encoder-decoder process to predict an initial disparity estimate. At the multi-dimension level, we additionally construct a 3D warped correlation volume and use it to refine the initial disparity map with residual learning. These two cost volumes of different dimensionality are complementary to each other and boost the performance of disparity estimation. Additionally, we propose a switch training strategy to alleviate the overfitting that appears in the pre-training process and to further improve the generalization ability and accuracy of the final disparity estimate. Our method was evaluated on several benchmark datasets and ranked first on the KITTI 2012 leaderboard and second on the KITTI 2015 leaderboard as of September 9. In addition, our method shows strong cross-domain generalization and outperforms the best prior work by a notable margin while running three or even five times faster. The code of MSMD-Net is available at https://github.com/gallenszl/MSMD-Net.

Computer Vision and Pattern Recognition
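The MSMD-Net abstract mentions a 3D warped correlation volume used to refine an initial disparity map with residual learning. The NumPy snippet below is a minimal sketch of how such a volume can be built, and it makes several assumptions. The left image is the reference view, so a left pixel at column x matches the right image at x - d. Right features are warped with linear interpolation and zero padding. Correlation is the channel-averaged dot product over a small window of residual offsets. The names `warp_right`, `warped_correlation_volume`, and `radius` are hypothetical. The actual network operates on learned feature maps inside a deep-learning framework, and its details may differ.

```python
import numpy as np

def warp_right(feat_r, disp):
    """Hypothetical helper: out[c, y, x] = feat_r[c, y, x - disp[y, x]].

    Linear interpolation along x; samples falling outside the image contribute zero.
    feat_r: (C, H, W) right-view features, disp: (H, W) disparities.
    """
    C, H, W = feat_r.shape
    xs = np.arange(W)[None, :] - disp           # (H, W) source x coordinates
    x0 = np.floor(xs).astype(int)
    w1 = xs - x0                                # weight of the right-hand neighbour
    rows = np.arange(H)[:, None]                # broadcasts against (H, W) indices
    out = np.zeros(feat_r.shape, dtype=float)
    for xi, w in ((x0, 1.0 - w1), (x0 + 1, w1)):
        valid = (xi >= 0) & (xi < W)
        xc = np.clip(xi, 0, W - 1)
        out += feat_r[:, rows, xc] * (w * valid)[None]
    return out

def warped_correlation_volume(feat_l, feat_r, disp, radius=2):
    """Hypothetical helper: (2*radius+1, H, W) volume correlating left features with
    right features warped by disp + r for each residual offset r in [-radius, radius]."""
    vols = []
    for r in range(-radius, radius + 1):
        warped = warp_right(feat_r, disp + r)
        vols.append((feat_l * warped).mean(axis=0))  # channel-averaged dot product
    return np.stack(vols, axis=0)
```

Because the volume spans only 2*radius+1 offsets around the current estimate, it is much cheaper than a full 4D volume. The peak along the first axis indicates the residual correction a refinement head could learn to predict. For example, if the initial estimate is 2 but the true disparity is 3, the peak lands at offset +1.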
