ﻻ يوجد ملخص باللغة العربية
Estimating geometric elements such as depth, camera motion, and optical flow from images is an important part of the robots visual perception. We use a joint self-supervised method to estimate the three geometric elements. Depth network, optical flow network and camera motion network are independent of each other but are jointly optimized during training phase. Compared with independent training, joint training can make full use of the geometric relationship between geometric elements and provide dynamic and static information of the scene. In this paper, we improve the joint self-supervision method from three aspects: network structure, dynamic object segmentation, and geometric constraints. In terms of network structure, we apply the attention mechanism to the camera motion network, which helps to take advantage of the similarity of camera movement between frames. And according to attention mechanism in Transformer, we propose a plug-and-play convolutional attention module. In terms of dynamic object, according to the different influences of dynamic objects in the optical flow self-supervised framework and the depth-pose self-supervised framework, we propose a threshold algorithm to detect dynamic regions, and mask that in the loss function respectively. In terms of geometric constraints, we use traditional methods to estimate the fundamental matrix from the corresponding points to constrain the camera motion network. We demonstrate the effectiveness of our method on the KITTI dataset. Compared with other joint self-supervised methods, our method achieves state-of-the-art performance in the estimation of pose and optical flow, and the depth estimation has also achieved competitive results. Code will be available https://github.com/jianfenglihg/Unsupervised_geometry.
Recent work has shown that CNN-based depth and ego-motion estimators can be learned using unlabelled monocular videos. However, the performance is limited by unidentified moving objects that violate the underlying static scene assumption in geometric
A new unsupervised learning method of depth and ego-motion using multiple masks from monocular video is proposed in this paper. The depth estimation network and the ego-motion estimation network are trained according to the constraints of depth and e
We propose a semantics-driven unsupervised learning approach for monocular depth and ego-motion estimation from videos in this paper. Recent unsupervised learning methods employ photometric errors between synthetic view and actual image as a supervis
In this paper, we proposed an unsupervised learning method for estimating the optical flow between video frames, especially to solve the occlusion problem. Occlusion is caused by the movement of an object or the movement of the camera, defined as whe
Most of the deep-learning based depth and ego-motion networks have been designed for visible cameras. However, visible cameras heavily rely on the presence of an external light source. Therefore, it is challenging to use them under low-light conditio