A Conditional Adversarial Network for Scene Flow Estimation

94 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ravi Kumar Thakur

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ravi Kumar Thakur - Snehasis Mukherjee

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The problem of Scene flow estimation in depth videos has been attracting attention of researchers of robot vision, due to its potential application in various areas of robotics. The conventional scene flow methods are difficult to use in reallife applications due to their long computational overhead. We propose a conditional adversarial network SceneFlowGAN for scene flow estimation. The proposed SceneFlowGAN uses loss function at two ends: both generator and descriptor ends. The proposed network is the first attempt to estimate scene flow using generative adversarial networks, and is able to estimate both the optical flow and disparity from the input stereo images simultaneously. The proposed method is experimented on a large RGB-D benchmark sceneflow dataset.

قيم البحث

104 - Bing Li , Cheng Zheng , Silvio Giancola 2021

We propose a novel scene flow estimation approach to capture and infer 3D motions from point clouds. Estimating 3D motions for point clouds is challenging, since a point cloud is unordered and its density is significantly non-uniform. Such unstructur ed data poses difficulties in matching corresponding points between point clouds, leading to inaccurate flow estimation. We propose a novel architecture named Sparse Convolution-Transformer Network (SCTN) that equips the sparse convolution with the transformer. Specifically, by leveraging the sparse convolution, SCTN transfers irregular point cloud into locally consistent flow features for estimating continuous and consistent motions within an object/local object part. We further propose to explicitly learn point relations using a point transformer module, different from exiting methods. We show that the learned relation-based contextual information is rich and helpful for matching corresponding points, benefiting scene flow estimation. In addition, a novel loss function is proposed to adaptively encourage flow consistency according to feature similarity. Extensive experiments demonstrate that our proposed approach achieves a new state of the art in scene flow estimation. Our approach achieves an error of 0.038 and 0.037 (EPE3D) on FlyingThings3D and KITTI Scene Flow respectively, which significantly outperforms previous methods by large margins.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Scene Flow from Point Clouds with or without Learning

364 - Jhony Kaesemodel Pontes , James Hays , 2020

Scene flow is the three-dimensional (3D) motion field of a scene. It provides information about the spatial arrangement and rate of change of objects in dynamic environments. Current learning-based approaches seek to estimate the scene flow directly from point clouds and have achieved state-of-the-art performance. However, supervised learning methods are inherently domain specific and require a large amount of labeled data. Annotation of scene flow on real-world point clouds is expensive and challenging, and the lack of such datasets has recently sparked interest in self-supervised learning methods. How to accurately and robustly learn scene flow representations without labeled real-world data is still an open problem. Here we present a simple and interpretable objective function to recover the scene flow from point clouds. We use the graph Laplacian of a point cloud to regularize the scene flow to be as-rigid-as-possible. Our proposed objective function can be used with or without learning---as a self-supervisory signal to learn scene flow representations, or as a non-learning-based method in which the scene flow is optimized during runtime. Our approach outperforms related works in many datasets. We also show the immediate applications of our proposed method for two applications: motion segmentation and point cloud densification.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Lets Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation

145 - Tommaso Cavallari , Luca Bertinetto , Jishnu Mukhoti 2019

Many applications require a camera to be relocalised online, without expensive offline training on the target scene. Whilst both keyframe and sparse keypoint matching methods can be used online, the former often fail away from the training trajectory , and the latter can struggle in textureless regions. By contrast, scene coordinate regression (SCoRe) methods generalise to novel poses and can leverage dense correspondences to improve robustness, and recent work has shown how to adapt SCoRe forests between scenes, allowing their state-of-the-art performance to be leveraged online. However, because they use features hand-crafted for indoor use, they do not generalise well to harder outdoor scenes. Whilst replacing the forest with a neural network and learning suitable features for outdoor use is possible, the techniques used to adapt forests between scenes are unfortunately harder to transfer to a network context. In this paper, we address this by proposing a novel way of leveraging a network trained on one scene to predict points in another scene. Our approach replaces the appearance clustering performed by the branching structure of a regression forest with a two-step process that first uses the network to predict points in the original scene, and then uses these predicted points to look up clusters of points from the new scene. We show experimentally that our online approach achieves state-of-the-art performance on both the 7-Scenes and Cambridge Landmarks datasets, whilst running in under 300ms, making it highly effective in live scenarios.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Behind the leaves -- Estimation of occluded grapevine berries with conditional generative adversarial networks

176 - Jana Kierdorf , Immanuel Weber , Anna Kicherer 2021

The need for accurate yield estimates for viticulture is becoming more important due to increasing competition in the wine market worldwide. One of the most promising methods to estimate the harvest is berry counting, as it can be approached non-dest ructively, and its process can be automated. In this article, we present a method that addresses the challenge of occluded berries with leaves to obtain a more accurate estimate of the number of berries that will enable a better estimate of the harvest. We use generative adversarial networks, a deep learning-based approach that generates a likely scenario behind the leaves exploiting learned patterns from images with non-occluded berries. Our experiments show that the estimate of the number of berries after applying our method is closer to the manually counted reference. In contrast to applying a factor to the berry count, our approach better adapts to local conditions by directly involving the appearance of the visible berries. Furthermore, we show that our approach can identify which areas in the image should be changed by adding new berries without explicitly requiring information about hidden areas.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية

Scene Transformer: A unified multi-task model for behavior prediction and planning

155 - Jiquan Ngiam , Benjamin Caine , Vijay Vasudevan 2021

Predicting the future motion of multiple agents is necessary for planning in dynamic environments. This task is challenging for autonomous driving since agents (e.g., vehicles and pedestrians) and their associated behaviors may be diverse and influen ce each other. Most prior work has focused on first predicting independent futures for each agent based on all past motion, and then planning against these independent predictions. However, planning against fixed predictions can suffer from the inability to represent the future interaction possibilities between different agents, leading to sub-optimal planning. In this work, we formulate a model for predicting the behavior of all agents jointly in real-world driving environments in a unified manner. Inspired by recent language modeling approaches, we use a masking strategy as the query to our model, enabling one to invoke a single model to predict agent behavior in many ways, such as potentially conditioned on the goal or full future trajectory of the autonomous vehicle or the behavior of other agents in the environment. Our model architecture fuses heterogeneous world state in a unified Transformer architecture by employing attention across road elements, agent interactions and time steps. We evaluate our approach on autonomous driving datasets for behavior prediction, and achieve state-of-the-art performance. Our work demonstrates that formulating the problem of behavior prediction in a unified architecture with a masking strategy may allow us to have a single model that can perform multiple motion prediction and planning related tasks effectively.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات