Predicting the Physical Dynamics of Unseen 3D Objects


Published by: Davis Rempe

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Davis Rempe, Srinath Sridhar, He Wang

Computer Vision and Pattern Recognition, Machine Learning


Machines that can predict the effect of physical interactions on the dynamics of previously unseen object instances are important for creating better robots and interactive virtual worlds. In this work, we focus on predicting the dynamics of 3D objects on a plane that have just been subjected to an impulsive force. In particular, we predict the changes in state: 3D position, rotation, velocities, and stability. Different from previous work, our approach can generalize dynamics predictions to object shapes and initial conditions that were unseen during training. Our method takes the 3D object's shape as a point cloud and its initial linear and angular velocities as input. We extract shape features and use a recurrent neural network to predict the full change in state at each time step. Our model can support training with data from either a physics engine or the real world. Experiments show that we can accurately predict the changes in state for unseen object geometries and initial conditions.
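The per-step prediction described in the abstract can be illustrated with a minimal rollout loop. This is only a sketch under stated assumptions, not the authors' model: `shape_feature`, `rollout`, and `toy_step` are hypothetical names, the shape feature is a simple per-axis maximum standing in for a learned point-cloud encoder, and the hand-written damped update stands in for the trained recurrent network. The state here is reduced to a 1D position and velocity.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
# Hypothetical step signature: (shape feature, current state, hidden state)
# -> (predicted change in state, new hidden state).
StepFn = Callable[[Vec, Vec, Vec], Tuple[Vec, Vec]]


def shape_feature(points: Sequence[Sequence[float]]) -> Vec:
    """Toy permutation-invariant feature: per-axis maximum over the point cloud.

    Assumption: stands in for a learned point-cloud encoder.
    """
    return [max(p[axis] for p in points) for axis in range(3)]


def rollout(points: Sequence[Sequence[float]], init_state: Vec,
            step_fn: StepFn, num_steps: int) -> List[Vec]:
    """Accumulate predicted per-step state changes into a trajectory."""
    feat = shape_feature(points)   # computed once per object
    state = list(init_state)
    hidden: Vec = []               # recurrent memory carried between steps
    trajectory = [list(state)]
    for _ in range(num_steps):
        delta, hidden = step_fn(feat, state, hidden)
        state = [s + d for s, d in zip(state, delta)]
        trajectory.append(list(state))
    return trajectory


def toy_step(feat: Vec, state: Vec, hidden: Vec) -> Tuple[Vec, Vec]:
    """Hand-written stand-in for the learned recurrent cell (assumption)."""
    _x, v = state
    dt, damping = 0.1, 0.5
    # Change in position follows the velocity; velocity decays each step.
    return [v * dt, -damping * v], hidden


cloud = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.5], [0.5, 1.0, 3.0]]
traj = rollout(cloud, [0.0, 1.0], toy_step, num_steps=3)
```

Predicting a change in state rather than the absolute state means each step only has to model a small update, which is then summed into the running state.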


Davis Rempe, Srinath Sridhar, He Wang (2019)

Humans have a remarkable ability to predict the effect of physical interactions on the dynamics of objects. Endowing machines with this ability would allow important applications in areas like robotics and autonomous vehicles. In this work, we focus on predicting the dynamics of 3D rigid objects, in particular an object's final resting position and total rotation when subjected to an impulsive force. Different from previous work, our approach is capable of generalizing to unseen object shapes, an important requirement for real-world applications. To achieve this, we represent object shape as a 3D point cloud that is used as input to a neural network, making our approach agnostic to appearance variation. The design of our network is informed by an understanding of physical laws. We train our model with data from a physics engine that simulates the dynamics of a large number of shapes. Experiments show that we can accurately predict the resting position and total rotation for unseen object geometries.

Computer Vision and Pattern Recognition, Machine Learning

Unsupervised Discovery of 3D Physical Objects from Video

Yilun Du, Kevin Smith, Tomer Ulman (2020)

We study the problem of unsupervised physical object discovery. While existing frameworks aim to decompose scenes into 2D segments based on each object's appearance, we explore how physics, especially object interactions, facilitates disentangling of 3D geometry and position of objects from video, in an unsupervised manner. Drawing inspiration from developmental psychology, our Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues and physical motion cues to accurately segment observable and partially occluded objects of varying sizes, and infer properties of those objects. Our model reliably segments objects on both synthetic and real scenes. The discovered object properties can also be used to reason about physical events.

Computer Vision and Pattern Recognition, Machine Learning

HyperFlow: Representing 3D Objects as Surfaces

Przemysław Spurek, Maciej Zięba, Jacek Tabor (2020)

In this work, we present HyperFlow, a novel generative model that leverages hypernetworks to create continuous 3D object representations in the form of lightweight surfaces (meshes), directly out of point clouds. Efficient object representations are essential for many computer vision applications, including robotic manipulation and autonomous driving. However, creating those representations is often cumbersome, because it requires processing unordered sets of point clouds. Therefore, it is either computationally expensive, due to additional optimization constraints such as permutation invariance, or leads to quantization losses introduced by binning point clouds into discrete voxels. Inspired by mesh-based representations of objects used in computer graphics, we postulate a fundamentally different approach and represent 3D objects as a family of surfaces. To that end, we devise a generative model that uses a hypernetwork to return the weights of a Continuous Normalizing Flow (CNF) target network. The goal of this target network is to map points from a probability distribution into a 3D mesh. To avoid numerical instability of the CNF on compact support distributions, we propose a new Spherical Log-Normal function which models the density of 3D points around object surfaces, mimicking noise introduced by 3D capturing devices. As a result, we obtain continuous mesh-based object representations that yield better qualitative results than competing approaches, while reducing training time by over an order of magnitude.

Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing

Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning

Soravit Changpinyo, Wei-Lun Chao, Fei Sha (2016)

Leveraging class semantic descriptions and examples of known objects, zero-shot learning makes it possible to train a recognition model for an object class whose examples are not available. In this paper, we propose a novel zero-shot learning model that takes advantage of clustering structures in the semantic embedding space. The key idea is to impose the structural constraint that semantic representations must be predictive of the locations of their corresponding visual exemplars. This reduces to training multiple kernel-based regressors on semantic representation-exemplar pairs from labeled data of the seen object categories. Despite its simplicity, our approach significantly outperforms existing zero-shot learning methods on standard benchmark datasets, including the ImageNet dataset with more than 20,000 unseen categories.
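The idea of regressing visual exemplars from semantic representations can be sketched with a very small kernel regressor. This is an assumption-laden illustration, not the paper's method: it swaps in Nadaraya-Watson kernel smoothing with an RBF kernel as a simple stand-in for the kernel-based regressors mentioned above, and the names `rbf`, `predict_exemplar`, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]


def rbf(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """RBF kernel between two semantic vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_exemplar(query: Sequence[float],
                     seen_semantic: Sequence[Sequence[float]],
                     seen_exemplars: Sequence[Sequence[float]],
                     gamma: float = 1.0) -> Vec:
    """Kernel-weighted average of seen-class exemplars (Nadaraya-Watson).

    Assumption: a simplified substitute for a trained kernel regressor.
    """
    weights = [rbf(query, s, gamma) for s in seen_semantic]
    total = sum(weights)
    dim = len(seen_exemplars[0])
    return [
        sum(w * e[d] for w, e in zip(weights, seen_exemplars)) / total
        for d in range(dim)
    ]


# Two seen classes: 1D semantic vectors and 2D visual exemplars (toy values).
seen_semantic = [[0.0], [2.0]]
seen_exemplars = [[0.0, 0.0], [4.0, 2.0]]

# An unseen class whose semantic vector lies halfway between the seen ones.
unseen_exemplar = predict_exemplar([1.0], seen_semantic, seen_exemplars)
```

Because the query is equidistant from both seen classes, both kernel weights are equal and the predicted exemplar is the midpoint of the two seen exemplars.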

Computer Vision and Pattern Recognition

Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories

Tiange Luo, Kaichun Mo, Zhiao Huang (2020)

We address the problem of discovering 3D parts for objects in unseen categories. Being able to learn the geometry prior of parts and transfer this prior to unseen categories poses fundamental challenges for data-driven shape segmentation approaches. Formulating the task as a contextual bandit problem, we propose a learning-based agglomerative clustering framework which learns a grouping policy to progressively group small part proposals into bigger ones in a bottom-up fashion. At the core of our approach is restricting the local context for extracting part-level features, which encourages generalizability to unseen categories. On the large-scale fine-grained 3D part dataset, PartNet, we demonstrate that our method can transfer knowledge of parts learned from 3 training categories to 21 unseen testing categories without seeing any annotated samples. Quantitative comparisons against four shape segmentation baselines show that our approach achieves state-of-the-art performance.
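The bottom-up grouping of small part proposals can be pictured with a greedy merging loop. This is a rough sketch under assumptions, not the paper's framework: the learned grouping policy is replaced by a hand-written, hypothetical `gap_score` on 1D point coordinates, and `group_bottom_up` and the `threshold` parameter are illustrative names.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List

Part = FrozenSet[int]


def group_bottom_up(proposals: Iterable[Iterable[int]],
                    score_fn: Callable[[Part, Part], float],
                    threshold: float) -> List[Part]:
    """Greedily merge the best-scoring pair until no pair reaches threshold."""
    parts: List[Part] = [frozenset(p) for p in proposals]
    while len(parts) > 1:
        # Pick the pair of current parts that the scoring function prefers.
        i, j = max(combinations(range(len(parts)), 2),
                   key=lambda ij: score_fn(parts[ij[0]], parts[ij[1]]))
        if score_fn(parts[i], parts[j]) < threshold:
            break  # nothing left that is worth merging
        merged = parts[i] | parts[j]
        parts = [p for k, p in enumerate(parts) if k not in (i, j)] + [merged]
    return parts


def gap_score(a: Part, b: Part) -> float:
    """Hypothetical score: negative smallest gap between two 1D point sets."""
    return -float(min(abs(x - y) for x in a for y in b))


# Four single-point proposals on a line; nearby points should end up grouped.
grouped = group_bottom_up([{0}, {1}, {10}, {11}], gap_score, threshold=-2.0)
```

In this toy run the proposals at 0 and 1 merge, the proposals at 10 and 11 merge, and the two resulting parts stay separate because their gap scores below the threshold.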

Computer Vision and Pattern Recognition, Machine Learning