بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

StrobeNet: Category-Level Multiview Reconstruction of Articulated Objects

137 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ge Zhang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ge Zhang - Or Litany - Srinath Sridhar

الرؤية الحاسوبية وتمييز الأنماط الهندسة الحسابية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present StrobeNet, a method for category-level 3D reconstruction of articulating objects from one or more unposed RGB images. Reconstructing general articulating object categories % has important applications, but is challenging since objects can have wide variation in shape, articulation, appearance and topology. We address this by building on the idea of category-level articulation canonicalization -- mapping observations to a canonical articulation which enables correspondence-free multiview aggregation. Our end-to-end trainable neural network estimates feature-enriched canonical 3D point clouds, articulation joints, and part segmentation from one or more unposed images of an object. These intermediate estimates are used to generate a final implicit 3D reconstruction.Our approach reconstructs objects even when they are observed in different articulations in images with large baselines, and animation of reconstructed shapes. Quantitative and qualitative evaluations on different object categories show that our method is able to achieve high reconstruction accuracy, especially as more views are added.

قيم البحث

75 - Yijia Weng , He Wang , Qiang Zhou 2021

In this work, we tackle the problem of category-level online pose tracking of objects from point cloud sequences. For the first time, we propose a unified framework that can handle 9DoF pose tracking for novel rigid object instances as well as per-pa rt pose tracking for articulated objects from known categories. Here the 9DoF pose, comprising 6D pose and 3D size, is equivalent to a 3D amodal bounding box representation with free 6D pose. Given the depth point cloud at the current frame and the estimated pose from the last frame, our novel end-to-end pipeline learns to accurately update the pose. Our pipeline is composed of three modules: 1) a pose canonicalization module that normalizes the pose of the input depth point cloud; 2) RotationNet, a module that directly regresses small interframe delta rotations; and 3) CoordinateNet, a module that predicts the normalized coordinates and segmentation, enabling analytical computation of the 3D size and translation. Leveraging the small pose regime in the pose-canonicalized point clouds, our method integrates the best of both worlds by combining dense coordinate prediction and direct rotation regression, thus yielding an end-to-end differentiable pipeline optimized for 9DoF pose accuracy (without using non-differentiable RANSAC). Our extensive experiments demonstrate that our method achieves new state-of-the-art performance on category-level rigid object pose (NOCS-REAL275) and articulated object pose benchmarks (SAPIEN , BMVC) at the fastest FPS ~12.

الرؤية الحاسوبية وتمييز الأنماط

Multiview Aggregation for Learning Category-Specific Shape Reconstruction

244 - Srinath Sridhar , Davis Rempe , Julien Valentin 2019

We investigate the problem of learning category-specific 3D shape reconstruction from a variable number of RGB views of previously unobserved object instances. Most approaches for multiview shape reconstruction operate on sparse shape representations , or assume a fixed number of views. We present a method that can estimate dense 3D shape, and aggregate shape across multiple and varying number of input views. Given a single input view of an object instance, we propose a representation that encodes the dense shape of the visible object surface as well as the surface behind line of sight occluded by the visible surface. When multiple input views are available, the shape representation is designed to be aggregated into a single 3D shape using an inexpensive union operation. We train a 2D CNN to learn to predict this representation from a variable number of views (1 or more). We further aggregate multiview information by using permutation equivariant layers that promote order-agnostic view information exchange at the feature level. Experiments show that our approach is able to produce dense 3D reconstructions of objects that improve in quality as more views are added.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

380 - Jeremy Reizenstein , Roman Shapovalov , Philipp Henzler 2021

Traditional approaches for learning 3D object categories have been predominantly trained and evaluated on synthetic datasets due to the unavailability of real 3D-annotated category-centric data. Our main goal is to facilitate advances in this field b y collecting real-world data in a magnitude similar to the existing synthetic counterparts. The principal contribution of this work is thus a large-scale dataset, called Common Objects in 3D, with real multi-view images of object categories annotated with camera poses and ground truth 3D point clouds. The dataset contains a total of 1.5 million frames from nearly 19,000 videos capturing objects from 50 MS-COCO categories and, as such, it is significantly larger than alternatives both in terms of the number of categories and objects. We exploit this new dataset to conduct one of the first large-scale in-the-wild evaluations of several new-view-synthesis and category-centric 3D reconstruction methods. Finally, we contribute NerFormer - a novel neural rendering method that leverages the powerful Transformer to reconstruct an object given a small number of its views. The CO3D dataset is available at https://github.com/facebookresearch/co3d .

الرؤية الحاسوبية وتمييز الأنماط

Category-Level 6D Object Pose Estimation via Cascaded Relation and Recurrent Reconstruction Networks

234 - Jiaze Wang , Kai Chen , Qi Dou 2021

Category-level 6D pose estimation, aiming to predict the location and orientation of unseen object instances, is fundamental to many scenarios such as robotic manipulation and augmented reality, yet still remains unsolved. Precisely recovering instan ce 3D model in the canonical space and accurately matching it with the observation is an essential point when estimating 6D pose for unseen objects. In this paper, we achieve accurate category-level 6D pose estimation via cascaded relation and recurrent reconstruction networks. Specifically, a novel cascaded relation network is dedicated for advanced representation learning to explore the complex and informative relations among instance RGB image, instance point cloud and category shape prior. Furthermore, we design a recurrent reconstruction network for iterative residual refinement to progressively improve the reconstruction and correspondence estimations from coarse to fine. Finally, the instance 6D pose is obtained leveraging the estimated dense correspondences between the instance point cloud and the reconstructed 3D model in the canonical space. We have conducted extensive experiments on two well-acknowledged benchmarks of category-level 6D pose estimation, with significant performance improvement over existing approaches. On the representatively strict evaluation metrics of $3D_{75}$ and $5^{circ}2 cm$, our method exceeds the latest state-of-the-art SPD by $4.9%$ and $17.7%$ on the CAMERA25 dataset, and by $2.7%$ and $8.5%$ on the REAL275 dataset. Codes are available at https://wangjiaze.cn/projects/6DPoseEstimation.html.

الرؤية الحاسوبية وتمييز الأنماط

UMPNet: Universal Manipulation Policy Network for Articulated Objects

90 - Zhenjia Xu , Zhanpeng He , Shuran Song 2021

We introduce the Universal Manipulation Policy Network (UMPNet) -- a single image-based policy network that infers closed-loop action sequences for manipulating arbitrary articulated objects. To infer a wide range of action trajectories, the policy s upports 6DoF action representation and varying trajectory length. To handle a diverse set of objects, the policy learns from objects with different articulation structures and generalizes to unseen objects or categories. The policy is trained with self-guided exploration without any human demonstrations, scripted policy, or pre-defined goal conditions. To support effective multi-step interaction, we introduce a novel Arrow-of-Time action attribute that indicates whether an action will change the object state back to the past or forward into the future. With the Arrow-of-Time inference at each interaction step, the learned policy is able to select actions that consistently lead towards or away from a given state, thereby, enabling both effective state exploration and goal-conditioned manipulation. Video is available at https://youtu.be/KqlvcL9RqKM

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد العالي للعلوم التطبيقية والتكنولوجيا

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

StrobeNet: Category-Level Multiview Reconstruction of Articulated Objects

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً