The recent success of implicit neural scene representations has presented a viable new approach to capturing and storing 3D scenes. Unlike conventional 3D representations, such as point clouds, which explicitly store scene properties in discrete, localized units, these implicit representations encode a scene in the weights of a neural network that can be queried at any coordinate to produce the same scene properties. Thus far, implicit representations have primarily been optimized to estimate only the appearance and/or 3D geometry of a scene. We take the next step and demonstrate that an existing implicit representation (SRNs) is actually multi-modal; it can be further leveraged to perform per-point semantic segmentation while retaining its ability to represent appearance and geometry. To achieve this multi-modal behavior, we utilize a semi-supervised learning strategy atop the existing pre-trained scene representation. Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks in order to achieve dense 3D semantic segmentation. We explore two novel applications for this semantically aware implicit neural scene representation: 3D novel view and semantic label synthesis given only a single input RGB image or 2D label mask, as well as 3D interpolation of appearance and semantics.
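As a rough illustration of the idea, the sketch below (not the authors' code; the names, feature dimension, and the choice of a single linear head are assumptions) attaches a small per-point classification head to the features of a pretrained implicit representation such as SRNs and trains only that head from a handful of labeled segmentation masks, leaving the pretrained weights frozen.

```python
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    """Hypothetical per-point segmentation head on top of a frozen backbone."""
    def __init__(self, feature_dim=256, num_classes=6):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, point_features):
        # point_features: (batch, num_points, feature_dim) from the frozen backbone
        return self.classifier(point_features)  # per-point class logits

def train_step(backbone, head, optimizer, coords, labels):
    """One semi-supervised step: the pretrained backbone is frozen, only the head updates."""
    with torch.no_grad():
        features = backbone(coords)          # query the implicit representation
    logits = head(features)                  # (batch, num_points, num_classes)
    loss = nn.functional.cross_entropy(logits.flatten(0, 1), labels.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the backbone already encodes appearance and geometry, only the small head needs labels, which is why a few tens of labeled masks can suffice.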
The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation. Since the computational cost generally increases explosively with voxel resolution, most current methods have to restrict their frameworks to a low-resolution representation, sacrificing detail in the prediction.
Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives.
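To make the paradigm concrete, here is a minimal, generic sketch of such a representation: a coordinate MLP that can be queried at arbitrary continuous positions, whose derivatives with respect to the input coordinates are available through automatic differentiation. The architecture (layer widths, ReLU activations, a 2D-to-RGB mapping) is an illustrative assumption, not the specific network discussed in the excerpt above.

```python
import torch
import torch.nn as nn

class ImplicitSignal(nn.Module):
    """MLP that maps a continuous coordinate to the signal value at that point."""
    def __init__(self, in_dim=2, hidden=256, out_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coords):
        return self.net(coords)          # e.g. (N, 2) coords -> (N, 3) RGB

model = ImplicitSignal()
coords = (torch.rand(1024, 2) * 2 - 1).requires_grad_(True)
rgb = model(coords)

# Because the representation is continuous and differentiable, spatial
# derivatives of the signal can be obtained directly with autograd.
grads = torch.autograd.grad(rgb.sum(), coords)[0]   # (1024, 2)
```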
In this paper, we study the semi-supervised semantic segmentation problem by exploring both labeled data and extra unlabeled data. We propose a novel consistency regularization approach, called cross pseudo supervision (CPS). Our approach imposes consistency between two segmentation networks with different initializations applied to the same input image: the pseudo one-hot label map output by one network is used to supervise the other with the standard cross-entropy loss, and vice versa.
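A minimal sketch of that consistency term, assuming two PyTorch segmentation networks `net_a` and `net_b` that output per-pixel class logits (the names and tensor shapes are placeholders):

```python
import torch
import torch.nn.functional as F

def cps_loss(net_a, net_b, unlabeled_images):
    """Cross pseudo supervision term on an unlabeled batch."""
    logits_a = net_a(unlabeled_images)            # (B, C, H, W)
    logits_b = net_b(unlabeled_images)

    # Hard pseudo-labels (argmax), with no gradient flowing through the targets.
    pseudo_a = logits_a.argmax(dim=1).detach()
    pseudo_b = logits_b.argmax(dim=1).detach()

    # Each network is supervised by the other's pseudo-labels, and vice versa.
    return F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)
```

In the full objective this term is weighted and added to the ordinary supervised cross-entropy computed on the labeled images.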
Scene understanding is an important capability for robots acting in unstructured environments. While most SLAM approaches provide a geometrical representation of the scene, a semantic map is necessary for more complex interactions with the surroundings.
We address the challenging problem of semi-supervised learning in the context of multiple visual interpretations of the world by finding consensus in a graph of neural networks. Each graph node is a scene interpretation layer, while each edge is a deep network that transforms the interpretation at one node into the interpretation at another.
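One way to read this setup, sketched below under loose assumptions (all names are placeholders, and the consensus rule here is a simple average rather than the paper's exact selection scheme): several edge networks predict the same target interpretation from different source interpretations, and their agreement serves as a pseudo-label for retraining an individual edge on unlabeled data.

```python
import torch
import torch.nn.functional as F

def consensus_pseudo_label(edge_nets, source_layers):
    """Average the predictions of several edges that arrive at the same node."""
    preds = [net(src) for net, src in zip(edge_nets, source_layers)]
    return torch.stack(preds).mean(dim=0).detach()

def semi_supervised_step(student_edge, edge_nets, source_layers, student_input):
    # The consensus of the ensemble becomes the pseudo-label for one edge.
    target = consensus_pseudo_label(edge_nets, source_layers)
    prediction = student_edge(student_input)
    return F.mse_loss(prediction, target)   # regression loss for dense output maps
```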