ﻻ يوجد ملخص باللغة العربية
We address the task of indoor scene generation by generating a sequence of objects, along with their locations and orientations conditioned on a room layout. Large-scale indoor scene datasets allow us to extract patterns from user-designed indoor scenes, and generate new scenes based on these patterns. Existing methods rely on the 2D or 3D appearance of these scenes in addition to object positions, and make assumptions about the possible relations between objects. In contrast, we do not use any appearance information, and implicitly learn object relations using the self-attention mechanism of transformers. We show that our model design leads to faster scene generation with similar or improved levels of realism compared to previous methods. Our method is also flexible, as it can be conditioned not only on the room layout but also on text descriptions of the room, using only the cross-attention mechanism of transformers. Our user study shows that our generated scenes are preferred to the state-of-the-art FastSynth scenes 53.9% and 56.7% of the time for bedroom and living room scenes, respectively. At the same time, we generate a scene in 1.48 seconds on average, 20% faster than FastSynth.
We present a method for creating 3D indoor scenes with a generative model learned from a collection of semantic-segmented depth images captured from different unknown scenes. Given a room with a specified size, our method automatically generates 3D o
Automatic surgical instruction generation is a prerequisite towards intra-operative context-aware surgical assistance. However, generating instructions from surgical scenes is challenging, as it requires jointly understanding the surgical activity of
A set is an unordered collection of unique elements--and yet many machine learning models that generate sets impose an implicit or explicit ordering. Since model performance can depend on the choice of order, any particular ordering can lead to sub-o
We tackle the problem of automatically reconstructing a complete 3D model of a scene from a single RGB image. This challenging task requires inferring the shape of both visible and occluded surfaces. Our approach utilizes viewer-centered, multi-layer
Accurate perception of the surrounding scene is helpful for robots to make reasonable judgments and behaviours. Therefore, developing effective scene representation and recognition methods are of significant importance in robotics. Currently, a large