ترغب بنشر مسار تعليمي؟ اضغط هنا

Novel View Synthesis for Large-scale Scene using Adversarial Loss

68   0   0.0 ( 0 )
 نشر من قبل Xiaochuan Yin
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Novel view synthesis aims to synthesize new images from different viewpoints of given images. Most of previous works focus on generating novel views of certain objects with a fixed background. However, for some applications, such as virtual reality or robotic manipulations, large changes in background may occur due to the egomotion of the camera. Generated images of a large-scale environment from novel views may be distorted if the structure of the environment is not considered. In this work, we propose a novel fully convolutional network, that can take advantage of the structural information explicitly by incorporating the inverse depth features. The inverse depth features are obtained from CNNs trained with sparse labeled depth values. This framework can easily fuse multiple images from different viewpoints. To fill the missing textures in the generated image, adversarial loss is applied, which can also improve the overall image quality. Our method is evaluated on the KITTI dataset. The results show that our method can generate novel views of large-scale scene without distortion. The effectiveness of our approach is demonstrated through qualitative and quantitative evaluation.



قيم البحث

اقرأ أيضاً

We present VUNet, a novel view(VU) synthesis method for mobile robots in dynamic environments, and its application to the estimation of future traversability. Our method predicts future images for given virtual robot velocity commands using only RGB images at previous and current time steps. The future images result from applying two types of image changes to the previous and current images: 1) changes caused by different camera pose, and 2) changes due to the motion of the dynamic obstacles. We learn to predict these two types of changes disjointly using two novel network architectures, SNet and DNet. We combine SNet and DNet to synthesize future images that we pass to our previously presented method GONet to estimate the traversable areas around the robot. Our quantitative and qualitative evaluation indicate that our approach for view synthesis predicts accurate future images in both static and dynamic environments. We also show that these virtual images can be used to estimate future traversability correctly. We apply our view synthesis-based traversability estimation method to two applications for assisted teleoperation.
Recent approaches to render photorealistic views from a limited set of photographs have pushed the boundaries of our interactions with pictures of static scenes. The ability to recreate moments, that is, time-varying sequences, is perhaps an even mor e interesting scenario, but it remains largely unsolved. We introduce DCT-NeRF, a coordinatebased neural representation for dynamic scenes. DCTNeRF learns smooth and stable trajectories over the input sequence for each point in space. This allows us to enforce consistency between any two frames in the sequence, which results in high quality reconstruction, particularly in dynamic regions.
PatchMatch based Multi-view Stereo (MVS) algorithms have achieved great success in large-scale scene reconstruction tasks. However, reconstruction of texture-less planes often fails as similarity measurement methods may become ineffective on these re gions. Thus, a new plane hypothesis inference strategy is proposed to handle the above issue. The procedure consists of two steps: First, multiple plane hypotheses are generated using filtered initial depth maps on regions that are not successfully recovered; Second, depth hypotheses are selected using Markov Random Field (MRF). The strategy can significantly improve the completeness of reconstruction results with only acceptable computing time increasing. Besides, a new acceleration scheme similar to dilated convolution can speed up the depth map estimating process with only a slight influence on the reconstruction. We integrated the above ideas into a new MVS pipeline, Plane Hypothesis Inference Multi-view Stereo (PHI-MVS). The result of PHI-MVS is validated on ETH3D public benchmarks, and it demonstrates competing performance against the state-of-the-art.
Multi-View Stereo (MVS) is a core task in 3D computer vision. With the surge of novel deep learning methods, learned MVS has surpassed the accuracy of classical approaches, but still relies on building a memory intensive dense cost volume. Novel View Synthesis (NVS) is a parallel line of research and has recently seen an increase in popularity with Neural Radiance Field (NeRF) models, which optimize a per scene radiance field. However, NeRF methods do not generalize to novel scenes and are slow to train and test. We propose to bridge the gap between these two methodologies with a novel network that can recover 3D scene geometry as a distance function, together with high-resolution color images. Our method uses only a sparse set of images as input and can generalize well to novel scenes. Additionally, we propose a coarse-to-fine sphere tracing approach in order to significantly increase speed. We show on various datasets that our method reaches comparable accuracy to per-scene optimized methods while being able to generalize and running significantly faster.
Content creation, central to applications such as virtual reality, can be a tedious and time-consuming. Recent image synthesis methods simplify this task by offering tools to generate new views from as little as a single input image, or by converting a semantic map into a photorealistic image. We propose to push the envelope further, and introduce Generative View Synthesis (GVS), which can synthesize multiple photorealistic views of a scene given a single semantic map. We show that the sequential application of existing techniques, e.g., semantics-to-image translation followed by monocular view synthesis, fail at capturing the scenes structure. In contrast, we solve the semantics-to-image translation in concert with the estimation of the 3D layout of the scene, thus producing geometrically consistent novel views that preserve semantic structures. We first lift the input 2D semantic map onto a 3D layered representation of the scene in feature space, thereby preserving the semantic labels of 3D geometric structures. We then project the layered features onto the target views to generate the final novel-view images. We verify the strengths of our method and compare it with several advanced baselines on three different datasets. Our approach also allows for style manipulation and image editing operations, such as the addition or removal of objects, with simple manipulations of the input style images and semantic maps respectively. Visit the project page at https://gvsnet.github.io.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا