ﻻ يوجد ملخص باللغة العربية
Recently, learning-based approaches for 3D model reconstruction have attracted attention owing to its modern applications such as Extended Reality(XR), robotics and self-driving cars. Several approaches presented good performance on reconstructing 3D shapes by learning solely from images, i.e., without using 3D models in training. Challenges, however, remain in texture generation due to the gap between 2D and 3D modals. In previous work, the grid sampling mechanism from Spatial Transformer Networks was adopted to sample color from an input image to formulate texture. Despite its success, the existing framework has limitations on searching scope in sampling, resulting in flaws in generated texture and consequentially on rendered 3D models. In this paper, to solve that issue, we present a novel sampling algorithm by optimizing the gradient of predicted coordinates based on the variance on the sampling image. Taking into account the semantics of the image, we adopt Frechet Inception Distance (FID) to form a loss function in learning, which helps bridging the gap between rendered images and input images. As a result, we greatly improve generated texture. Furthermore, to optimize 3D shape reconstruction and to accelerate convergence at training, we adopt part segmentation and template learning in our model. Without any 3D supervision in learning, and with only a collection of single-view 2D images, the shape and texture learned by our model outperform those from previous work. We demonstrate the performance with experimental results on a publically available dataset.
3D reconstruction from a single RGB image is a challenging problem in computer vision. Previous methods are usually solely data-driven, which lead to inaccurate 3D shape recovery and limited generalization capability. In this work, we focus on object
Deep learning-based object reconstruction algorithms have shown remarkable improvements over classical methods. However, supervised learning based methods perform poorly when the training data and the test data have different distributions. Indeed, m
While single-view 3D reconstruction has made significant progress benefiting from deep shape representations in recent years, garment reconstruction is still not solved well due to open surfaces, diverse topologies and complex geometric details. In t
Convolutional networks for single-view object reconstruction have shown impressive performance and have become a popular subject of research. All existing techniques are united by the idea of having an encoder-decoder network that performs non-trivia
Recovering the 3D structure of an object from a single image is a challenging task due to its ill-posed nature. One approach is to utilize the plentiful photos of the same object category to learn a strong 3D shape prior for the object. This approach