
Fully automatic structure from motion with a spline-based environment representation

 Added by Zhirui Wang
 Publication date 2018
Research language: English





While the common environment representation in structure from motion is a sparse point cloud, the community has also investigated the use of lines to better enforce the inherent regularities of man-made surroundings. Building on the potential of this idea, the present paper introduces a more flexible higher-order extension of points that provides a general model for structural edges in the environment, whether straight or curved. Our model relies on linked Bézier curves, whose geometric intuition proves highly beneficial during parameter initialization and regularization. We present the first fully automatic pipeline that is able to generate spline-based representations without any human supervision. Besides a full graphical formulation of the problem, we introduce both geometric and photometric cues as well as higher-level concepts such as overall curve visibility and viewing angle restrictions to automatically manage the correspondences in the graph. Results show that curve-based structure from motion with splines is able to outperform state-of-the-art sparse feature-based methods, as well as to model curved edges in the environment.
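As a rough illustration of the curve model named in the abstract, the sketch below evaluates a chain of linked Bézier segments with de Casteljau's algorithm. It is a minimal example only, not the paper's pipeline: the function names (`lerp`, `de_casteljau`, `eval_linked`), the convention that consecutive segments share an end point, and the global parameter `s` are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def lerp(a: Point, b: Point, t: float) -> Point:
    """Linear interpolation between two points of equal dimension."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def de_casteljau(control: Sequence[Point], t: float) -> Point:
    """Evaluate one Bezier segment at t in [0, 1] by repeated interpolation."""
    pts = list(control)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]


def eval_linked(segments: Sequence[Sequence[Point]], s: float) -> Point:
    """Evaluate a chain of segments at a global parameter s in [0, len(segments)].

    Assumption (hypothetical convention): segment k covers s in [k, k + 1],
    and the last control point of segment k equals the first control point
    of segment k + 1, so the chain is continuous at the joints.
    """
    idx = min(int(s), len(segments) - 1)
    return de_casteljau(segments[idx], s - idx)


# Two cubic segments in 2D that share the joint point (4.0, 0.0).
seg_a = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
seg_b = [(4.0, 0.0), (5.0, -2.0), (7.0, -2.0), (8.0, 0.0)]
curve = [seg_a, seg_b]

print(eval_linked(curve, 0.5))  # midpoint of the first segment
print(eval_linked(curve, 1.0))  # the shared joint
```

The control points give a direct geometric handle on each segment: the curve starts at the first control point, ends at the last, and stays inside their convex hull, which is one reason such a parameterization is convenient to initialize and regularize.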





Recently, many methods have been proposed for face reconstruction from multiple images, most of which build on the fundamental principles of shape from shading and structure from motion. However, the majority of these methods generate only a discrete surface model of the face. In this paper, B-spline Shape from Motion and Shading (BsSfMS) is proposed to reconstruct a continuous B-spline surface from multi-view face images, under the assumption that the shading and motion information in the images carry the 1st- and 0th-order derivatives of the B-spline face, respectively. The face surface is expressed as a B-spline surface that can be reconstructed by optimizing its control points. Accordingly, normals computed from shading and 3D feature points computed from motion are used as the 1st- and 0th-order derivative information, respectively, and are jointly applied in optimizing the B-spline face. Additionally, an IMLS (iterative multi-least-square) algorithm is proposed to handle the difficult control point optimization. Experiments on synthetic samples and the LFW dataset verify the proposed approach, and the results demonstrate its effectiveness across different poses, illuminations, and expressions, even on in-the-wild images.
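To make the 0th- and 1st-order terminology concrete, the following is a generic sketch of how point and normal observations can constrain a tensor-product B-spline surface. It is not taken from the paper: the symbols $N_{i,p}$, $\mathbf{P}_{ij}$, $\mathbf{X}_k$, $\mathbf{n}_l$, the surface parameters $(u, v)$ and the weight $\lambda$ are assumptions introduced here, and the paper's IMLS algorithm may be formulated differently.

```latex
% Tensor-product B-spline surface with control points P_{ij}
\mathbf{S}(u,v) = \sum_{i=0}^{m} \sum_{j=0}^{n} N_{i,p}(u)\, N_{j,q}(v)\, \mathbf{P}_{ij}

% 0th-order data: 3D feature points X_k (from motion) should lie on the surface
\mathbf{S}(u_k, v_k) \approx \mathbf{X}_k

% 1st-order data: normals n_l (from shading) should be orthogonal to the surface tangents
\mathbf{n}_l^{\top}\, \frac{\partial \mathbf{S}}{\partial u}(u_l, v_l) \approx 0,
\qquad
\mathbf{n}_l^{\top}\, \frac{\partial \mathbf{S}}{\partial v}(u_l, v_l) \approx 0

% One possible joint least-squares objective over the control points
\min_{\{\mathbf{P}_{ij}\}} \;
\sum_k \bigl\| \mathbf{S}(u_k, v_k) - \mathbf{X}_k \bigr\|^2
+ \lambda \sum_l \Bigl[
  \bigl( \mathbf{n}_l^{\top} \mathbf{S}_u(u_l, v_l) \bigr)^2
  + \bigl( \mathbf{n}_l^{\top} \mathbf{S}_v(u_l, v_l) \bigr)^2
\Bigr]
```

Both kinds of residual are linear in the control points $\mathbf{P}_{ij}$ once the parameters $(u, v)$ of each observation are fixed, which is why least-squares solvers are a natural fit for this type of problem.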
In this work we introduce Lifting Autoencoders, a generative 3D surface-based model of object categories. We bring together ideas from non-rigid structure from motion, image formation, and morphable models to learn a controllable, geometric model of 3D categories in an entirely unsupervised manner from an unstructured set of images. We exploit the 3D geometric nature of our model and use normal information to disentangle appearance into illumination, shading and albedo. We further use weak supervision to disentangle the non-rigid shape variability of human faces into identity and expression. We combine the 3D representation with a differentiable renderer to generate RGB images and append an adversarially trained refinement network to obtain sharp, photorealistic image reconstruction results. The learned generative model can be controlled in terms of interpretable geometry and appearance factors, allowing us to perform photorealistic image manipulation of identity, expression, 3D pose, and illumination properties.
Chenyang Lei, Qifeng Chen (2019)
We present a fully automatic approach to video colorization with self-regularization and diversity. Our model contains a colorization network for video frame colorization and a refinement network for spatiotemporal color refinement. Without any labeled data, both networks can be trained with self-regularized losses defined in bilateral and temporal space. The bilateral loss enforces color consistency between neighboring pixels in a bilateral space and the temporal loss imposes constraints between corresponding pixels in two nearby frames. While video colorization is a multi-modal problem, our method uses a perceptual loss with diversity to differentiate various modes in the solution space. Perceptual experiments demonstrate that our approach outperforms state-of-the-art approaches on fully automatic video colorization. The results are shown in the supplementary video at https://youtu.be/Y15uv2jnK-4
Chen Kong, Simon Lucey (2019)
Non-Rigid Structure from Motion (NRSfM) refers to the problem of reconstructing cameras and the 3D point cloud of a non-rigid object from an ensemble of images with 2D correspondences. Current NRSfM algorithms are limited in two respects: (i) the number of images, and (ii) the type of shape variability they can handle. These difficulties stem from the inherent conflict between the condition of the system and the degrees of freedom needing to be modeled, which has hampered its practical utility for many applications within vision. In this paper we propose a novel hierarchical sparse coding model for NRSfM which can overcome (i) and (ii) to such an extent that NRSfM can be applied to problems in vision previously thought too ill-posed. Our approach is realized in practice as the training of an unsupervised deep neural network (DNN) auto-encoder with a unique architecture that is able to disentangle pose from 3D structure. Using modern deep learning computational platforms allows us to solve NRSfM problems at an unprecedented scale and shape complexity. Our approach has no 3D supervision, relying solely on 2D point correspondences. Further, our approach is also able to handle missing/occluded 2D points without the need for matrix completion. Extensive experiments demonstrate the impressive performance of our approach, where we exhibit superior precision and robustness against all available state-of-the-art works, in some instances by an order of magnitude. We further propose a new quality measure (based on the network weights) which circumvents the need for 3D ground truth to ascertain the confidence we have in the reconstructability. We believe our work to be a significant advance over the state of the art in NRSfM.
Sketchformer is a novel transformer-based representation for encoding free-hand sketches input in a vector form, i.e. as a sequence of strokes. Sketchformer effectively addresses multiple tasks: sketch classification, sketch-based image retrieval (SBIR), and the reconstruction and interpolation of sketches. We report several variants exploring continuous and tokenized input representations, and contrast their performance. Our learned embedding, driven by a dictionary learning tokenization scheme, yields state-of-the-art performance in classification and image retrieval tasks when compared against baseline representations driven by LSTM sequence-to-sequence architectures: SketchRNN and derivatives. We show that sketch reconstruction and interpolation are improved significantly by the Sketchformer embedding for complex sketches with longer stroke sequences.