
We introduce Multiresolution Deep Implicit Functions (MDIF), a hierarchical representation that can recover fine geometric detail while also supporting global operations such as shape completion. Our model represents a complex 3D shape with a hierarchy of latent grids, which can be decoded into different levels of detail while also achieving better accuracy. For shape completion, we propose latent grid dropout to simulate partial data in the latent space, thereby deferring the completion functionality to the decoder side. This, together with our multiresolution design, significantly improves shape completion quality under decoder-only latent optimization. To the best of our knowledge, MDIF is the first deep implicit function model that can simultaneously (1) represent different levels of detail and allow progressive decoding; (2) support both encoder-decoder inference and decoder-only latent optimization, fulfilling multiple applications; and (3) perform detailed decoder-only shape completion. Experiments demonstrate its superior performance against prior art on various 3D reconstruction tasks.
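
As a rough illustration of the latent grid dropout described above, here is a minimal Python sketch; the function name, tensor shapes, and the choice to always keep the coarsest level are assumptions, not the paper's implementation.

    # Hypothetical sketch of latent grid dropout: finer latent grids are
    # masked cell-wise during training, so the decoder must learn to
    # complete shapes from partial latent information.
    import torch

    def latent_grid_dropout(latent_grids, drop_prob=0.5):
        """latent_grids: list of [B, C, D, H, W] tensors, coarse to fine."""
        kept = [latent_grids[0]]  # keep the coarsest, global level intact
        for grid in latent_grids[1:]:
            mask = (torch.rand_like(grid[:, :1]) > drop_prob).float()
            kept.append(grid * mask)  # zeroed cells mimic unobserved data
        return kept

    # grids = [torch.randn(1, 8, r, r, r) for r in (4, 8, 16)]
    # the decoder would then be trained on latent_grid_dropout(grids)
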
Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why does training unstructured sparse networks from random initialization perform poorly; and (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow; rather, their success lies in re-learning the pruning solution they are derived from. However, this comes at the cost of learning novel solutions.
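
One plausible reading of the "modified initialization for unstructured connectivity" is a sparsity-aware variant of He initialization that scales each unit by its effective fan-in under the mask. The sketch below is an assumption in that spirit, not the paper's exact scheme.

    # Hedged sketch: He-style init rescaled by each unit's *effective*
    # fan-in under the sparse mask, so activation and gradient magnitudes
    # at initialization do not shrink with sparsity.
    import torch

    def sparse_kaiming_init(weight, mask):
        """weight, mask: [out_features, in_features]; mask holds 0/1."""
        with torch.no_grad():
            fan_in = mask.sum(dim=1).clamp(min=1.0)    # per-unit fan-in
            std = (2.0 / fan_in).sqrt().unsqueeze(1)   # He scaling rule
            weight.copy_(torch.randn_like(weight) * std * mask)
        return weight
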
We describe a novel approach for compressing truncated signed distance fields (TSDFs) stored in 3D voxel grids, together with their corresponding textures. To compress the TSDF, our method relies on a block-based neural network architecture trained end-to-end, achieving a state-of-the-art rate-distortion trade-off. To prevent topological errors, we losslessly compress the signs of the TSDF, which also upper-bounds the reconstruction error by the voxel size. To compress the corresponding texture, we design a fast block-based UV parameterization, generating coherent texture maps that can be effectively compressed using existing video compression algorithms. We demonstrate the performance of our algorithms on two 4D performance capture datasets, reducing bitrate by 66% for the same distortion, or alternatively reducing distortion by 50% for the same bitrate, compared to the state-of-the-art.
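
The error bound follows because the surface is the zero crossing of the TSDF: if every voxel's sign is exact, each voxel stays on the correct side of the surface, so the reconstructed crossing cannot leave its cell. The sketch below illustrates this with a crude 4-bit quantizer standing in for the paper's block-based neural codec, and assumes the TSDF is normalized to [-1, 1].

    # Illustrative sketch only: lossy magnitudes plus lossless signs.
    # Re-applying exact signs on decode keeps the zero crossing in the
    # correct cell, bounding surface error by the voxel size.
    import numpy as np

    def encode_tsdf(tsdf):
        signs = np.signbit(tsdf)                             # lossless, 1 bit/voxel
        code = np.round(np.abs(tsdf) * 15).astype(np.uint8)  # lossy magnitudes
        return signs, code

    def decode_tsdf(signs, code):
        mags = code.astype(np.float32) / 15.0
        return np.where(signs, -mags, mags)                  # re-apply exact signs
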
We propose a novel, efficient, and lightweight model for human pose estimation from a single image. Our model is designed to achieve competitive results at a fraction of the number of parameters and computational cost of various state-of-the-art methods. To this end, we explicitly incorporate part-based structural and geometric priors in a hierarchical prediction framework. At the coarsest resolution, and in a manner similar to classical part-based approaches, we leverage the kinematic structure of the human body to propagate convolutional feature updates between the keypoints or body parts. Unlike classical approaches, we adopt end-to-end training to learn this geometric prior through feature updates from data. We then propagate the feature representation at the coarsest resolution up the hierarchy to refine the predicted pose in a coarse-to-fine fashion. The final network effectively models the geometric prior and intuition within a lightweight deep neural network, yielding state-of-the-art results for a model of this size on two standard datasets, Leeds Sports Pose and MPII Human Pose.
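
A minimal sketch of what propagating features along the kinematic structure could look like; the edge list, module name, and message-passing form are assumptions for illustration.

    # Hypothetical sketch: each keypoint carries a feature map, and a
    # learned convolution passes a message from parent to child joints.
    import torch
    import torch.nn as nn

    SKELETON = [(0, 1), (1, 2), (2, 3)]  # hypothetical parent->child pairs

    class KinematicPropagation(nn.Module):
        def __init__(self, ch=32):
            super().__init__()
            self.msg = nn.ModuleList(
                nn.Conv2d(ch, ch, 3, padding=1) for _ in SKELETON)

        def forward(self, feats):  # feats: [B, J, C, H, W]
            updated = list(feats.unbind(dim=1))
            for k, (i, j) in enumerate(SKELETON):
                updated[j] = updated[j] + self.msg[k](feats[:, i])
            return torch.stack(updated, dim=1)
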
Volumetric (4D) performance capture is fundamental for AR/VR content generation. Whereas previous work in 4D performance capture has shown impressive results in studio settings, the technology is still far from being accessible to a typical consumer who, at best, might own a single RGBD sensor. Thus, in this work, we propose a method to synthesize free-viewpoint renderings using a single RGBD camera. The key insight is to leverage previously seen …
Motivated by augmented and virtual reality applications such as telepresence, there has been a recent focus on real-time performance capture of humans under motion. However, given the real-time constraint, these systems often suffer from artifacts in geometry and texture, such as holes and noise in the final rendering, poor lighting, and low-resolution textures. We take the novel approach of augmenting such real-time performance capture systems with a deep architecture that takes a rendering from an arbitrary viewpoint and jointly performs completion, super resolution, and denoising of the imagery in real-time. We call this approach neural (re-)rendering, and our live system …
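
As a toy stand-in for the joint completion / super-resolution / denoising network, the sketch below shows the general image-to-image shape such a model could take; the layer choices are assumptions, not the paper's architecture.

    # Toy image-to-image model: maps a degraded rendering to an enhanced,
    # 2x-upsampled one (super resolution), with convolutions handling
    # completion and denoising jointly.
    import torch.nn as nn

    enhancer = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
        nn.Conv2d(32, 3, 3, padding=1),
    )
    # enhanced = enhancer(render)  # [B, 3, H, W] -> [B, 3, 2H, 2W]
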
We present SplineNets, a practical and novel approach for using conditioning in convolutional neural networks (CNNs). SplineNets are continuous generalizations of neural decision graphs, and they can dramatically reduce the runtime complexity and computation costs of CNNs while maintaining or even increasing accuracy. Functions of SplineNets are both dynamic (i.e., conditioned on the input) and hierarchical (i.e., conditioned on the computational path). SplineNets employ a unified loss function with a desired level of smoothness over both the network and decision parameters, while allowing for sparse activation of a subset of nodes for individual samples. In particular, we embed infinitely many function weights (e.g., filters) on smooth, low-dimensional manifolds parameterized by compact B-splines, which are indexed by a position parameter. Instead of sampling from a categorical distribution to pick a branch, samples choose a continuous position to pick a function weight. We further show that by maximizing the mutual information between spline positions and class labels, the network can be optimally utilized and specialized for classification tasks. Experiments show that our approach can significantly increase the accuracy of ResNets with negligible cost in speed, matching the precision of a 110-layer ResNet with a 32-layer SplineNet.
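
The core mechanism, filters indexed by a continuous position rather than a discrete branch choice, can be sketched as follows. Linear blending between control filters is used here for brevity; the paper parameterizes the manifold with compact B-splines.

    # Sketch: convolution weights live on a continuous 1-D manifold of
    # control filters, indexed by a per-sample position t in [0, 1].
    import math
    import torch
    import torch.nn.functional as F

    def filters_at(controls, t):
        """controls: [K, out_c, in_c, kh, kw] control filters; t in [0, 1]."""
        k = controls.shape[0] - 1
        i0 = min(int(math.floor(t * k)), k - 1)
        a = t * k - i0
        return (1 - a) * controls[i0] + a * controls[i0 + 1]

    # x = torch.randn(1, 16, 8, 8)
    # w = filters_at(torch.randn(4, 16, 16, 3, 3), t=0.37)
    # y = F.conv2d(x, w, padding=1)  # branch chosen by continuous position
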
Deep neural network models owe their representational power to their high number of learnable parameters. It is often infeasible to run these heavily parametrized deep models in limited-resource environments, such as mobile phones. Network models employing conditional computing are able to reduce computational requirements while achieving high representational power, thanks to their ability to model hierarchies. We propose Conditional Information Gain Networks, which allow feed-forward deep neural networks to execute conditionally, skipping parts of the model based on the sample and on decision mechanisms inserted into the architecture. These decision mechanisms are trained using cost functions based on differentiable information gain, inspired by the training procedures of decision trees. These information-gain-based decision mechanisms are differentiable and can be trained end-to-end using a unified framework with a general cost function, covering both classification and decision losses. We test the effectiveness of the proposed method on the MNIST and the recently introduced Fashion-MNIST datasets, and show that our information-gain-based conditional execution approach can achieve better or comparable classification results using significantly fewer parameters, compared to standard convolutional neural network baselines.
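
A hedged sketch of what a differentiable information-gain objective for a routing node could look like, in the spirit of decision-tree splitting criteria; the function name and batch-level estimation are assumptions.

    # Estimate the joint distribution of (branch, class) over a batch from
    # soft routing probabilities, then maximize their mutual information.
    import torch
    import torch.nn.functional as F

    def information_gain_loss(routing_logits, labels, num_classes, eps=1e-8):
        """routing_logits: [B, K] branch scores; labels: [B] class ids."""
        p_branch = F.softmax(routing_logits, dim=1)      # [B, K]
        onehot = F.one_hot(labels, num_classes).float()  # [B, C]
        joint = p_branch.t() @ onehot / labels.shape[0]  # [K, C]
        pk = joint.sum(dim=1, keepdim=True)              # branch marginal
        pc = joint.sum(dim=0, keepdim=True)              # class marginal
        mi = (joint * ((joint + eps) / (pk @ pc + eps)).log()).sum()
        return -mi  # minimizing the loss maximizes information gain
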
In this paper, we present a novel and efficient architecture for addressing computer vision problems that use 'Analysis by Synthesis'. Analysis by synthesis involves the minimization of a reconstruction error that is typically a non-convex function of the latent target variables. State-of-the-art methods adopt a hybrid scheme in which discriminatively trained predictors, such as Random Forests or Convolutional Neural Networks, are used to initialize local search algorithms. While these methods have been shown to produce promising results, they often get stuck in local optima. Our method goes beyond the conventional hybrid architecture by not only proposing multiple accurate initial solutions, but also by defining a navigational structure over the solution space that can be used for extremely efficient gradient-free local search. We demonstrate the efficacy of our approach on the challenging problem of RGB camera relocalization. To make the RGB camera relocalization problem particularly challenging, we introduce a new dataset of 3D environments that are significantly larger than those found in other publicly available datasets. Our experiments reveal that the proposed method achieves state-of-the-art camera relocalization results. We also demonstrate the generalizability of our approach on hand pose estimation and image retrieval tasks.
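
An illustrative sketch of gradient-free local search over a navigational structure, modeled here as a neighborhood graph; the graph construction and the energy function are assumptions, not the paper's exact design.

    # Greedy descent over a neighborhood graph: start from multiple
    # predicted initializations, then repeatedly move to the neighbor
    # with the lowest reconstruction error.
    def local_search(start_nodes, neighbors, energy, max_steps=100):
        """neighbors(n) yields candidate nodes; energy(n) = reconstruction error."""
        best = min(start_nodes, key=energy)  # multiple accurate initial solutions
        for _ in range(max_steps):
            cand = min(neighbors(best), key=energy, default=best)
            if energy(cand) >= energy(best):
                break                        # local optimum reached
            best = cand
        return best
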
