This paper investigates the novel task of generating texture images from perceptual descriptions. Previous work on texture generation has focused on either synthesis from examples or generation from procedural models; generating textures from perceptual attributes has not yet been well studied. Yet perceptual attributes such as directionality, regularity, and roughness are important factors human observers use to describe a texture. In this paper, we propose a joint deep network model that combines adversarial training and perceptual feature regression for texture generation, requiring only random noise and user-defined perceptual attributes as input. In this model, a pre-trained convolutional neural network is integrated with the adversarial framework to drive the generated textures toward the given perceptual attributes. An important property of the proposed model is that changing one of the input perceptual attributes changes the corresponding appearance of the generated textures. We design several experiments to validate the effectiveness of the proposed method. The results show that it produces high-quality texture images with the desired perceptual properties.
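The sketch below illustrates, in PyTorch, the kind of objective the abstract describes: a generator conditioned on noise and user-defined perceptual attributes, trained with an adversarial term plus a regression term from a fixed, pre-trained attribute network. All class names, layer sizes, the attribute dimensionality, and the loss weight are assumptions made for illustration, not the authors' actual architecture.

```python
# Minimal sketch of adversarial training + perceptual feature regression.
# SimpleGenerator, SimpleDiscriminator, AttributeRegressor, NOISE_DIM, ATTR_DIM,
# IMG_SIZE, and lambda_attr are hypothetical choices for this example.
import torch
import torch.nn as nn

NOISE_DIM = 64   # dimensionality of the random noise input (assumed)
ATTR_DIM = 3     # e.g. directionality, regularity, roughness
IMG_SIZE = 64    # generated texture resolution (assumed)

class SimpleGenerator(nn.Module):
    """Maps [noise, perceptual attributes] to a texture image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + ATTR_DIM, 256), nn.ReLU(),
            nn.Linear(256, IMG_SIZE * IMG_SIZE), nn.Tanh(),
        )

    def forward(self, z, attrs):
        x = torch.cat([z, attrs], dim=1)
        return self.net(x).view(-1, 1, IMG_SIZE, IMG_SIZE)

class SimpleDiscriminator(nn.Module):
    """Scores real vs. generated textures."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(IMG_SIZE * IMG_SIZE, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, img):
        return self.net(img)

class AttributeRegressor(nn.Module):
    """Stands in for the pre-trained CNN that predicts perceptual attributes.
    In the described model this network is trained beforehand and kept fixed."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(IMG_SIZE * IMG_SIZE, 256), nn.ReLU(),
            nn.Linear(256, ATTR_DIM),
        )

    def forward(self, img):
        return self.net(img)

def generator_loss(G, D, R, z, target_attrs, lambda_attr=10.0):
    """Adversarial term + perceptual-attribute regression term.
    lambda_attr balances the two objectives (value assumed)."""
    fake = G(z, target_attrs)
    adv = nn.functional.binary_cross_entropy_with_logits(
        D(fake), torch.ones(fake.size(0), 1))
    attr = nn.functional.mse_loss(R(fake), target_attrs)
    return adv + lambda_attr * attr

if __name__ == "__main__":
    G, D, R = SimpleGenerator(), SimpleDiscriminator(), AttributeRegressor()
    for p in R.parameters():         # the attribute regressor stays frozen
        p.requires_grad_(False)
    z = torch.randn(8, NOISE_DIM)
    attrs = torch.rand(8, ATTR_DIM)  # user-defined perceptual attributes in [0, 1]
    loss = generator_loss(G, D, R, z, attrs)
    loss.backward()
    print(f"generator loss: {loss.item():.4f}")
```

Changing one entry of `attrs` while keeping `z` fixed is what would, under this formulation, change only the corresponding perceptual aspect of the generated texture.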