No Arabic abstract
Several deep learning methods have been proposed for completing partial data from shape acquisition setups, i.e., filling the regions that were missing in the shape. These methods, however, only complete the partial shape with a single output, ignoring the ambiguity when reasoning the missing geometry. Hence, we pose a multi-modal shape completion problem, in which we seek to complete the partial shape with multiple outputs by learning a one-to-many mapping. We develop the first multimodal shape completion method that completes the partial shape via conditional generative modeling, without requiring paired training data. Our approach distills the ambiguity by conditioning the completion on a learned multimodal distribution of possible results. We extensively evaluate the approach on several datasets that contain varying forms of shape incompleteness, and compare among several baseline methods and variants of our methods qualitatively and quantitatively, demonstrating the merit of our method in completing partial shapes with both diversity and quality.
When trained on multimodal image datasets, normal Generative Adversarial Networks (GANs) are usually outperformed by class-conditional GANs and ensemble GANs, but conditional GANs is restricted to labeled datasets and ensemble GANs lack efficiency. We propose a novel GAN variant called virtual conditional GAN (vcGAN) which is not only an ensemble GAN with multiple generative paths while adding almost zero network parameters, but also a conditional GAN that can be trained on unlabeled datasets without explicit clustering steps or objectives other than the adversary loss. Inside the vcGANs generator, a learnable ``analog-to-digital converter (ADC) module maps a slice of the inputted multivariate Gaussian noise to discrete/digital noise (virtual label), according to which a selector selects the corresponding generative path to produce the sample. All the generative paths share the same decoder network while in each path the decoder network is fed with a concatenation of a different pre-computed amplified one-hot vector and the inputted Gaussian noise. We conducted a lot of experiments on several balanced/imbalanced image datasets to demonstrate that vcGAN converges faster and achieves improved Frechet Inception Distance (FID). In addition, we show the training byproduct that the ADC in vcGAN learned the categorical probability of each mode and that each generative path generates samples of specific mode, which enables class-conditional sampling. Codes are available at url{https://github.com/annonnymmouss/vcgan}
The mood of a text and the intention of the writer can be reflected in the typeface. However, in designing a typeface, it is difficult to keep the style of various characters consistent, especially for languages with lots of morphological variations such as Chinese. In this paper, we propose a Typeface Completion Network (TCN) which takes one character as an input, and automatically completes the entire set of characters in the same style as the input characters. Unlike existing models proposed for image-to-image translation, TCN embeds a character image into two separate vectors representing typeface and content. Combined with a reconstruction loss from the latent space, and with other various losses, TCN overcomes the inherent difficulty in designing a typeface. Also, compared to previous image-to-image translation models, TCN generates high quality character images of the same typeface with a much smaller number of model parameters. We validate our proposed model on the Chinese and English character datasets, which is paired data, and the CelebA dataset, which is unpaired data. In these datasets, TCN outperforms recently proposed state-of-the-art models for image-to-image translation. The source code of our model is available at https://github.com/yongqyu/TCN.
Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet, a serious limitation remains that all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images compared to real images via linear separability in a feature space. Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation. Code is available at https://github.com/zsyzzsoft/co-mod-gan.
A conditional Generative Adversarial Network allows for generating samples conditioned on certain external information. Being able to recover latent and conditional vectors from a condi- tional GAN can be potentially valuable in various applications, ranging from image manipulation for entertaining purposes to diagnosis of the neural networks for security purposes. In this work, we show that it is possible to recover both latent and conditional vectors from generated images given the generator of a conditional generative adversarial network. Such a recovery is not trivial due to the often multi-layered non-linearity of deep neural networks. Furthermore, the effect of such recovery applied on real natural images are investigated. We discovered that there exists a gap between the recovery performance on generated and real images, which we believe comes from the difference between generated data distribution and real data distribution. Experiments are conducted to evaluate the recovered conditional vectors and the reconstructed images from these recovered vectors quantitatively and qualitatively, showing promising results.
Conditional Generative Adversarial Networks (cGANs) are generative models that can produce data samples ($x$) conditioned on both latent variables ($z$) and known auxiliary information ($c$). We propose the Bidirectional cGAN (BiCoGAN), which effectively disentangles $z$ and $c$ in the generation process and provides an encoder that learns inverse mappings from $x$ to both $z$ and $c$, trained jointly with the generator and the discriminator. We present crucial techniques for training BiCoGANs, which involve an extrinsic factor loss along with an associated dynamically-tuned importance weight. As compared to other encoder-based cGANs, BiCoGANs encode $c$ more accurately, and utilize $z$ and $c$ more effectively and in a more disentangled way to generate samples.