ﻻ يوجد ملخص باللغة العربية
Image completion has achieved significant progress due to advances in generative adversarial networks (GANs). Albeit natural-looking, the synthesized contents still lack details, especially for scenes with complex structures or images with large holes. This is because there exists a gap between low-level reconstruction loss and high-level adversarial loss. To address this issue, we introduce a perceptual network to provide mid-level guidance, which measures the semantical similarity between the synthesized and original contents in a similarity-enhanced space. We conduct a detailed analysis on the effects of different losses and different levels of perceptual features in image completion, showing that there exist complementarity between adversarial training and perceptual features. By combining them together, our model can achieve nearly seamless fusion results in an end-to-end manner. Moreover, we design an effective lightweight generator architecture, which can achieve effective image inpainting with far less parameters. Evaluated on CelebA Face and Paris StreetView dataset, our proposed method significantly outperforms existing methods.
Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, which o
This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a near
Significant progress has been made in recent years in image captioning, an active topic in the fields of vision and language. However, existing methods tend to yield overly general captions and consist of some of the most frequent words/phrases, resu
Existing image-to-image transformation approaches primarily focus on synthesizing visually pleasing data. Generating images with correct identity labels is challenging yet much less explored. It is even more challenging to deal with image transformat
We propose an unsupervised multi-conditional image generation pipeline: cFineGAN, that can generate an image conditioned on two input images such that the generated image preserves the texture of one and the shape of the other input. To achieve this