Adaptable GAN Encoders for Image Reconstruction via Multi-type Latent Vectors with Two-scale Attentions


Abstract in English

Although current deep generative adversarial networks (GANs) could synthesize high-quality (HQ) images, discovering novel GAN encoders for image reconstruction is still favorable. When embedding images to latent space, existing GAN encoders work well for aligned images (such as the human face), but they do not adapt to more generalized GANs. To our knowledge, current state-of-the-art GAN encoders do not have a proper encoder to reconstruct high-fidelity images from most misaligned HQ synthesized images on different GANs. Their performances are limited, especially on non-aligned and real images. We propose a novel method (named MTV-TSA) to handle such problems. Creating multi-type latent vectors (MTV) from latent space and two-scale attentions (TSA) from images allows designing a set of encoders that can be adaptable to a variety of pre-trained GANs. We generalize two sets of loss functions to optimize the encoders. The designed encoders could make GANs reconstruct higher fidelity images from most synthesized HQ images. In addition, the proposed method can reconstruct real images well and process them based on learned attribute directions. The designed encoders have unified convolutional blocks and could match well in current GAN architectures (such as PGGAN, StyleGANs, and BigGAN) by fine-tuning the corresponding normalization layers and the last block. Such well-designed encoders can also be trained to converge more quickly.

Download