No Arabic abstract
Though generative adversarial networks (GANs) areprominent models to generate realistic and crisp images,they often encounter the mode collapse problems and arehard to train, which comes from approximating the intrinsicdiscontinuous distribution transform map with continuousDNNs. The recently proposed AE-OT model addresses thisproblem by explicitly computing the discontinuous distribu-tion transform map through solving a semi-discrete optimaltransport (OT) map in the latent space of the autoencoder.However the generated images are blurry. In this paper, wepropose the AE-OT-GAN model to utilize the advantages ofthe both models: generate high quality images and at thesame time overcome the mode collapse/mixture problems.Specifically, we first faithfully embed the low dimensionalimage manifold into the latent space by training an autoen-coder (AE). Then we compute the optimal transport (OT)map that pushes forward the uniform distribution to the la-tent distribution supported on the latent manifold. Finally,our GAN model is trained to generate high quality imagesfrom the latent distribution, the distribution transform mapfrom which to the empirical data distribution will be con-tinuous. The paired data between the latent code and thereal images gives us further constriction about the generator.Experiments on simple MNIST dataset and complex datasetslike Cifar-10 and CelebA show the efficacy and efficiency ofour proposed method.
Recent successes in Generative Adversarial Networks (GAN) have affirmed the importance of using more data in GAN training. Yet it is expensive to collect data in many domains such as medical applications. Data Augmentation (DA) has been applied in these applications. In this work, we first argue that the classical DA approach could mislead the generator to learn the distribution of the augmented data, which could be different from that of the original data. We then propose a principled framework, termed Data Augmentation Optimized for GAN (DAG), to enable the use of augmented data in GAN training to improve the learning of the original distribution. We provide theoretical analysis to show that using our proposed DAG aligns with the original GAN in minimizing the Jensen-Shannon (JS) divergence between the original distribution and model distribution. Importantly, the proposed DAG effectively leverages the augmented data to improve the learning of discriminator and generator. We conduct experiments to apply DAG to different GAN models: unconditional GAN, conditional GAN, self-supervised GAN and CycleGAN using datasets of natural images and medical images. The results show that DAG achieves consistent and considerable improvements across these models. Furthermore, when DAG is used in some GAN models, the system establishes state-of-the-art Frechet Inception Distance (FID) scores. Our code is available.
While cloud/sky image segmentation has extensive real-world applications, a large amount of labelled data is needed to train a highly accurate models to perform the task. Scarcity of such volumes of cloud/sky images with corresponding ground-truth binary maps makes it highly difficult to train such complex image segmentation models. In this paper, we demonstrate the effectiveness of using Generative Adversarial Networks (GANs) to generate data to augment the training set in order to increase the prediction accuracy of image segmentation model. We further present a way to estimate ground-truth binary maps for the GAN-generated images to facilitate their effective use as augmented images. Finally, we validate our work with different statistical techniques.
Generative adversarial networks (GANs) have enjoyed tremendous success in image generation and processing, and have recently attracted growing interests in financial modelings. This paper analyzes GANs from the perspectives of mean-field games (MFGs) and optimal transport. More specifically, from the game theoretical perspective, GANs are interpreted as MFGs under Pareto Optimality criterion or mean-field controls; from the optimal transport perspective, GANs are to minimize the optimal transport cost indexed by the generator from the known latent distribution to the unknown true distribution of data. The MFGs perspective of GANs leads to a GAN-based computational method (MFGANs) to solve MFGs: one neural network for the backward Hamilton-Jacobi-Bellman equation and one neural network for the forward Fokker-Planck equation, with the two neural networks trained in an adversarial way. Numerical experiments demonstrate superior performance of this proposed algorithm, especially in the higher dimensional case, when compared with existing neural network approaches.
Although current deep generative adversarial networks (GANs) could synthesize high-quality (HQ) images, discovering novel GAN encoders for image reconstruction is still favorable. When embedding images to latent space, existing GAN encoders work well for aligned images (such as the human face), but they do not adapt to more generalized GANs. To our knowledge, current state-of-the-art GAN encoders do not have a proper encoder to reconstruct high-fidelity images from most misaligned HQ synthesized images on different GANs. Their performances are limited, especially on non-aligned and real images. We propose a novel method (named MTV-TSA) to handle such problems. Creating multi-type latent vectors (MTV) from latent space and two-scale attentions (TSA) from images allows designing a set of encoders that can be adaptable to a variety of pre-trained GANs. We generalize two sets of loss functions to optimize the encoders. The designed encoders could make GANs reconstruct higher fidelity images from most synthesized HQ images. In addition, the proposed method can reconstruct real images well and process them based on learned attribute directions. The designed encoders have unified convolutional blocks and could match well in current GAN architectures (such as PGGAN, StyleGANs, and BigGAN) by fine-tuning the corresponding normalization layers and the last block. Such well-designed encoders can also be trained to converge more quickly.
Generative adversarial networks (GANs) have attained photo-realistic quality. However, it remains an open challenge of how to best control the image content. We introduce LatentKeypointGAN, a two-stage GAN that is trained end-to-end on the classical GAN objective yet internally conditioned on a set of sparse keypoints with associated appearance embeddings that respectively control the position and style of the generated objects and their parts. A major difficulty that we address with suitable network architectures and training schemes is disentangling the image into spatial and appearance factors without any supervision signals of either nor domain knowledge. We demonstrate that LatentKeypointGAN provides an interpretable latent space that can be used to re-arrange the generated images by re-positioning and exchanging keypoint embeddings, such as combining the eyes, nose, and mouth from different images for generating portraits. In addition, the explicit generation of keypoints and matching images enables a new, GAN-based methodology for unsupervised keypoint detection.