
Towards Generalized Implementation of Wasserstein Distance in GANs

Submitted by Minkai Xu
Publication date: 2020
Paper language: English





Wasserstein GANs (WGANs), built upon the Kantorovich-Rubinstein (KR) duality of the Wasserstein distance, are among the most theoretically sound GAN models. In practice, however, they do not always outperform other GAN variants, mostly because of imperfect implementations of the Lipschitz condition required by the KR duality. Extensive work in the community has explored different implementations of the Lipschitz constraint, yet the restriction remains hard to satisfy perfectly in practice. In this paper, we argue that the strong Lipschitz constraint might be unnecessary for optimization. Instead, we take a step back and relax the Lipschitz constraint. Theoretically, we first demonstrate a more general dual form of the Wasserstein distance, called the Sobolev duality, which relaxes the Lipschitz constraint while maintaining the favorable gradient property of the Wasserstein distance. Moreover, we show that the KR duality is actually a special case of the Sobolev duality. Based on the relaxed duality, we further propose a generalized WGAN training scheme named Sobolev Wasserstein GAN (SWGAN), and empirically demonstrate the improvement of SWGAN over existing methods through extensive experiments.
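To make the contrast with standard WGAN training concrete, below is a minimal PyTorch-style sketch of a critic loss in which the pointwise Lipschitz penalty is replaced by a constraint on the expected squared gradient norm (a Sobolev-type ball). The function name, the interpolation scheme, and the penalty weight lam are illustrative assumptions; the paper's exact SWGAN objective may differ.

import torch

def sobolev_style_critic_loss(critic, real, fake, lam=10.0):
    # Wasserstein critic term: maximize E[D(real)] - E[D(fake)],
    # written here as a loss to be minimized.
    w_term = critic(fake).mean() - critic(real).mean()

    # Relaxed constraint (illustrative assumption): bound the EXPECTED squared
    # gradient norm over interpolated samples, instead of forcing the gradient
    # norm to be at most 1 pointwise.
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    expected_sq_norm = grads.flatten(1).pow(2).sum(dim=1).mean()
    penalty = (expected_sq_norm - 1.0).clamp(min=0.0) ** 2

    return w_term + lam * penalty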


Read also

Lipschitz continuity has recently become popular in generative adversarial networks (GANs). It has been observed that a Lipschitz-regularized discriminator leads to improved training stability and sample quality. The mainstream implementations of Lipschitz continuity are gradient penalty and spectral normalization. In this paper, we demonstrate that gradient penalty introduces an undesired bias, while spectral normalization might be overly restrictive. We accordingly propose a new method that is efficient and unbiased. Our experiments verify our analysis and show that the proposed method achieves successful training in various situations where gradient penalty and spectral normalization fail.
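For reference, the two mainstream implementations named above each take only a few lines in PyTorch. The snippet below shows spectral normalization applied to a small critic (the layer sizes are placeholders, not from the paper); the gradient penalty counterpart is sketched after the next abstract.

import torch.nn as nn

# Spectral normalization constrains each layer's spectral norm to 1, so the
# composed critic is at most 1-Lipschitz (the LeakyReLU activation is itself
# 1-Lipschitz).
critic = nn.Sequential(
    nn.utils.spectral_norm(nn.Linear(784, 256)),  # layer sizes are placeholders
    nn.LeakyReLU(0.2),
    nn.utils.spectral_norm(nn.Linear(256, 1)),
)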
Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but can still generate only low-quality samples or fail to converge in some settings. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalizing the norm of the gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high-quality generations on CIFAR-10 and LSUN bedrooms.
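The gradient penalty described in this abstract is commonly implemented as follows in PyTorch. The interpolation between real and generated samples and the coefficient lam=10.0 follow the usual convention and are taken here as assumptions rather than quoted from the text above.

import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    # Sample points on straight lines between real and generated samples.
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    # Gradient of the critic output with respect to its input.
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    # Penalize deviation of the per-sample gradient norm from 1.
    return lam * ((grad_norm - 1.0) ** 2).mean()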
Gerard Biau (2020)
Generative Adversarial Networks (GANs) have been successful in producing outstanding results in areas as diverse as image, video, and text generation. Building on these successes, a large number of empirical studies have validated the benefits of the cousin approach called Wasserstein GANs (WGANs), which brings stabilization in the training process. In the present paper, we add a new stone to the edifice by proposing some theoretical advances in the properties of WGANs. First, we properly define the architecture of WGANs in the context of integral probability metrics parameterized by neural networks and highlight some of their basic mathematical features. We stress in particular interesting optimization properties arising from the use of a parametric 1-Lipschitz discriminator. Then, in a statistically-driven approach, we study the convergence of empirical WGANs as the sample size tends to infinity, and clarify the adversarial effects of the generator and the discriminator by underlining some trade-off properties. These features are finally illustrated with experiments using both synthetic and real-world datasets.
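As a reminder of the integral probability metric (IPM) formulation referred to above, the Kantorovich-Rubinstein dual of the Wasserstein-1 distance can be written as follows (this is the standard definition, not quoted from the paper):

W_1(\mu, \nu) \;=\; \sup_{\|f\|_{\mathrm{Lip}} \le 1} \; \mathbb{E}_{x \sim \mu}[f(x)] \;-\; \mathbb{E}_{y \sim \nu}[f(y)]

A WGAN then replaces the supremum over all 1-Lipschitz functions by a supremum over a parametric family of 1-Lipschitz neural networks \{f_\theta\}_{\theta \in \Theta}, which is the setting the paper analyzes.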
To measure the similarity of documents, the Wasserstein distance is a powerful tool, but it incurs a high computational cost. Recently, for fast computation of the Wasserstein distance, methods for approximating the Wasserstein distance using a tree metric have been proposed. These tree-based methods allow fast comparisons of a large number of documents; however, they are unsupervised and do not learn task-specific distances. In this work, we propose the Supervised Tree-Wasserstein (STW) distance, a fast, supervised metric learning method based on the tree metric. Specifically, we rewrite the Wasserstein distance on the tree metric using the parent-child relationships of a tree and formulate it as a continuous optimization problem with a contrastive loss. Experimentally, we show that the STW distance can be computed quickly and improves the accuracy of document classification tasks. Furthermore, the STW distance is formulated by matrix multiplications, runs on a GPU, and is suitable for batch processing. Therefore, we show that the STW distance is extremely efficient when comparing a large number of documents.
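The closed form of the Wasserstein distance under a tree metric, which the approach above builds on, reduces to a single matrix-vector product. The sketch below is a generic illustration of that closed form with assumed variable names; it does not show the supervised learning of the tree itself.

import torch

def tree_wasserstein(B, w, mu, nu):
    # B: (num_nodes, num_leaves) 0/1 matrix, B[v, i] = 1 if leaf i lies in the
    #    subtree rooted at node v.
    # w: (num_nodes,) weight of the edge from each node to its parent.
    # mu, nu: (num_leaves,) probability mass assigned to the leaves.
    # Tree-Wasserstein closed form: sum over edges of the edge weight times the
    # absolute mass imbalance flowing through that edge.
    return (w * (B @ (mu - nu)).abs()).sum()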
Generative adversarial networks (GANs) have attracted intense interest in the field of generative models. However, few investigations focusing either on the theoretical analysis or on algorithm design for the approximation ability of the generator of GANs have been reported. This paper first theoretically analyzes the approximation property of GANs. Similar to the universal approximation property of fully connected neural networks with one hidden layer, we prove that the generator with the input latent variable in GANs can universally approximate the underlying data distribution given increasing numbers of hidden neurons. Furthermore, we propose an approach named stochastic data generation (SDG) to enhance GANs' approximation ability. Our approach is based on the simple idea of imposing randomness on data generation in GANs via a prior distribution on the conditional probability between the layers. The SDG approach can be easily implemented using the reparameterization trick. Experimental results on a synthetic dataset verify the improved approximation ability obtained by this SDG approach. On real-world datasets, four GANs equipped with SDG also outperform the corresponding traditional GANs when the model architectures are smaller.
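As a concrete illustration of the reparameterization trick mentioned above, the following layer injects Gaussian noise between generator layers in a differentiable way. The Gaussian prior and the layer placement are assumptions made for this sketch; the SDG construction in the paper may differ.

import torch
import torch.nn as nn

class StochasticLayer(nn.Module):
    # Illustrative sketch: learnable Gaussian randomness between layers via
    # the reparameterization trick, h_out = mu(h) + sigma(h) * eps.
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.mu = nn.Linear(dim_in, dim_out)
        self.log_sigma = nn.Linear(dim_in, dim_out)

    def forward(self, h):
        mu = self.mu(h)
        sigma = torch.exp(self.log_sigma(h))
        # Sampling stays differentiable w.r.t. mu and sigma (reparameterization).
        return mu + sigma * torch.randn_like(mu)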
