Convergence and Sample Complexity of SGD in GANs

Posted by Vasilis Kontonis
Publication date: 2020
Research field: Informatics Engineering
Paper language: English




We provide theoretical convergence guarantees on training Generative Adversarial Networks (GANs) via SGD. We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function $\phi(\cdot)$ parametrized by a $d \times d$ weight matrix $\mathbf{W}_*$, i.e., $f_*(\mathbf{x}) = \phi(\mathbf{W}_* \mathbf{x})$. Our main result is that training the Generator together with a Discriminator according to the Stochastic Gradient Descent-Ascent iteration proposed by Goodfellow et al. yields a Generator distribution that approaches the target distribution of $f_*$. Specifically, we can learn the target distribution within total-variation distance $\epsilon$ using $\tilde{O}(d^2/\epsilon^2)$ samples, which is (near-)information-theoretically optimal. Our results apply to a broad class of non-linear activation functions $\phi$, including ReLUs, and are enabled by a connection with truncated statistics and an appropriate design of the Discriminator network. Our approach relies on a bilevel optimization framework to show that vanilla SGDA works.
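
To make the training procedure concrete, here is a minimal, hedged sketch of the Stochastic Gradient Descent-Ascent loop in the setting above: a 1-layer ReLU generator $\phi(\mathbf{W}\mathbf{x})$ trained against a discriminator. The discriminator architecture, learning rates, batch size, and dimension are illustrative assumptions; the paper's actual guarantees rely on a specifically designed discriminator via truncated statistics, which is not reproduced here.

```python
# Illustrative SGDA sketch (assumptions: ReLU activation, a generic MLP
# discriminator, d = 8, arbitrary learning rates -- not the paper's design).
import torch

d = 8
W_star = torch.randn(d, d)                       # unknown target weights W_*
W_gen = torch.randn(d, d, requires_grad=True)    # generator weights W
disc = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.ReLU(),
                           torch.nn.Linear(d, 1))

opt_g = torch.optim.SGD([W_gen], lr=1e-2)
opt_d = torch.optim.SGD(disc.parameters(), lr=1e-2)

for step in range(2000):
    x_real = torch.relu(torch.randn(128, d) @ W_star.T)   # samples from f_*
    x_fake = torch.relu(torch.randn(128, d) @ W_gen.T)    # generator samples

    # Ascent step on the discriminator; log(1 - sigmoid(z)) = logsigmoid(-z).
    d_loss = -(torch.nn.functional.logsigmoid(disc(x_real)).mean()
               + torch.nn.functional.logsigmoid(-disc(x_fake.detach())).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Descent step on the generator against the updated discriminator.
    g_loss = torch.nn.functional.logsigmoid(-disc(x_fake)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```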

Read also

Zetong Qi, T.J. Wilder (2019)
Adversarial attacks during the testing phase of neural networks pose a challenge for the deployment of neural networks in security-critical settings. These attacks can be performed by adding noise that is imperceptible to humans on top of the original data. By doing so, an attacker can create an adversarial sample, which will cause neural networks to misclassify. In this paper, we seek to understand the theoretical limits of what can be learned by neural networks in the presence of an adversary. We first define the hypothesis space of a neural network and show the relationship between the growth number of the entire neural network and the growth number of each neuron. Combining this with the adversarial Vapnik-Chervonenkis (VC) dimension of halfspace classifiers, we derive the adversarial VC-dimension of neural networks with sign activation functions.
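
As a concrete illustration of the attack model described above (not part of the paper's analysis), the following hedged sketch perturbs an input with a small signed-gradient step so that a classifier's prediction may change; the model, input, label, and perturbation budget are all placeholder assumptions.

```python
# FGSM-style sketch of "imperceptible" additive noise; the classifier, input,
# label, and epsilon below are placeholders, not from the paper.
import torch

model = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 10))    # placeholder classifier
x = torch.rand(1, 784, requires_grad=True)               # placeholder input
y = torch.tensor([3])                                     # its (assumed) label
eps = 0.05                                                # perturbation budget

loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0)         # adversarial sample
```
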
We establish a quantitative version of the Tracy--Widom law for the largest eigenvalue of high-dimensional sample covariance matrices. To be precise, we show that the fluctuations of the largest eigenvalue of a sample covariance matrix $X^*X$ converge to its Tracy--Widom limit at a rate nearly $N^{-1/3}$, where $X$ is an $M \times N$ random matrix whose entries are independent real or complex random variables, assuming that both $M$ and $N$ tend to infinity at a constant rate. This result improves the previous estimate $N^{-2/9}$ obtained by Wang [73]. Our proof relies on a Green function comparison method [27] using iterative cumulant expansions, the local laws for the Green function and asymptotic properties of the correlation kernel of the white Wishart ensemble.
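
For intuition about the statement (not the proof), here is a small numerical sketch that rescales the largest eigenvalue of a white Wishart sample covariance matrix at the soft edge; the matrix sizes and the standard soft-edge centering and scaling constants below are used purely for illustration.

```python
# Monte Carlo sketch: rescaled largest eigenvalue of X X^* for Gaussian X,
# which should be approximately Tracy-Widom distributed (real case, beta = 1).
import numpy as np

M, N, trials = 200, 400, 100
mu = (np.sqrt(M) + np.sqrt(N)) ** 2                           # soft-edge center
sigma = (np.sqrt(M) + np.sqrt(N)) * (1/np.sqrt(M) + 1/np.sqrt(N)) ** (1/3)

rescaled = []
for _ in range(trials):
    X = np.random.randn(M, N)
    lam_max = np.linalg.eigvalsh(X @ X.T).max()               # largest eigenvalue
    rescaled.append((lam_max - mu) / sigma)

print(np.mean(rescaled), np.std(rescaled))    # TW_1 has mean about -1.21
```
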
We undertake a precise study of the non-asymptotic properties of vanilla generative adversarial networks (GANs) and derive theoretical guarantees in the problem of estimating an unknown $d$-dimensional density $p^*$ under a proper choice of the class of generators and discriminators. We prove that the resulting density estimate converges to $p^*$ in terms of Jensen-Shannon (JS) divergence at the rate $(\log n/n)^{2\beta/(2\beta+d)}$ where $n$ is the sample size and $\beta$ determines the smoothness of $p^*$. This is the first result in the literature on density estimation using vanilla GANs with JS rates faster than $n^{-1/2}$ in the regime $\beta > d/2$.
Haoyang Cao, Xin Guo (2020)
Generative adversarial networks (GANs) have enjoyed tremendous empirical successes, and research interest in the theoretical understanding of the GAN training process is rapidly growing, especially for its evolution and convergence analysis. This paper establishes approximations, with precise error bound analysis, for the training of GANs under stochastic gradient algorithms (SGAs). The approximations are in the form of coupled stochastic differential equations (SDEs). The analysis of the SDEs and the associated invariant measures yields conditions for the convergence of GAN training. Further analysis of the invariant measure for the coupled SDEs gives rise to fluctuation-dissipation relations (FDRs) for GANs, revealing the trade-off of the loss landscape between the generator and the discriminator and providing guidance for learning rate scheduling.
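
Purely to illustrate the "coupled SDE" viewpoint, the sketch below runs an Euler-Maruyama simulation of a noisy descent-ascent flow on a toy bilinear game; the drift, noise level, and game are assumptions for illustration, not the paper's equations.

```python
# Euler-Maruyama sketch of a coupled SDE pair modeling noisy gradient
# descent-ascent on the toy bilinear game L(theta, phi) = theta * phi.
import numpy as np

rng = np.random.default_rng(0)
theta, phi = 1.0, 1.0             # toy generator / discriminator parameters
eta, noise, steps = 1e-3, 0.1, 20000

for _ in range(steps):
    dw_theta, dw_phi = rng.standard_normal(2)
    theta += -eta * phi + noise * np.sqrt(eta) * dw_theta   # generator descent + noise
    phi   +=  eta * theta + noise * np.sqrt(eta) * dw_phi   # discriminator ascent + noise

print(theta, phi)   # simultaneous GDA on a bilinear game spirals rather than converging
```
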
In this paper, we propose a method of distributed stochastic gradient descent (SGD) with low communication load and computational complexity, and still fast convergence. To reduce the communication load, at each iteration of the algorithm the worker nodes calculate and communicate some scalars, namely the directional derivatives of the sample functions in some pre-shared directions. However, to maintain accuracy, after every specific number of iterations they communicate the vectors of stochastic gradients. To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order stochastic gradient estimation, performing just two function evaluations rather than computing a first-order gradient vector. The proposed method highly improves the convergence rate of zeroth-order methods, guaranteeing order-wise faster convergence. Moreover, compared to the well-known communication-efficient methods of model averaging (which perform local model updates and periodic communication of the gradients to synchronize the local models), we prove that for the general class of non-convex stochastic problems and with a reasonable choice of parameters, the proposed method guarantees the same orders of communication load and convergence rate, while having order-wise less computational complexity. Experimental results on various learning problems in neural network applications demonstrate the effectiveness of the proposed approach compared to various state-of-the-art distributed SGD methods.
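
The communication-saving step can be pictured with the hedged single-worker sketch below: instead of a full gradient vector, a worker sends a few scalar directional derivatives along pre-shared directions, each estimated with two function evaluations. The loss function, directions, smoothing parameter, and step size are illustrative assumptions, not the paper's exact scheme.

```python
# Zeroth-order directional-derivative sketch (single worker, least-squares loss).
# All constants and the loss itself are placeholders for illustration.
import numpy as np

rng = np.random.default_rng(0)
dim, k = 50, 5
U = rng.standard_normal((k, dim))
U /= np.linalg.norm(U, axis=1, keepdims=True)        # pre-shared unit directions

def sample_loss(w, A, b):                            # placeholder sample function
    return 0.5 * np.mean((A @ w - b) ** 2)

def directional_derivatives(w, A, b, mu=1e-4):
    # Two function evaluations per direction: (f(w + mu u) - f(w - mu u)) / (2 mu).
    return np.array([(sample_loss(w + mu*u, A, b) - sample_loss(w - mu*u, A, b)) / (2*mu)
                     for u in U])

w = np.zeros(dim)
w_true = rng.standard_normal(dim)
for step in range(2000):
    A = rng.standard_normal((32, dim))
    b = A @ w_true + 0.01 * rng.standard_normal(32)  # noisy minibatch targets
    scalars = directional_derivatives(w, A, b)        # k scalars to communicate
    w -= 0.1 * (scalars @ U)                          # update reconstructed from scalars
```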
