On the capacity of deep generative networks for approximating distributions

163 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yunfei Yang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yunfei Yang - Zhen Li - Yang Wang

التعلم الآلي الاحتمالات نظرية الإحصاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study the efficacy and efficiency of deep generative networks for approximating probability distributions. We prove that neural networks can transform a low-dimensional source distribution to a distribution that is arbitrarily close to a high-dimensional target distribution, when the closeness are measured by Wasserstein distances and maximum mean discrepancy. Upper bounds of the approximation error are obtained in terms of the width and depth of neural network. Furthermore, it is shown that the approximation error in Wasserstein distance grows at most linearly on the ambient dimension and that the approximation order only depends on the intrinsic dimension of the target distribution. On the contrary, when $f$-divergences are used as metrics of distributions, the approximation property is different. We show that in order to approximate the target distribution in $f$-divergences, the dimension of the source distribution cannot be smaller than the intrinsic dimension of the target distribution.

قيم البحث

375 - Stefano Favaro , Sandra Fortini , Stefano Peluchetti 2021

In modern deep learning, there is a recent and growing literature on the interplay between large-width asymptotics for deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed weights, and classes of Gaussian stochastic processes (SPs). Such an interplay has proved to be critical in several contexts of practical interest, e.g. Bayesian inference under Gaussian SP priors, kernel regression for infinite-wide deep NNs trained via gradient descent, and information propagation within infinite-wide NNs. Motivated by empirical analysis, showing the potential of replacing Gaussian distributions with Stable distributions for the NNs weights, in this paper we investigate large-width asymptotics for (fully connected) feed-forward deep Stable NNs, i.e. deep NNs with Stable-distributed weights. First, we show that as the width goes to infinity jointly over the NNs layers, a suitable rescaled deep Stable NN converges weakly to a Stable SP whose distribution is characterized recursively through the NNs layers. Because of the non-triangular NNs structure, this is a non-standard asymptotic problem, to which we propose a novel and self-contained inductive approach, which may be of independent interest. Then, we establish sup-norm convergence rates of a deep Stable NN to a Stable SP, quantifying the critical difference between the settings of ``joint growth and ``sequential growth of the width over the NNs layers. Our work extends recent results on infinite-wide limits for deep Gaussian NNs to the more general deep Stable NNs, providing the first result on convergence rates for infinite-wide deep NNs.

التعلم الآلي الاحتمالات نظرية الإحصاء

An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

93 - Shashank Rajput , Kartik Sreenivasan , Dimitris Papailiopoulos 2021

It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long standing question by Baum (1988), proving that emph{deep threshold} networ ks can memorize $n$ points in $d$ dimensions using $widetilde{mathcal{O}}(e^{1/delta^2}+sqrt{n})$ neurons and $widetilde{mathcal{O}}(e^{1/delta^2}(d+sqrt{n})+n)$ weights, where $delta$ is the minimum distance between the points. In this work, we improve the dependence on $delta$ from exponential to almost linear, proving that $widetilde{mathcal{O}}(frac{1}{delta}+sqrt{n})$ neurons and $widetilde{mathcal{O}}(frac{d}{delta}+n)$ weights are sufficient. Our construction uses Gaussian random weights only in the first layer, while all the subsequent layers use binary or integer weights. We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.

التعلم الآلي نظرية المعلومات نظرية المعلومات

An error analysis of generative adversarial networks for learning distributions

74 - Jian Huang , Yuling Jiao , Zhen Li 2021

This paper studies how well generative adversarial networks (GANs) learn probability distributions from finite samples. Our main results establish the convergence rates of GANs under a collection of integral probability metrics defined through Holder classes, including the Wasserstein distance as a special case. We also show that GANs are able to adaptively learn data distributions with low-dimensional structures or have Holder densities, when the network architectures are chosen properly. In particular, for distributions concentrated around a low-dimensional set, we show that the learning rates of GANs do not depend on the high ambient dimension, but on the lower intrinsic dimension. Our analysis is based on a new oracle inequality decomposing the estimation error into the generator and discriminator approximation error and the statistical error, which may be of independent interest.

التعلم الآلي نظرية الإحصاء التعلم الالي

Analysis of Memory Capacity for Deep Echo State Networks

100 - Xuanlin Liu , Mingzhe Chen , Changchuan Yin 2019

In this paper, the echo state network (ESN) memory capacity, which represents the amount of input data an ESN can store, is analyzed for a new type of deep ESNs. In particular, two deep ESN architectures are studied. First, a parallel deep ESN is pro posed in which multiple reservoirs are connected in parallel allowing them to average outputs of multiple ESNs, thus decreasing the prediction error. Then, a series architecture ESN is proposed in which ESN reservoirs are placed in cascade that the output of each ESN is the input of the next ESN in the series. This series ESN architecture can capture more features between the input sequence and the output sequence thus improving the overall prediction accuracy. Fundamental analysis shows that the memory capacity of parallel ESNs is equivalent to that of a traditional shallow ESN, while the memory capacity of series ESNs is smaller than that of a traditional shallow ESN.In terms of normalized root mean square error, simulation results show that the parallel deep ESN achieves 38.5% reduction compared to the traditional shallow ESN while the series deep ESN achieves 16.8% reduction.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

Information Compensation for Deep Conditional Generative Networks

106 - Zehao Wang , Kaili Wang , Tinne Tuytelaars 2020

In recent years, unsupervised/weakly-supervised conditional generative adversarial networks (GANs) have achieved many successes on the task of modeling and generating data. However, one of their weaknesses lies in their poor ability to separate, or d isentangle, the different factors that characterize the representation encoded in their latent space. To address this issue, we propose a novel structure for unsupervised conditional GANs powered by a novel Information Compensation Connection (IC-Connection). The proposed IC-Connection enables GANs to compensate for information loss incurred during deconvolution operations. In addition, to quantify the degree of disentanglement on both discrete and continuous latent variables, we design a novel evaluation procedure. Our empirical results suggest that our method achieves better disentanglement compared to the state-of-the-art GANs in a conditional generation setting.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي