ﻻ يوجد ملخص باللغة العربية
In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an important class of models in unsupervised learning for sampling and density estimation. In this paper, we theoretically and empirically analyze these models when the underlying neural network is one-hidden-layer overparameterized network. Our main contributions are two-fold: (1) On the one hand, we provide theoretical and empirical evidence that for a class of NFs containing most of the existing NF models, overparametrization hurts training. (2) On the other hand, we prove that unconstrained NFs, a recently introduced model, can efficiently learn any reasonable data distribution under minimal assumptions when the underlying network is overparametrized.
Normalizing flows, which learn a distribution by transforming the data to samples from a Gaussian base distribution, have proven powerful density approximations. But their expressive power is limited by this choice of the base distribution. We, there
The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesnt the trained network overfit when it is overparameterized? In this work, we prove that overparamete
Normalizing flows learn a diffeomorphic mapping between the target and base distribution, while the Jacobian determinant of that mapping forms another real-valued function. In this paper, we show that the Jacobian determinant mapping is unique for th
Normalizing flows are prominent deep generative models that provide tractable probability distributions and efficient density estimation. However, they are well known to fail while detecting Out-of-Distribution (OOD) inputs as they directly encode th
Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with eas