New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Learning and Generalization in Overparameterized Normalizing Flows

131 0 0.0 ( 0 )

Download Cite

Added by Kulin Shah

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Kulin Shah - Amit Deshpande - Navin Goyal

Machine Learning Artificial Intelligence

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an important class of models in unsupervised learning for sampling and density estimation. In this paper, we theoretically and empirically analyze these models when the underlying neural network is one-hidden-layer overparameterized network. Our main contributions are two-fold: (1) On the one hand, we provide theoretical and empirical evidence that for a class of NFs containing most of the existing NF models, overparametrization hurts training. (2) On the other hand, we prove that unconstrained NFs, a recently introduced model, can efficiently learn any reasonable data distribution under minimal assumptions when the underlying network is overparametrized.

rate research

Copula-Based Normalizing Flows

86 - Mike Laszkiewicz , Johannes Lederer , Asja Fischer 2021

Normalizing flows, which learn a distribution by transforming the data to samples from a Gaussian base distribution, have proven powerful density approximations. But their expressive power is limited by this choice of the base distribution. We, therefore, propose to generalize the base distribution to a more elaborate copula distribution to capture the properties of the target distribution more accurately. In a first empirical analysis, we demonstrate that this replacement can dramatically improve the vanilla normalizing flows in terms of flexibility, stability, and effectivity for heavy-tailed data. Our results suggest that the improvements are related to an increased local Lipschitz-stability of the learned flow.

Machine Learning Artificial Intelligence Machine Learning

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

253 - Zeyuan Allen-Zhu , Yuanzhi Li , Yingyu Liang 2018

The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesnt the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technique side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network (that can be viewed as a second-order variant of NTK), and connect it to the SGD theory of escaping saddle points.

Machine Learning Data Structures and Algorithms Neural and Evolutionary Computing

Jacobian Determinant of Normalizing Flows

126 - Huadong Liao , Jiawei He 2021

Normalizing flows learn a diffeomorphic mapping between the target and base distribution, while the Jacobian determinant of that mapping forms another real-valued function. In this paper, we show that the Jacobian determinant mapping is unique for the given distributions, hence the likelihood objective of flows has a unique global optimum. In particular, the likelihood for a class of flows is explicitly expressed by the eigenvalues of the auto-correlation matrix of individual data point, and independent of the parameterization of neural network, which provides a theoretical optimal value of likelihood objective and relates to probabilistic PCA. Additionally, Jacobian determinant is a measure of local volume change and is maximized when MLE is used for optimization. To stabilize normalizing flows training, it is required to maintain a balance between the expansiveness and contraction of volume, meaning Lipschitz constraint on the diffeomorphic mapping and its inverse. With these theoretical results, several principles of designing normalizing flow were proposed. And numerical experiments on highdimensional datasets (such as CelebA-HQ 1024x1024) were conducted to show the improved stability of training.

Machine Learning Artificial Intelligence Machine Learning

InFlow: Robust outlier detection utilizing Normalizing Flows

80 - Nishant Kumar , Pia Hanfeld , Michael Hecht 2021

Normalizing flows are prominent deep generative models that provide tractable probability distributions and efficient density estimation. However, they are well known to fail while detecting Out-of-Distribution (OOD) inputs as they directly encode the local features of the input representations in their latent space. In this paper, we solve this overconfidence issue of normalizing flows by demonstrating that flows, if extended by an attention mechanism, can reliably detect outliers including adversarial attacks. Our approach does not require outlier data for training and we showcase the efficiency of our method for OOD detection by reporting state-of-the-art performance in diverse experimental settings. Code available at https://github.com/ComputationalRadiationPhysics/InFlow .

Machine Learning Artificial Intelligence Cryptography and Security

Self Normalizing Flows

94 - T. Anderson Keller , Jorn W.T. Peters , Priyank Jaini 2020

Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layers exact update from $mathcal{O}(D^3)$ to $mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while training more quickly and surpassing the performance of functionally constrained counterparts.

Machine Learning Neural and Evolutionary Computing Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Learning and Generalization in Overparameterized Normalizing Flows

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions