Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks

85 0 0.0 ( 0 )

Download Cite

Added by Etai Littwin

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Etai Littwin - Omid Saremi - Shuangfei Zhai

Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We analyze the learning dynamics of infinitely wide neural networks with a finite sized bottle-neck. Unlike the neural tangent kernel limit, a bottleneck in an otherwise infinite width network al-lows data dependent feature learning in its bottle-neck representation. We empirically show that a single bottleneck in infinite networks dramatically accelerates training when compared to purely in-finite networks, with an improved overall performance. We discuss the acceleration phenomena by drawing similarities to infinitely wide deep linear models, where the acceleration effect of a bottleneck can be understood theoretically.

rate research

Information in Infinite Ensembles of Infinitely-Wide Neural Networks

79 - Ravid Shwartz-Ziv , Alexander A. Alemi 2019

In this preliminary work, we study the generalization properties of infinite ensembles of infinitely-wide neural networks. Amazingly, this model family admits tractable calculations for many information-theoretic quantities. We report analytical and empirical investigations in the search for signals that correlate with generalization.

Machine Learning Information Theory Information Theory

Convexifying Sparse Interpolation with Infinitely Wide Neural Networks: An Atomic Norm Approach

70 - Akshay Kumar , Jarvis Haupt 2020

This work examines the problem of exact data interpolation via sparse (neuron count), infinitely wide, single hidden layer neural networks with leaky rectified linear unit activations. Using the atomic norm framework of [Chandrasekaran et al., 2012], we derive simple characterizations of the convex hulls of the corresponding atomic sets for this problem under several different constraints on the weights and biases of the network, thus obtaining equivalent convex formulations for these problems. A modest extension of our proposed framework to a binary classification problem is also presented. We explore the efficacy of the resulting formulations experimentally, and compare with networks trained via gradient descent.

Machine Learning Signal Processing Machine Learning

Neural Splines: Fitting 3D Surfaces with Infinitely-Wide Neural Networks

537 - Francis Williams , Matthew Trager , Joan Bruna 2020

We present Neural Splines, a technique for 3D surface reconstruction that is based on random feature kernels arising from infinitely-wide shallow ReLU networks. Our method achieves state-of-the-art results, outperforming recent neural network-based techniques and widely used Poisson Surface Reconstruction (which, as we demonstrate, can also be viewed as a type of kernel method). Because our approach is based on a simple kernel formulation, it is easy to analyze and can be accelerated by general techniques designed for kernel-based learning. We provide explicit analytical expressions for our kernel and argue that our formulation can be seen as a generalization of cubic spline interpolation to higher dimensions. In particular, the RKHS norm associated with Neural Splines biases toward smooth interpolants.

Computer Vision and Pattern Recognition Graphics

Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes

324 - Jilin Hu , Jianbing Shen , Bin Yang 2020

Graph convolutional neural networks~(GCNs) have recently demonstrated promising results on graph-based semi-supervised classification, but little work has been done to explore their theoretical properties. Recently, several deep neural networks, e.g., fully connected and convolutional neural networks, with infinite hidden units have been proved to be equivalent to Gaussian processes~(GPs). To exploit both the powerful representational capacity of GCNs and the great expressive power of GPs, we investigate similar properties of infinitely wide GCNs. More specifically, we propose a GP regression model via GCNs~(GPGC) for graph-based semi-supervised learning. In the process, we formulate the kernel matrix computation of GPGC in an iterative analytical form. Finally, we derive a conditional distribution for the labels of unobserved nodes based on the graph structure, labels for the observed nodes, and the feature matrix of all the nodes. We conduct extensive experiments to evaluate the semi-supervised classification performance of GPGC and demonstrate that it outperforms other state-of-the-art methods by a clear margin on all the datasets while being efficient.

Machine Learning Computer Vision and Pattern Recognition Machine Learning

Feature Learning in Infinite-Width Neural Networks

171 - Greg Yang , Edward J. Hu 2020

As its width tends to infinity, a deep neural networks behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the standard parametrization to allow for feature learning in the limit. Using the *Tensor Programs* technique, we derive explicit formulas for such limits. On Word2Vec and few-shot learning on Omniglot via MAML, two canonical tasks that rely crucially on feature learning, we compute these limits exactly. We find that they outperform both NTK baselines and finite-width networks, with the latter approaching the infinite-width feature learning performance as width increases. More generally, we classify a natural space of neural network parametrizations that generalizes standard, NTK, and Mean Field parametrizations. We show 1) any parametrization in this space either admits feature learning or has an infinite-width training dynamics given by kernel gradient descent, but not both; 2) any such infinite-width limit can be computed using the Tensor Programs technique. Code for our experiments can be found at github.com/edwardjhu/TP4.

Machine Learning Disordered Systems and Neural Networks Neural and Evolutionary Computing

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions