Low-memory stochastic backpropagation with multi-channel randomized trace estimation

Added by Mathias Louboutin

Publication date: 2021

Field: Informatics Engineering

Research language: English

Authors: Mathias Louboutin, Ali Siahkoohi, Rongrong Wang

Abstract

Thanks to the combination of state-of-the-art accelerators and highly optimized open software frameworks, there has been tremendous progress in the performance of deep neural networks. While these developments have been responsible for many breakthroughs, progress towards solving large-scale problems, such as video encoding and semantic segmentation in 3D, is hampered because access to on-premise memory is often limited. Instead of relying on (optimal) checkpointing or invertibility of the network layers -- to recover the activations during backpropagation -- we propose to approximate the gradient of convolutional layers in neural networks with a multi-channel randomized trace estimation technique. Compared to other methods, this approach is simple, amenable to analyses, and leads to a greatly reduced memory footprint. Even though the randomized trace estimation introduces stochasticity during training, we argue that this is of little consequence as long as the induced errors are of the same order as errors in the gradient due to the use of stochastic gradient descent. We discuss the performance of networks trained with stochastic backpropagation and how the error can be controlled while maximizing memory usage and minimizing computational overhead.
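The memory argument in the abstract can be illustrated with a small sketch. It is not the authors' convolutional formulation: it uses a dense layer as a simpler stand-in and assumes the weight gradient has the form X^T dY, where each entry is a trace, tr(dy_j x_i^T), that can be estimated by random probing. The function names (`make_probes`, `probed_activations`, `approx_weight_gradient`) and all sizes are hypothetical and chosen for illustration only.

```python
import numpy as np

def make_probes(batch, r, rng):
    # Rademacher probes: columns z satisfy E[z z^T] = I (assumed probe distribution).
    return rng.choice([-1.0, 1.0], size=(batch, r))

def probed_activations(X, Z):
    # Forward pass: keep only the small sketch X^T Z, not the activations X.
    return X.T @ Z                      # shape (n_in, r)

def approx_weight_gradient(XtZ, dY, Z):
    # Backward pass: X^T dY is approximated by (X^T Z)(Z^T dY) / r.
    r = Z.shape[1]
    return XtZ @ (Z.T @ dY) / r         # shape (n_in, n_out)

# Illustrative sizes (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
batch, n_in, n_out, r = 256, 16, 8, 64
X = rng.standard_normal((batch, n_in))    # layer input (activations)
dY = rng.standard_normal((batch, n_out))  # gradient arriving from the next layer

Z = make_probes(batch, r, rng)
sketch = probed_activations(X, Z)         # stored instead of X
grad_est = approx_weight_gradient(sketch, dY, Z)
grad_exact = X.T @ dY

rel_err = np.linalg.norm(grad_est - grad_exact) / np.linalg.norm(grad_exact)
print(f"stored floats: {sketch.size} instead of {X.size}")
print(f"relative error with r={r} probes: {rel_err:.3f}")
```

In this toy setting the stored sketch has n_in × r entries instead of batch × n_in, so memory shrinks by a factor of batch / r, and the probes do not need to be kept because they can be regenerated from a seed. The estimate is unbiased but noisy, and its error decreases as r grows, which is the trade-off between memory and accuracy that the abstract refers to.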

Ultra-low memory seismic inversion with randomized trace estimation

Mathias Louboutin, Felix J. Herrmann (2021)

Inspired by recent work on extended image volumes that lays the ground for randomized probing of extremely large seismic wavefield matrices, we present a memory-frugal and computationally efficient inversion methodology that uses techniques from randomized linear algebra. By means of a carefully selected realistic synthetic example, we demonstrate that we are capable of achieving competitive inversion results at a fraction of the memory cost of conventional full-waveform inversion with limited computational overhead. By exchanging memory for negligible computational overhead, the presented technology opens the door to the use of low-memory accelerators such as GPUs.

Geophysics Discrete Mathematics Computational Physics

Hutch++: Optimal Stochastic Trace Estimation

Raphael A. Meyer, Cameron Musco, Christopher Musco (2020)

We study the problem of estimating the trace of a matrix $A$ that can only be accessed through matrix-vector multiplication. We introduce a new randomized algorithm, Hutch++, which computes a $(1 \pm \epsilon)$ approximation to $\mathrm{tr}(A)$ for any positive semidefinite (PSD) $A$ using just $O(1/\epsilon)$ matrix-vector products. This improves on the ubiquitous Hutchinson's estimator, which requires $O(1/\epsilon^2)$ matrix-vector products. Our approach is based on a simple technique for reducing the variance of Hutchinson's estimator using a low-rank approximation step, and is easy to implement and analyze. Moreover, we prove that, up to a logarithmic factor, the complexity of Hutch++ is optimal amongst all matrix-vector query algorithms, even when queries can be chosen adaptively. We show that it significantly outperforms Hutchinson's method in experiments. While our theory mainly requires $A$ to be positive semidefinite, we provide generalized guarantees for general square matrices, and show empirical gains in such applications.
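The two estimators compared in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration written from the description above (a low-rank approximation step followed by Hutchinson's estimator on the remainder), not the authors' reference implementation. The names `hutchinson`, `hutchpp`, and `matvec`, the use of Rademacher probes, and the split of the budget of m products into three equal parts are assumptions made here.

```python
import numpy as np

def hutchinson(matvec, n, m, rng):
    # Plain Hutchinson: average of z^T A z over m random sign vectors.
    Z = rng.choice([-1.0, 1.0], size=(n, m))
    return np.sum(Z * matvec(Z)) / m

def hutchpp(matvec, n, m, rng):
    # Budget of m matrix-vector products, split into three groups of k.
    k = m // 3
    S = rng.choice([-1.0, 1.0], size=(n, k))
    G = rng.choice([-1.0, 1.0], size=(n, k))
    # Low-rank step: orthonormal basis Q for the range of A S.
    Q, _ = np.linalg.qr(matvec(S))
    # Exact trace of A restricted to span(Q).
    trace_low_rank = np.trace(Q.T @ matvec(Q))
    # Hutchinson on the remainder: probes projected away from span(Q).
    G = G - Q @ (Q.T @ G)
    trace_remainder = np.sum(G * matvec(G)) / k
    return trace_low_rank + trace_remainder

# Illustrative PSD test matrix with a decaying spectrum (an assumption).
rng = np.random.default_rng(0)
n = 200
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigenvalues = 1.0 / np.arange(1, n + 1) ** 2
A = (U * eigenvalues) @ U.T

def matvec(V):
    # Apply A to every column of V; A is only accessed through products.
    return A @ V

print("exact trace:", np.trace(A))
print("Hutchinson :", hutchinson(matvec, n, 30, rng))
print("Hutch++    :", hutchpp(matvec, n, 30, rng))
```

Both functions touch the matrix only through `matvec`, matching the access model in the abstract. When the spectrum decays quickly, most of the trace is captured by the low-rank step, so the random part only has to estimate a small remainder.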

Data Structures and Algorithms Machine Learning Numerical Analysis

Enhancing accuracy of deep learning algorithms by training with low-discrepancy sequences

Siddhartha Mishra, T. Konstantin Rusch (2020)

We propose a deep supervised learning algorithm based on low-discrepancy sequences as the training set. By a combination of theoretical arguments and extensive numerical experiments we demonstrate that the proposed algorithm significantly outperforms standard deep learning algorithms that are based on randomly chosen training data, for problems in moderately high dimensions. The proposed algorithm provides an efficient method for building inexpensive surrogates for many underlying maps in the context of scientific computing.

Machine Learning Numerical Analysis

From the Greene-Wu Convolution to Gradient Estimation over Riemannian Manifolds

Tianyu Wang, Yifeng Huang, Didong Li (2021)

Over a complete Riemannian manifold of finite dimension, Greene and Wu introduced a convolution, known as Greene-Wu (GW) convolution. In this paper, we study properties of the GW convolution and apply it to non-Euclidean machine learning problems. In particular, we derive a new formula for how the curvature of the space would affect the curvature of the function through the GW convolution. Also, following the study of the GW convolution, a new method for gradient estimation over Riemannian manifolds is introduced.

Machine Learning Numerical Analysis

A Multilevel Approach to Variance Reduction in the Stochastic Estimation of the Trace of a Matrix

Andreas Frommer, Mostafa Nasr Khalil, Gustavo Ramirez-Hidalgo (2021)

The trace of a matrix function f(A), most notably of the matrix inverse, can be estimated stochastically using samples $\langle x, f(A)x \rangle$ if the components of the random vectors x obey an appropriate probability distribution. However, such Monte Carlo sampling suffers from the fact that the number of samples required grows quadratically with the desired accuracy, making higher-precision estimation very costly. In this paper we suggest and investigate a multilevel Monte Carlo approach which uses a multigrid hierarchy to stochastically estimate the trace. This results in a substantial reduction of the variance, so that higher precision can be obtained with much less effort. We illustrate this for the trace of the inverse using three different classes of matrices.
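The sampling cost mentioned in this abstract follows from the standard Monte Carlo argument, sketched here under the usual assumption that the probe vectors $x_i$ are independent with $\mathbb{E}[x_i x_i^T] = I$. The symbols $N$, $\sigma^2$, and $\varepsilon$ are introduced for illustration and do not come from the paper.

```latex
\begin{align*}
\widehat{t}_N &= \frac{1}{N}\sum_{i=1}^{N} \langle x_i, f(A)\,x_i \rangle,
& \mathbb{E}\big[\widehat{t}_N\big] &= \operatorname{tr} f(A), \\
\operatorname{Var}\big[\widehat{t}_N\big] &= \frac{\sigma^2}{N},
& \sigma^2 &= \operatorname{Var}\big[\langle x, f(A)\,x \rangle\big].
\end{align*}
% A root-mean-square error of \varepsilon therefore needs
% N = \sigma^2 / \varepsilon^2 samples: halving the error costs four times the work.
```

Because the sample count scales with $\sigma^2$, lowering the variance of each sample is an alternative to drawing more samples, and this is the route that variance-reduction schemes take.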

Numerical Analysis High Energy Physics - Lattice
