Low-memory stochastic backpropagation with multi-channel randomized trace estimation

233 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Mathias Louboutin

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Mathias Louboutin - Ali Siahkoohi - Rongrong Wang

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Thanks to the combination of state-of-the-art accelerators and highly optimized open software frameworks, there has been tremendous progress in the performance of deep neural networks. While these developments have been responsible for many breakthroughs, progress towards solving large-scale problems, such as video encoding and semantic segmentation in 3D, is hampered because access to on-premise memory is often limited. Instead of relying on (optimal) checkpointing or invertibility of the network layers -- to recover the activations during backpropagation -- we propose to approximate the gradient of convolutional layers in neural networks with a multi-channel randomized trace estimation technique. Compared to other methods, this approach is simple, amenable to analyses, and leads to a greatly reduced memory footprint. Even though the randomized trace estimation introduces stochasticity during training, we argue that this is of little consequence as long as the induced errors are of the same order as errors in the gradient due to the use of stochastic gradient descent. We discuss the performance of networks trained with stochastic backpropagation and how the error can be controlled while maximizing memory usage and minimizing computational overhead.

قيم البحث

118 - Mathias Louboutin , Felix J. Herrmann 2021

Inspired by recent work on extended image volumes that lays the ground for randomized probing of extremely large seismic wavefield matrices, we present a memory frugal and computationally efficient inversion methodology that uses techniques from rand omized linear algebra. By means of a carefully selected realistic synthetic example, we demonstrate that we are capable of achieving competitive inversion results at a fraction of the memory cost of conventional full-waveform inversion with limited computational overhead. By exchanging memory for negligible computational overhead, we open with the presented technology the door towards the use of low-memory accelerators such as GPUs.

الجيوفيزياء الرياضيات المتقطعة الفيزياء الحسابية

Hutch++: Optimal Stochastic Trace Estimation

300 - Raphael A. Meyer , Cameron Musco , Christopher Musco 2020

We study the problem of estimating the trace of a matrix $A$ that can only be accessed through matrix-vector multiplication. We introduce a new randomized algorithm, Hutch++, which computes a $(1 pm epsilon)$ approximation to $tr(A)$ for any positive semidefinite (PSD) $A$ using just $O(1/epsilon)$ matrix-vector products. This improves on the ubiquitous Hutchinsons estimator, which requires $O(1/epsilon^2)$ matrix-vector products. Our approach is based on a simple technique for reducing the variance of Hutchinsons estimator using a low-rank approximation step, and is easy to implement and analyze. Moreover, we prove that, up to a logarithmic factor, the complexity of Hutch++ is optimal amongst all matrix-vector query algorithms, even when queries can be chosen adaptively. We show that it significantly outperforms Hutchinsons method in experiments. While our theory mainly requires $A$ to be positive semidefinite, we provide generalized guarantees for general square matrices, and show empirical gains in such applications.

بنى وهياكل البيانات والخوارزميات التعلم الآلي التحليل العددي

Enhancing accuracy of deep learning algorithms by training with low-discrepancy sequences

70 - Siddhartha Mishra , T. Konstantin Rusch 2020

We propose a deep supervised learning algorithm based on low-discrepancy sequences as the training set. By a combination of theoretical arguments and extensive numerical experiments we demonstrate that the proposed algorithm significantly outperforms standard deep learning algorithms that are based on randomly chosen training data, for problems in moderately high dimensions. The proposed algorithm provides an efficient method for building inexpensive surrogates for many underlying maps in the context of scientific computing.

التعلم الآلي التحليل العددي التحليل العددي

From the Greene--Wu Convolution to Gradient Estimation over Riemannian Manifolds

195 - Tianyu Wang , Yifeng Huang , Didong Li 2021

Over a complete Riemannian manifold of finite dimension, Greene and Wu introduced a convolution, known as Greene-Wu (GW) convolution. In this paper, we study properties of the GW convolution and apply it to non-Euclidean machine learning problems. In particular, we derive a new formula for how the curvature of the space would affect the curvature of the function through the GW convolution. Also, following the study of the GW convolution, a new method for gradient estimation over Riemannian manifolds is introduced.

التعلم الآلي التحليل العددي التحليل العددي

A Multilevel Approach to Variance Reduction in the Stochastic Estimation of the Trace of a Matrix

99 - Andreas Frommer , Mostafa Nasr Khalil , Gustavo Ramirez-Hidalgo 2021

The trace of a matrix function f(A), most notably of the matrix inverse, can be estimated stochastically using samples< x,f(A)x> if the components of the random vectors x obey an appropriate probability distribution. However such a Monte-Carlo sampli ng suffers from the fact that the accuracy depends quadratically of the samples to use, thus making higher precision estimation very costly. In this paper we suggest and investigate a multilevel Monte-Carlo approach which uses a multigrid hierarchy to stochastically estimate the trace. This results in a substantial reduction of the variance, so that higher precision can be obtained at much less effort. We illustrate this for the trace of the inverse using three different classes of matrices.

التحليل العددي التحليل العددي فيزياء الطاقة العالية - شعرية