ﻻ يوجد ملخص باللغة العربية
The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which can allow unbiased gradient estimates to be computed with reduced memory in return for variance. We examine limitations of the general approach, and argue that we must leverage problem specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
Automatic Differentiation Variational Inference (ADVI) is a useful tool for efficiently learning probabilistic models in machine learning. Generally approximate posteriors learned by ADVI are forced to be unimodal in order to facilitate use of the re
For a real function, automatic differentiation is such a standard algorithm used to efficiently compute its gradient, that it is integrated in various neural network frameworks. However, despite the recent advances in using complex functions in machi
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes
In this paper we introduce DiffSharp, an automatic differentiation (AD) library designed with machine learning in mind. AD is a family of techniques that evaluate derivatives at machine precision with only a small constant factor of overhead, by syst
CANDECOMP/PARAFAC (CP) decomposition has been widely used to deal with multi-way data. For real-time or large-scale tensors, based on the ideas of randomized-sampling CP decomposition algorithm and online CP decomposition algorithm, a novel CP decomp