
A scheme for automatic differentiation of complex loss functions

Added by Chu Guo
Publication date: 2020
Research language: English





For real functions, automatic differentiation is the standard algorithm for efficiently computing gradients and is integrated into virtually every neural network framework. However, despite recent advances in using complex functions in machine learning and the well-established usefulness of automatic differentiation, support for automatic differentiation of complex functions is not as mature or widespread as it is for real functions. In this work we propose an efficient and seamless scheme to implement automatic differentiation for complex functions, which is a compatible generalization of the existing scheme for real functions. This scheme can significantly simplify the implementation of neural networks that use complex numbers.
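
The abstract gives no implementation details, but the convention most complex-AD schemes build on can be illustrated with a short sketch. The example below (plain NumPy, a toy construction rather than code from the paper) minimizes a real-valued loss of a complex parameter by descending along the conjugate Wirtinger derivative dL/dz*, and checks that this agrees with the gradient obtained by treating the real and imaginary parts as independent real parameters.

    # Minimal sketch (assumption: NumPy only; not the paper's implementation).
    # Real-valued loss of a complex parameter: L(z) = |z - t|^2.
    import numpy as np

    t = 1.0 - 2.0j          # target value
    z = 0.3 + 0.7j          # complex parameter to optimize
    lr = 0.1                # step size

    for _ in range(100):
        # Conjugate Wirtinger derivative dL/dz* = (z - t) for L = (z - t)(z - t)*.
        grad_conj = z - t
        # The "real pair" gradient (dL/dRe z, dL/dIm z), packed as a complex number,
        # equals 2 * dL/dz*, so both conventions give the same descent direction.
        real_pair_grad = 2 * (z.real - t.real) + 2j * (z.imag - t.imag)
        assert np.isclose(real_pair_grad, 2 * grad_conj)
        z = z - lr * grad_conj

    print(z)   # converges to the target t = 1 - 2j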




Read More

In this note, we report the back-propagation formula for the complex-valued singular value decomposition (SVD). This formula is an important ingredient for a complete automatic differentiation (AD) infrastructure in terms of complex numbers, and it is also the key to understanding and utilizing AD in tensor networks.
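
The note only states that the formula is reported. As a hedged illustration of what such a backward rule has to reproduce (a sketch constructed for this page, not taken from the note), the snippet below numerically differentiates the nuclear norm L(A) = sum of singular values of a complex matrix and compares the result with the well-known analytic gradient U V^H, valid for distinct nonzero singular values.

    # Sketch (assumption: NumPy only). Finite-difference check of the gradient of
    # L(A) = sum of singular values of a complex matrix A; any complex-SVD
    # back-propagation rule must reproduce these numbers.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))

    def loss(M):
        return np.linalg.svd(M, compute_uv=False).sum()

    eps = 1e-6
    num_grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            dRe = np.zeros_like(A); dRe[i, j] = eps
            dIm = np.zeros_like(A); dIm[i, j] = 1j * eps
            # gradient packed as dL/dRe(A_ij) + i * dL/dIm(A_ij), central differences
            num_grad[i, j] = ((loss(A + dRe) - loss(A - dRe))
                              + 1j * (loss(A + dIm) - loss(A - dIm))) / (2 * eps)

    # Analytic gradient of the nuclear norm (distinct nonzero singular values): U V^H
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    print(np.max(np.abs(num_grad - U @ Vh)))   # should be ~1e-8 or smaller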
The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which can allow unbiased gradient estimates to be computed with reduced memory in return for variance. We examine limitations of the general approach, and argue that we must leverage problem specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
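
The following toy sketch (a construction under the stated assumptions, not the paper's RAD implementation) illustrates the core trade described above: the forward pass is exact, but the activation stored for the backward pass is randomly sparsified and rescaled, so the gradient estimate remains unbiased while using less memory, at the cost of extra variance.

    # Toy sketch (assumption: NumPy only; a single linear layer y = W x with
    # loss L = 0.5 * ||y||^2, whose exact gradient is dL/dW = y x^T).
    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(size=(8, 1000))
    x = rng.normal(size=1000)

    y = W @ x                      # exact forward pass
    exact_grad = np.outer(y, x)    # dL/dW = y x^T needs the full activation x

    p = 0.1                                      # keep only 10% of the stored activation
    mask = rng.random(x.shape) < p
    x_stored = np.where(mask, x / p, 0.0)        # sparse, rescaled copy kept for backward
    rad_grad = np.outer(y, x_stored)             # E[rad_grad] = exact_grad (unbiased)

    print(np.abs(rad_grad - exact_grad).mean())  # nonzero: variance is the price paid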
Automatic Differentiation Variational Inference (ADVI) is a useful tool for efficiently learning probabilistic models in machine learning. Generally approximate posteriors learned by ADVI are forced to be unimodal in order to facilitate use of the reparameterization trick. In this paper, we show how stratified sampling may be used to enable mixture distributions as the approximate posterior, and derive a new lower bound on the evidence analogous to the importance weighted autoencoder (IWAE). We show that this SIWAE is a tighter bound than both IWAE and the traditional ELBO, both of which are special instances of this bound. We verify empirically that the traditional ELBO objective disfavors the presence of multimodal posterior distributions and may therefore not be able to fully capture structure in the latent space. Our experiments show that using the SIWAE objective allows the encoder to learn more complex distributions which regularly contain multimodality, resulting in higher accuracy and better calibration in the presence of incomplete, limited, or corrupted data.
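
As a rough illustration of the stratified idea (a schematic sketch under simplifying assumptions, not the paper's exact SIWAE objective), the snippet below draws one sample from each component of a mixture approximate posterior and forms a mixture-weighted importance estimate of the evidence in a 1-D Gaussian toy model; the logarithm of that estimate lower-bounds log p(x) in expectation by Jensen's inequality.

    # Schematic sketch (assumptions: 1-D Gaussian toy model, NumPy and SciPy only).
    # Stratified importance weighting: draw one sample from every mixture component
    # of q, weight by the mixture weights, and sum the importance ratios.
    import numpy as np
    from scipy.stats import norm

    x = 0.8                                   # observed data point
    alphas = np.array([0.5, 0.5])             # mixture weights of q
    mus, sigmas = np.array([-1.0, 1.0]), np.array([0.7, 0.7])

    rng = np.random.default_rng(2)
    z = rng.normal(mus, sigmas)               # one sample per component (stratified)

    log_joint = norm.logpdf(z, 0, 1) + norm.logpdf(x, z, 1)   # log p(z) + log p(x|z)
    log_qk = norm.logpdf(z, mus, sigmas)                      # per-component densities
    estimate = np.sum(alphas * np.exp(log_joint - log_qk))    # unbiased for p(x)

    print(np.log(estimate))                   # stochastic lower bound on log p(x)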
Alberto Ramos (2020)
We present ADerrors.jl, a software package for linear error propagation and analysis of Monte Carlo data. Although the focus is on data analysis in Lattice QCD, where estimates of the observables have to be computed from Monte Carlo samples, the software also handles variables with uncertainties, whether correlated or uncorrelated. Thanks to automatic differentiation techniques, linear error propagation is performed exactly, even through iterative algorithms (e.g., errors in the parameters of non-linear fits). In this contribution we present an overview of the capabilities of the software, including access to uncertainties in fit parameters and handling of correlated data. The software, written in Julia, is available for download and use at https://gitlab.ift.uam-csic.es/alberto/aderrors.jl
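
To make the mechanism concrete (a minimal sketch in generic notation, not the ADerrors.jl API), the example below propagates uncorrelated uncertainties through a nonlinear function using forward-mode dual numbers, so the linear error-propagation formula sigma_f^2 = sum_i (df/dx_i)^2 sigma_i^2 is evaluated with exact derivatives rather than hand-coded ones.

    # Minimal sketch (assumptions: NumPy only; uncorrelated inputs; first-order
    # error propagation). A dual number carries a value and one derivative.
    import numpy as np

    class Dual:
        def __init__(self, val, der=0.0):
            self.val, self.der = val, der
        def __add__(self, o):
            o = o if isinstance(o, Dual) else Dual(o)
            return Dual(self.val + o.val, self.der + o.der)
        __radd__ = __add__
        def __mul__(self, o):
            o = o if isinstance(o, Dual) else Dual(o)
            return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
        __rmul__ = __mul__

    def dsin(x):
        return Dual(np.sin(x.val), np.cos(x.val) * x.der)

    def propagate(f, values, errors):
        # sigma_f^2 = sum_i (df/dx_i)^2 sigma_i^2, each df/dx_i obtained by one
        # forward-mode pass that seeds der = 1 on input i.
        var = 0.0
        for i, sig in enumerate(errors):
            args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(values)]
            var += (f(*args).der * sig) ** 2
        return f(*[Dual(v) for v in values]).val, np.sqrt(var)

    # Example: f(a, b) = a * sin(b) + b, with a = 2.0 +/- 0.1 and b = 0.5 +/- 0.02
    f = lambda a, b: a * dsin(b) + b
    print(propagate(f, [2.0, 0.5], [0.1, 0.02]))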
Data augmentation methods have been shown to be a fundamental technique for improving generalization in tasks such as image, text, and audio classification. Recently, automated augmentation methods have led to further improvements on image classification and object detection, reaching state-of-the-art performance. Nevertheless, little work has been done on time-series data, an area that could greatly benefit from automated data augmentation given the usually limited size of the datasets. We present two sample-adaptive automatic weighting schemes for data augmentation: the first learns to weight the contribution of the augmented samples to the loss, and the second selects a subset of transformations based on the ranking of the predicted training loss. We validate our proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive. On the financial dataset, we show that the methods, in combination with a trading strategy, lead to improvements in annualized returns of over 50%, and on the time-series data we outperform state-of-the-art models on over half of the datasets and achieve similar accuracy on the others.
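
As a hedged sketch of the first scheme described above (generic NumPy, not the paper's implementation), the per-augmentation losses of a single sample are combined with softmax weights, so that training can learn how much each augmented copy should contribute to the total loss.

    # Sketch (assumption: NumPy only; the per-copy losses are given numbers here,
    # in practice they come from the model's forward pass on each augmented copy).
    import numpy as np

    def softmax(v):
        e = np.exp(v - v.max())
        return e / e.sum()

    aug_losses = np.array([0.9, 0.4, 1.3])   # loss of each augmented copy of one sample
    logits = np.zeros(3)                     # learnable weighting parameters (hypothetical)

    weights = softmax(logits)                # sample-adaptive weights, summing to 1
    weighted_loss = np.sum(weights * aug_losses)
    print(weighted_loss)                     # in training, gradients also flow into logits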
