
A scheme for automatic differentiation of complex loss functions

Posted by Chu Guo
Publication date: 2020
Research field: Informatics engineering
Paper language: English





For real functions, automatic differentiation is such a standard algorithm for efficiently computing gradients that it is integrated into various neural network frameworks. However, despite recent advances in the use of complex functions in machine learning and the well-established usefulness of automatic differentiation, support for automatic differentiation of complex functions is not as mature or widespread as it is for real functions. In this work we propose an efficient and seamless scheme for implementing automatic differentiation of complex functions, which is a compatible generalization of the current scheme for real functions. This scheme can significantly simplify the implementation of neural networks that use complex numbers.
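As a rough illustration of what differentiating a real-valued loss of complex parameters involves (a minimal sketch, not the scheme proposed in the paper; the loss function and step size below are invented for the example), the snippet estimates the conjugate Wirtinger derivative ∂L/∂z̄ = ½(∂L/∂x + i ∂L/∂y), which is the quantity a complex gradient-descent update needs:

```python
import numpy as np

def loss(z):
    # Real-valued loss of a single complex parameter: L(z) = |z|^2 + Re(z)
    return np.abs(z) ** 2 + z.real

def conjugate_wirtinger_grad(f, z, eps=1e-6):
    # dL/d(conj(z)) = 0.5 * (dL/dx + i * dL/dy), estimated by central differences
    dLdx = (f(z + eps) - f(z - eps)) / (2 * eps)
    dLdy = (f(z + 1j * eps) - f(z - 1j * eps)) / (2 * eps)
    return 0.5 * (dLdx + 1j * dLdy)

z = 1.0 + 2.0j
g = conjugate_wirtinger_grad(loss, z)
print(g)  # analytically dL/d(conj(z)) = z + 0.5, i.e. about 1.5+2.0j

# A gradient-descent step on the complex parameter uses this conjugate derivative
z_new = z - 0.1 * g
print(loss(z_new) < loss(z))  # True: the loss decreases
```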


Read also

In this note, we report the back-propagation formula for complex-valued singular value decomposition (SVD). This formula is an important ingredient for a complete automatic differentiation (AD) infrastructure in terms of complex numbers, and it is also the key to understanding and utilizing AD in tensor networks.
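A quick way to see complex SVD differentiation in action (an illustrative numerical check only, not the formula reported in the note, and assuming a recent PyTorch build where complex SVD autograd is available) is to let gradcheck compare the analytic backward pass against finite differences for a real-valued loss of the singular values:

```python
import torch

# Real-valued loss of a complex matrix built from its singular values
def nuclear_norm(a):
    return torch.linalg.svdvals(a).sum()

a = torch.randn(4, 4, dtype=torch.complex128, requires_grad=True)
# gradcheck numerically verifies the framework's complex SVD backward rule
print(torch.autograd.gradcheck(nuclear_norm, (a,)))  # True if consistent
```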
The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which can allow unbiased gradient estimates to be computed with reduced memory in return for variance. We examine limitations of the general approach, and argue that we must leverage problem specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
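As a toy picture of the memory-for-variance trade (a sketch under simplifications of my own, not the authors' RAD framework), the backward pass below stores only a random subset of the information an exact gradient would need and rescales it so the estimate stays unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, keep_prob=0.25):
    # Forward pass of y = sum(relu(x)).  An exact backward pass would need the
    # full mask (x > 0); here we keep a random subset of its nonzero entries
    # and rescale by 1/keep_prob so the gradient estimate remains unbiased.
    y = np.maximum(x, 0.0).sum()
    keep = rng.random(x.size) < keep_prob
    idx = np.flatnonzero(keep & (x > 0))   # only these indices are stored
    return y, (idx, x.size, keep_prob)

def backward(saved, grad_y=1.0):
    idx, n, keep_prob = saved
    g = np.zeros(n)
    g[idx] = grad_y / keep_prob            # E[g] equals the exact gradient
    return g

x = rng.normal(size=10)
y, saved = forward(x)
print((x > 0).astype(float))  # exact gradient of y with respect to x
print(backward(saved))        # noisy but unbiased estimate from the sparse state
```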
Automatic Differentiation Variational Inference (ADVI) is a useful tool for efficiently learning probabilistic models in machine learning. Generally approximate posteriors learned by ADVI are forced to be unimodal in order to facilitate use of the reparameterization trick. In this paper, we show how stratified sampling may be used to enable mixture distributions as the approximate posterior, and derive a new lower bound on the evidence analogous to the importance weighted autoencoder (IWAE). We show that this SIWAE is a tighter bound than both IWAE and the traditional ELBO, both of which are special instances of this bound. We verify empirically that the traditional ELBO objective disfavors the presence of multimodal posterior distributions and may therefore not be able to fully capture structure in the latent space. Our experiments show that using the SIWAE objective allows the encoder to learn more complex distributions which regularly contain multimodality, resulting in higher accuracy and better calibration in the presence of incomplete, limited, or corrupted data.
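To make the bound comparison concrete, here is a minimal sketch of the K = 1 ELBO versus the importance-weighted bound on a toy conjugate-Gaussian model (the model and variational parameters are invented for the example, and the stratified SIWAE variant itself is not reproduced here):

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Toy model: p(z) = N(0, 1), p(x|z) = N(z, 1), approximate posterior
# q(z|x) = N(mu, sigma^2); the exact evidence is p(x) = N(x; 0, 2).
x, mu, sigma = 2.0, 0.8, 1.2

def log_p_xz(z):
    return -0.5 * (z ** 2 + (x - z) ** 2) - np.log(2 * np.pi)

def log_q(z):
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def bound(K, n_runs=200_000):
    # K = 1 is the standard ELBO; larger K gives the tighter IWAE-style bound
    z = rng.normal(mu, sigma, size=(n_runs, K))
    log_w = log_p_xz(z) - log_q(z)
    return np.mean(logsumexp(log_w, axis=1) - np.log(K))

exact = -0.25 * x ** 2 - 0.5 * np.log(4 * np.pi)
print(bound(1), bound(5), bound(50), exact)  # bounds increase toward log p(x)
```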
Alberto Ramos (2020)
We present ADerrors.jl, a package for linear error propagation and analysis of Monte Carlo data. Although the focus is on data analysis in Lattice QCD, where estimates of the observables have to be computed from Monte Carlo samples, the package also deals with variables with uncertainties, either correlated or uncorrelated. Thanks to automatic differentiation techniques, linear error propagation is performed exactly, even in iterative algorithms (e.g., errors in the parameters of non-linear fits). In this contribution we present an overview of the capabilities of the software, including access to uncertainties in fit parameters and dealing with correlated data. The software, written in Julia, is available for download and use at https://gitlab.ift.uam-csic.es/alberto/aderrors.jl
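A stripped-down illustration of the underlying idea, linear error propagation through a derivative obtained by forward-mode automatic differentiation (this is a Python toy, not the ADerrors.jl interface, and the function and uncertainty are made up):

```python
# Minimal dual-number forward-mode AD used for first-order error propagation
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

def propagate(f, x, sigma):
    # sigma_f = |f'(x)| * sigma, with f'(x) taken from the dual number
    out = f(Dual(x, 1.0))
    return out.value, abs(out.deriv) * sigma

value, error = propagate(lambda t: t * t + 3.0 * t, 0.5, 0.02)
print(value, error)  # 1.75 +/- |2*0.5 + 3| * 0.02 = 0.08
```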
Data augmentation methods have been shown to be a fundamental technique for improving generalization in tasks such as image, text, and audio classification. Recently, automated augmentation methods have led to further improvements in image classification and object detection, achieving state-of-the-art performance. Nevertheless, little work has been done on time-series data, an area that could greatly benefit from automated data augmentation given the usually limited size of the datasets. We present two sample-adaptive automatic weighting schemes for data augmentation: the first learns to weight the contribution of the augmented samples to the loss, and the second selects a subset of transformations based on the ranking of the predicted training loss. We validate the proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive. On the financial dataset, we show that the methods, combined with a trading strategy, lead to improvements in annualized returns of over 50%, and on the time-series data we outperform state-of-the-art models on over half of the datasets and achieve similar accuracy on the others.
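As a loose sketch of the first scheme's plumbing only (softmax weights on per-augmentation losses, differentiated with respect to the weighting parameters; the toy loss, frozen predictions, and learning rate are invented and this is not the authors' training procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

n_aug = 4
w_logits = np.zeros(n_aug)                  # learnable weighting parameters
target = 1.0
preds = rng.normal(target, 0.5, n_aug)      # frozen outputs on augmented copies

for step in range(200):
    w = np.exp(w_logits) / np.exp(w_logits).sum()    # softmax weights
    per_aug_loss = (preds - target) ** 2
    weighted_loss = (w * per_aug_loss).sum()
    grad = w * (per_aug_loss - weighted_loss)         # d(weighted_loss)/d(logits)
    w_logits -= 0.5 * grad

print(np.round(w, 3))   # mass moves toward the lower-loss augmented copies
```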
