
Neural SDEs as Infinite-Dimensional GANs

Added by Patrick Kidger
Publication date: 2021
Language: English





Stochastic differential equations (SDEs) are a staple of mathematical modelling of temporal dynamics. However, a fundamental limitation has been that such models have typically been relatively inflexible, which recent work introducing Neural SDEs has sought to address. Here, we show that the current classical approach to fitting SDEs may be approached as a special case of (Wasserstein) GANs, and in doing so the neural and classical regimes may be brought together. The input noise is Brownian motion, the output samples are time-evolving paths produced by a numerical solver, and by parameterising a discriminator as a Neural Controlled Differential Equation (CDE), we obtain Neural SDEs as (in modern machine learning parlance) continuous-time generative time series models. Unlike previous work on this problem, this is a direct extension of the classical approach without reference to either prespecified statistics or density functions. Arbitrary drift and diffusions are admissible, and as the Wasserstein loss has a unique global minimum, in the infinite data limit any SDE may be learnt. Example code has been made available as part of the torchsde repository.
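A minimal sketch of the generator half of this construction, assuming the torchsde package (the class name, layer sizes, and hyperparameters below are illustrative, and the Neural CDE discriminator is omitted); this is not the repository's example code:

import torch
import torchsde

class GeneratorSDE(torch.nn.Module):
    # Learnt drift f and diffusion g; Brownian motion supplies the input noise.
    noise_type = "diagonal"
    sde_type = "ito"

    def __init__(self, hidden_size=8):
        super().__init__()
        self.drift = torch.nn.Sequential(
            torch.nn.Linear(hidden_size, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, hidden_size))
        self.diffusion = torch.nn.Sequential(
            torch.nn.Linear(hidden_size, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, hidden_size))

    def f(self, t, y):          # drift (time-homogeneous, for brevity)
        return self.drift(y)

    def g(self, t, y):          # diagonal diffusion
        return self.diffusion(y)

# The numerical SDE solve plays the role of the GAN generator: noise in, sample paths out.
sde = GeneratorSDE()
y0 = torch.randn(32, 8)                        # initial hidden state, shape (batch, hidden)
ts = torch.linspace(0, 1, 64)                  # evaluation times
paths = torchsde.sdeint(sde, y0, ts, dt=0.01)  # shape (len(ts), batch, hidden)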




Read More

Generative adversarial networks (GANs) have shown promising results when applied to partial differential equations and financial time series generation. We investigate whether GANs can also be used to approximate one-dimensional Ito stochastic differential equations (SDEs). We propose a scheme that approximates the path-wise conditional distribution of SDEs for large time steps. Standard GANs are only able to approximate processes in distribution, yielding a weak approximation to the SDE. A conditional GAN architecture is proposed that enables strong approximation. We inform the discriminator of this GAN with the map between the prior input to the generator and the corresponding output samples, i.e. we introduce a 'supervised' GAN. We compare the input-output maps obtained with the standard GAN and the supervised GAN and show experimentally that the standard GAN may fail to provide a path-wise approximation. The GAN is trained on a dataset obtained with exact simulation. The architecture was tested on geometric Brownian motion (GBM) and the Cox-Ingersoll-Ross (CIR) process. The supervised GAN outperformed the Euler and Milstein schemes in strong error on a discretisation with large time steps. It also outperformed the standard conditional GAN when approximating the conditional distribution. We also demonstrate how standard GANs may give rise to non-parsimonious input-output maps that are sensitive to perturbations, which motivates the need for constraints and regularisation on GAN generators.
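The exact-simulation training data for the geometric Brownian motion case is straightforward to reproduce. The sketch below (parameter values and names are illustrative) generates the (condition, prior noise, target) triples such a conditional GAN would be trained on; informing the discriminator of the pair (noise, target), rather than the target alone, is what the abstract calls a supervised GAN:

import numpy as np

def gbm_exact_step(x_t, mu, sigma, dt, z):
    # Exact one-step GBM transition: X_{t+dt} = X_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z), Z ~ N(0, 1).
    return x_t * np.exp((mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z)

rng = np.random.default_rng(0)
mu, sigma, dt = 0.05, 0.2, 1.0                          # deliberately large time step
x_t = rng.lognormal(mean=0.0, sigma=0.3, size=10_000)   # conditioning values X_t
z = rng.standard_normal(10_000)                         # prior input to the generator
x_next = gbm_exact_step(x_t, mu, sigma, dt, z)          # targets X_{t+dt}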
Florian Stelzer (2021)
The method recently introduced in arXiv:2011.10115 realizes a deep neural network with just a single nonlinear element and delayed feedback. It is applicable to the description of physically implemented neural networks. In this work, we present an infinite-dimensional generalization, which allows for a more rigorous mathematical analysis and a higher flexibility in choosing the weight functions. More precisely, the weights are described by Lebesgue-integrable functions instead of step functions. We also provide a functional backpropagation algorithm, which enables gradient-descent training of the weights. In addition, with a slight modification, our concept realizes recurrent neural networks.
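As a loose illustration of a single nonlinear node with delayed feedback through a time-varying weight function, the toy Euler discretisation below may help fix ideas; the dynamics, weight function, and constants are illustrative and are not those of arXiv:2011.10115:

import numpy as np

def simulate_delay_node(T=10.0, tau=1.0, dt=1e-3, w=lambda t: np.sin(2 * np.pi * t)):
    # Euler scheme for x'(t) = -x(t) + tanh(w(t) * x(t - tau)):
    # one nonlinear element, driven by its own delayed output through a weight function w.
    n = int(T / dt)
    d = int(tau / dt)
    x = np.zeros(n)
    x[0] = 0.1                                  # arbitrary initial state
    for i in range(1, n):
        delayed = x[i - d] if i >= d else 0.0   # history taken to be zero before t = 0
        x[i] = x[i - 1] + dt * (-x[i - 1] + np.tanh(w(i * dt) * delayed))
    return x

trace = simulate_delay_node()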
Neural SDEs combine many of the best qualities of both RNNs and SDEs: memory efficient training, high-capacity function approximation, and strong priors on model space. This makes them a natural choice for modelling many types of temporal dynamics. Training a Neural SDE (either as a VAE or as a GAN) requires backpropagating through an SDE solve. This may be done by solving a backwards-in-time SDE whose solution is the desired parameter gradients. However, this has previously suffered from severe speed and accuracy issues, due to high computational cost and numerical truncation errors. Here, we overcome these issues through several technical innovations. First, we introduce the reversible Heun method. This is a new SDE solver that is algebraically reversible: eliminating numerical gradient errors, and the first such solver of which we are aware. Moreover it requires half as many function evaluations as comparable solvers, giving up to a $1.98\times$ speedup. Second, we introduce the Brownian Interval: a new, fast, memory efficient, and exact way of sampling and reconstructing Brownian motion. With this we obtain up to a $10.6\times$ speed improvement over previous techniques, which in contrast are both approximate and relatively slow. Third, when specifically training Neural SDEs as GANs (Kidger et al. 2021), we demonstrate how SDE-GANs may be trained through careful weight clipping and choice of activation function. This reduces computational cost (giving up to a $1.87\times$ speedup) and removes the numerical truncation errors associated with gradient penalty. Altogether, we outperform the state-of-the-art by substantial margins, with respect to training speed, and with respect to classification, prediction, and MMD test metrics. We have contributed implementations of all of our techniques to the torchsde library to help facilitate their adoption.
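A minimal usage sketch, assuming a recent torchsde release that exposes the Brownian Interval and the reversible Heun method under the names used below; the toy SDE itself is illustrative:

import torch
import torchsde

class ToySDE(torch.nn.Module):
    noise_type = "diagonal"
    sde_type = "stratonovich"        # reversible Heun is a Stratonovich solver

    def __init__(self, size):
        super().__init__()
        self.theta = torch.nn.Parameter(torch.randn(size))

    def f(self, t, y):               # drift
        return -self.theta * y

    def g(self, t, y):               # diagonal diffusion
        return 0.1 * torch.ones_like(y)

batch, size = 16, 4
sde = ToySDE(size)
y0 = torch.randn(batch, size)
ts = torch.linspace(0, 1, 32)

# Brownian Interval: fast, memory-efficient, exact sampling and reconstruction of Brownian motion.
bm = torchsde.BrownianInterval(t0=0.0, t1=1.0, size=(batch, size))

# Backpropagate through the solve via the adjoint of the (algebraically reversible) reversible Heun method.
ys = torchsde.sdeint_adjoint(sde, y0, ts, bm=bm, dt=0.01,
                             method="reversible_heun",
                             adjoint_method="adjoint_reversible_heun")
ys.sum().backward()                  # gradients reach sde.theta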
Overparametrization has been remarkably successful in deep learning. This study investigates an overlooked but important aspect of overparametrized neural networks: the null components in the parameters of neural networks, or the ghosts. Since deep learning is not explicitly regularized, typical deep learning solutions contain null components. In this paper, we present a structure theorem of the null space for a general class of neural networks. Specifically, we show that any null element can be uniquely written as a linear combination of ridgelet transforms. In general, it is quite difficult to fully characterize the null space of an arbitrarily given operator. Therefore, the structure theorem is a great advantage for understanding the complicated landscape of neural network parameters. As applications, we discuss the roles of ghosts in the generalization performance of deep learning.
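A toy numerical illustration of a null component (a ghost): appending two hidden units whose contributions cancel exactly enlarges the parameter vector without changing the network function, so the added direction lies in the null space. This construction is only illustrative and is not the ridgelet characterisation proved in the paper:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def net(x, w, b, a):
    # One-hidden-layer network: sum_k a_k * relu(w_k * x + b_k).
    return relu(np.outer(x, w) + b) @ a

x = np.linspace(-2, 2, 101)
w, b, a = np.array([1.0, -0.5]), np.array([0.0, 0.3]), np.array([0.7, -1.2])

# Append a ghost: two copies of the same unit with opposite output weights.
w2 = np.concatenate([w, [2.0, 2.0]])
b2 = np.concatenate([b, [0.1, 0.1]])
a2 = np.concatenate([a, [0.9, -0.9]])

assert np.allclose(net(x, w, b, a), net(x, w2, b2, a2))   # the extra parameters contribute nothing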
The study of universal approximation of arbitrary functions $f: \mathcal{X} \to \mathcal{Y}$ by neural networks has a rich and thorough history dating back to Kolmogorov (1957). In the case of learning finite-dimensional maps, many authors have shown various forms of the universality of both fixed-depth and fixed-width neural networks. However, in many cases, these classical results fail to extend to the recent use of approximations of neural networks with infinitely many units for functional data analysis, dynamical systems identification, and other applications where either $\mathcal{X}$ or $\mathcal{Y}$ become infinite dimensional. Two questions naturally arise: which infinite-dimensional analogues of neural networks are sufficient to approximate any map $f: \mathcal{X} \to \mathcal{Y}$, and when do the finite approximations to these analogues used in practice approximate $f$ uniformly over its infinite-dimensional domain $\mathcal{X}$? In this paper, we answer the open question of universal approximation of nonlinear operators when $\mathcal{X}$ and $\mathcal{Y}$ are both infinite dimensional. We show that for a large class of different infinite analogues of neural networks, any continuous map can be approximated arbitrarily closely with some mild topological conditions on $\mathcal{X}$. Additionally, we provide the first lower bound on the minimal number of input and output units required by a finite approximation to an infinite neural network to guarantee that it can uniformly approximate any nonlinear operator using samples from its inputs and outputs.
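As an informal illustration of what a finite approximation with finitely many input and output units looks like in practice, the sketch below discretises input and output functions on a fixed grid and fits an ordinary network between the samples, here on the antiderivative operator; it is a generic stand-in rather than any of the specific architectures analysed in the paper:

import math
import torch

m = 32                                          # number of input/output units (grid points)
xs = torch.linspace(0, 1, m)

net = torch.nn.Sequential(
    torch.nn.Linear(m, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, m))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    # Random sinusoidal input functions u, sampled on the grid xs.
    freq = torch.rand(256, 1) * 5.0
    phase = torch.rand(256, 1) * 2.0 * math.pi
    u = torch.sin(2.0 * math.pi * freq * xs + phase)
    # Samples of the target operator F(u)(y) = int_0^y u(x) dx, via a cumulative sum.
    v = torch.cumsum(u, dim=1) / m
    loss = ((net(u) - v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()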
