An Ode to an ODE

Published by: Xingyou Song

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov

We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter-flow is constrained to lie on the compact manifold, provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem which is intrinsically related to training deep neural network architectures such as Neural ODEs. Consequently, it leads to better downstream models, as we show on the example of training reinforcement learning policies with evolution strategies, and in the supervised learning setting, by comparing with previous SOTA baselines. We provide strong convergence results for our proposed mechanism that are independent of the depth of the network, supporting our empirical studies. Our results show an intriguing connection between the theory of deep neural networks and the field of matrix flows on compact manifolds.
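The nested structure described in the abstract can be illustrated with a small discretized sketch. This is not the authors' implementation: the specific form of the flows (a main flow dx/dt = f(W x) and a parameter flow dW/dt = W A with A skew-symmetric), the Cayley-transform update, the `tanh` nonlinearity, and all function names are assumptions chosen for illustration. The sketch only shows how a parameter matrix can be kept on O(d) while it drives a main flow.

```python
import numpy as np

def cayley_step(W, A, dt):
    """One step of dW/dt = W A using the Cayley transform (an assumed discretization).

    For skew-symmetric A, (I - dt/2 A)^{-1} (I + dt/2 A) is orthogonal,
    so W stays orthogonal up to floating-point error.
    """
    I = np.eye(W.shape[0])
    Q = np.linalg.solve(I - 0.5 * dt * A, I + 0.5 * dt * A)
    return W @ Q

def nested_flow(x0, W0, skew_field, steps=50, dt=0.02, f=np.tanh):
    """Euler steps for the main flow, Cayley steps for the parameter flow."""
    x, W = x0.copy(), W0.copy()
    for k in range(steps):
        A = skew_field(k * dt, W)   # hypothetical generator, must be skew-symmetric
        x = x + dt * f(W @ x)       # main flow driven by the current W
        W = cayley_step(W, A, dt)   # parameter flow constrained to O(d)
    return x, W

rng = np.random.default_rng(0)
d = 4
W0, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal start
M = rng.standard_normal((d, d))

def skew_field(t, W):
    # Constant skew-symmetric generator; a trained model would learn this map.
    return M - M.T

x0 = rng.standard_normal(d)
xT, WT = nested_flow(x0, W0, skew_field)
orth_error = np.linalg.norm(WT.T @ WT - np.eye(d))
```

Here `orth_error` measures how far the final parameter matrix has drifted from orthogonality; with the Cayley update it should remain at the level of rounding error regardless of the number of steps.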

Tangjun Wang, Zehao Dou, Chenglong Bao, 2021

Interpreting deep neural networks from the ordinary differential equations (ODEs) perspective has inspired many efficient and robust network architectures. However, existing ODE-based approaches ignore the relationship among data points, which is a critical component in many problems including few-shot learning and semi-supervised learning. In this paper, inspired by the diffusive ODEs, we propose a novel diffusion residual network (Diff-ResNet) to strengthen the interactions among data points. Under the structured data assumption, it is proved that the diffusion mechanism can decrease the distance-diameter ratio that improves the separability of inter-class points and reduces the distance among local intra-class points. This property can be easily adopted by the residual networks for constructing the separable hyperplanes. The synthetic binary classification experiments demonstrate the effectiveness of the proposed diffusion mechanism. Moreover, extensive experiments of few-shot image classification and semi-supervised graph node classification in various datasets validate the advantages of the proposed Diff-ResNet over existing few-shot learning methods.

Machine learning

Hamiltonian Graph Networks with ODE Integrators

Alvaro Sanchez-Gonzalez, Victor Bapst, Kyle Cranmer, 2019

We introduce an approach for imposing physically informed inductive biases in learned simulation models. We combine graph networks with a differentiable ordinary differential equation integrator as a mechanism for predicting future states, and a Hamiltonian as an internal representation. We find that our approach outperforms baselines without these biases in terms of predictive accuracy, energy accuracy, and zero-shot generalization to time-step sizes and integrator orders not experienced during training. This advances the state-of-the-art of learned simulation, and in principle is applicable beyond physical domains.
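As a rough illustration of pairing a Hamiltonian with an ODE integrator, the sketch below derives dynamics from a scalar Hamiltonian and advances them with a fourth-order Runge-Kutta step. Everything here is an assumption made for illustration: the toy harmonic-oscillator Hamiltonian stands in for a learned one, finite differences stand in for automatic differentiation, and no graph network is involved.

```python
def hamiltonian(q, p):
    # Toy stand-in (unit-mass harmonic oscillator); in a learned simulator
    # this scalar would be produced by a trained model instead.
    return 0.5 * p**2 + 0.5 * q**2

def dynamics(q, p, eps=1e-6):
    # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq,
    # with derivatives approximated by central finite differences.
    dH_dq = (hamiltonian(q + eps, p) - hamiltonian(q - eps, p)) / (2 * eps)
    dH_dp = (hamiltonian(q, p + eps) - hamiltonian(q, p - eps)) / (2 * eps)
    return dH_dp, -dH_dq

def rk4_step(q, p, dt):
    # Classical fourth-order Runge-Kutta step applied to (q, p).
    k1q, k1p = dynamics(q, p)
    k2q, k2p = dynamics(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = dynamics(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = dynamics(q + dt * k3q, p + dt * k3p)
    q_next = q + dt / 6.0 * (k1q + 2 * k2q + 2 * k3q + k4q)
    p_next = p + dt / 6.0 * (k1p + 2 * k2p + 2 * k3p + k4p)
    return q_next, p_next

q, p = 1.0, 0.0
energy_start = hamiltonian(q, p)
for _ in range(100):
    q, p = rk4_step(q, p, dt=0.05)
energy_drift = abs(hamiltonian(q, p) - energy_start)
```

Because the step size `dt` is an argument of the integrator rather than part of the model, the same Hamiltonian can be rolled out with a different step size or integrator order, which is the kind of generalization the abstract refers to.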

Machine learning, Computational physics

On the convergence of an exotic formal series solution of an ODE

Renat Gontsov, Irina Goryuchkina, 2018

A sufficient condition of the convergence of an exotic formal series (a kind of power series with complex exponents) solution to an ODE of a general form is proposed.

Classical analysis and ODEs

Compressing Deep ODE-Nets using Basis Function Expansions

Alejandro Queiruga, N. Benjamin Erichson, Liam Hodgkinson, 2021

The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-depth functions using linear combinations of basis functions. This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance. In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments. Furthermore, our framework enables meaningful continuous-in-time batch normalization layers using function projections. The performance of basis function compression is demonstrated by applying continuous-depth models to (a) image classification tasks using convolutional units and (b) sentence-tagging tasks using transformer encoder units.

Machine learning

On the convergence of generalized power series satisfying an algebraic ODE

Renat Gontsov, Irina Goryuchkina, 2014

We propose a sufficient condition of the convergence of a generalized power series formally satisfying an algebraic (polynomial) ordinary differential equation. The proof is based on the majorant method.

Classical analysis and ODEs