While it is widely known that neural networks are universal approximators of continuous functions, a less known and perhaps more powerful result is that a neural network with a single hidden layer can accurately approximate any nonlinear continuous operator. This universal approximation theorem is suggestive of the potential application of neural networks in learning nonlinear operators from data. However, the theorem guarantees only a small approximation error for a sufficiently large network, and does not consider the important optimization and generalization errors. To realize this theorem in practice, we propose deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset. A DeepONet consists of two sub-networks, one for encoding the input function at a fixed number of sensors $x_i, i=1,\dots,m$ (branch net), and another for encoding the locations for the output functions (trunk net). We perform systematic simulations for identifying two types of operators, i.e., dynamical systems and partial differential equations, and demonstrate that DeepONet significantly reduces the generalization error compared to fully connected networks. We also derive theoretically the dependence of the approximation error on the number of sensors (where the input function is defined) as well as on the input function type, and we verify the theorem with computational results. More importantly, we observe high-order error convergence in our computational tests, namely polynomial rates (from half order to fourth order) and even exponential convergence with respect to the training dataset size.
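To make the branch/trunk decomposition above concrete, here is a minimal sketch of a DeepONet in PyTorch; the layer widths, the number of sensors $m$, the ReLU activations, and the single dot-product merge are illustrative assumptions, not the configurations studied in the paper.

    import torch
    import torch.nn as nn

    class DeepONet(nn.Module):
        """Sketch of G(u)(y) ~ <branch(u(x_1),...,u(x_m)), trunk(y)>."""
        def __init__(self, m=100, p=40, width=40):
            super().__init__()
            # Branch net: encodes the input function sampled at m fixed sensors.
            self.branch = nn.Sequential(nn.Linear(m, width), nn.ReLU(),
                                        nn.Linear(width, p))
            # Trunk net: encodes the location y where the output is evaluated.
            self.trunk = nn.Sequential(nn.Linear(1, width), nn.ReLU(),
                                       nn.Linear(width, p))

        def forward(self, u_sensors, y):
            # u_sensors: (batch, m) sensor values; y: (batch, 1) query locations.
            b = self.branch(u_sensors)
            t = self.trunk(y)
            return (b * t).sum(dim=-1, keepdim=True)

The dot product of the two $p$-dimensional embeddings is what merges the encoded input function with the encoded evaluation location.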
For a constant coefficient partial differential operator $P(D)$ with a single characteristic direction, such as the time-dependent free Schrödinger operator as well as non-degenerate parabolic differential operators like the heat operator, we characterize when open subsets $X_1\subseteq X_2$ of $\mathbb{R}^d$ form a $P$-Runge pair. The presented condition does not require any kind of regularity of the boundaries of $X_1$ or $X_2$. As part of our result we prove that for a large class of non-elliptic operators $P(D)$ there are smooth solutions $u$ to the equation $P(D)u=0$ on $\mathbb{R}^d$ with support contained in an arbitrarily narrow slab bounded by two parallel characteristic hyperplanes for $P(D)$.
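As an illustration of the terminology (not part of the paper's statement): for the heat operator $P(D)=\partial_t-\Delta_x$ on $\mathbb{R}^d=\mathbb{R}^{d-1}_x\times\mathbb{R}_t$, the only characteristic direction (up to sign) is the time direction, so the characteristic hyperplanes are the level sets $\{t=c\}$ and a slab bounded by two parallel characteristic hyperplanes has the form $$ S_{a,b}=\{(x,t)\in\mathbb{R}^{d-1}\times\mathbb{R}\,:\,a<t<b\},\qquad a<b. $$ In this case the last assertion yields nonzero smooth solutions of the heat equation supported in an arbitrarily thin time slab.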
State-of-the-art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the expense of growing model sizes and computational complexity. Deploying these models on low-power and mobile devices poses a challenge due to their limited compute capabilities and strict energy budgets. One solution that has generated significant research interest is deploying highly quantized models that operate on low-precision inputs and weights of fewer than eight bits, trading off accuracy for performance. These models have a significantly reduced memory footprint (up to a 32x reduction) and can replace multiply-accumulates with bitwise operations during compute-intensive convolution and fully connected layers. Most deep learning frameworks rely on highly engineered linear algebra libraries such as ATLAS or Intel's MKL to implement efficient deep learning operators. To date, none of the popular deep learning frameworks directly support low-precision operators, partly due to a lack of optimized low-precision libraries. In this paper we introduce a workflow to quickly generate high-performance low-precision deep learning operators for arbitrary precision that target multiple CPU architectures and include optimizations such as memory tiling and vectorization. We present an extensive case study on the low-power ARM Cortex-A53 CPU, and show how we can generate 1-bit and 2-bit convolutions with speedups of up to 16x over an optimized 16-bit integer baseline and 2.3x better than handwritten implementations.
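As a rough illustration of how multiply-accumulates can become bitwise operations (a sketch under the usual $\{+1,-1\}$ binarization, not the generated kernels described in the paper; all names are ours):

    import numpy as np

    def pack_signs(signs):
        """Pack a {+1,-1} vector into bytes (bit 1 encodes +1, bit 0 encodes -1)."""
        return np.packbits((signs > 0).astype(np.uint8))

    def binary_dot(a_packed, b_packed, n):
        """1-bit dot product of two packed {+1,-1} vectors of length n.
        Matching bits contribute +1 and mismatching bits -1, so the sum equals
        2 * popcount(XNOR(a, b)) - n, counted over the n valid bits only."""
        xnor = np.bitwise_not(np.bitwise_xor(a_packed, b_packed))
        matches = int(np.unpackbits(xnor)[:n].sum())
        return 2 * matches - n

    # Sanity check against the full-precision dot product of the sign vectors.
    rng = np.random.default_rng(0)
    a = rng.choice([-1, 1], size=100).astype(np.int32)
    b = rng.choice([-1, 1], size=100).astype(np.int32)
    assert binary_dot(pack_signs(a), pack_signs(b), 100) == int(a @ b)

A multi-bit variant can decompose each operand into bit planes and accumulate several such popcount terms with power-of-two weights.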
Modelling functions of sets, or equivalently, permutation-invariant functions, is a long-standing challenge in machine learning. Deep Sets is a popular method which is known to be a universal approximator for continuous set functions. We provide a theoretical analysis of Deep Sets which shows that this universal approximation property is only guaranteed if the model's latent space is sufficiently high-dimensional. If the latent space is even one dimension lower than necessary, there exist piecewise-affine functions for which Deep Sets performs no better than a naive constant baseline, as judged by worst-case error. Deep Sets may be viewed as the most efficient incarnation of the Janossy pooling paradigm. We identify this paradigm as encompassing most currently popular set-learning methods. Based on this connection, we discuss the implications of our results for set learning more broadly, and identify some open questions on the universality of Janossy pooling in general.
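For reference, a minimal Deep Sets sketch in PyTorch: a shared encoder $\phi$ embeds each element into a latent space, the embeddings are sum-pooled, and a decoder $\rho$ maps the pooled vector to the output. The widths below are illustrative; the parameter latent_dim plays the role of the latent-space dimension discussed above.

    import torch
    import torch.nn as nn

    class DeepSets(nn.Module):
        """Permutation-invariant set function f(X) = rho(sum_i phi(x_i))."""
        def __init__(self, in_dim=1, latent_dim=16, hidden=64, out_dim=1):
            super().__init__()
            self.phi = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, latent_dim))
            self.rho = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, out_dim))

        def forward(self, x):
            # x: (batch, set_size, in_dim). Sum pooling over the set dimension
            # makes the output invariant to permutations of the elements.
            return self.rho(self.phi(x).sum(dim=1))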
The quaternionic spectral theorem has already been considered in the literature, see e.g. [22], [31], [32]; however, except for the finite dimensional case, in which the notion of spectrum is associated with an eigenvalue problem, see [21], it is not specified which notion of spectrum underlies the theorem. In this paper we prove the quaternionic spectral theorem for unitary operators using the $S$-spectrum. In the case of quaternionic matrices, the $S$-spectrum coincides with the right-spectrum and so our result recovers the well-known theorem for matrices. The notion of $S$-spectrum is relatively new, see [17], and has been used for quaternionic linear operators, as well as for $n$-tuples of not necessarily commuting operators, to define and study a noncommutative functional calculus.
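For the reader's orientation we recall the standard definition from the $S$-functional calculus literature (not a statement specific to this paper): for a bounded right linear quaternionic operator $T$, $$ \sigma_S(T)=\{\,s\in\mathbb{H}\;:\;T^2-2\,\mathrm{Re}(s)\,T+|s|^2\,\mathcal{I}\ \text{is not invertible}\,\}, $$ and for a quaternionic matrix this set coincides with the set of right eigenvalues, as recalled above.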
In this paper we develop the calculus of pseudo-differential operators corresponding to the quantizations of the form $$ Au(x)=\int_{\mathbb{R}^n}\int_{\mathbb{R}^n}e^{i(x-y)\cdot\xi}\sigma(x+\tau(y-x),\xi)u(y)\,dy\,d\xi, $$ where $\tau:\mathbb{R}^n\to\mathbb{R}^n$ is a general function. In particular, for the linear choices $\tau(x)=0$, $\tau(x)=x$, and $\tau(x)=\frac{x}{2}$ this covers the well-known Kohn-Nirenberg, anti-Kohn-Nirenberg, and Weyl quantizations, respectively. Quantizations of such type appear naturally in the analysis on nilpotent Lie groups for polynomial functions $\tau$, and here we investigate the corresponding calculus in the model case of $\mathbb{R}^n$. We also give examples of nonlinear $\tau$ appearing on the polarised and non-polarised Heisenberg groups, inspired by the recent joint work with Marius Mantoiu.
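For instance, a direct substitution of the three linear choices of $\tau$ into the amplitude $\sigma(x+\tau(y-x),\xi)$ gives $$ \tau(x)=0:\ \sigma(x,\xi),\qquad \tau(x)=x:\ \sigma(y,\xi),\qquad \tau(x)=\tfrac{x}{2}:\ \sigma\!\left(\tfrac{x+y}{2},\xi\right), $$ i.e., the Kohn-Nirenberg, anti-Kohn-Nirenberg, and Weyl quantizations, respectively.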