Most $L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F}\subseteq C(\mathbb{R}^d,\mathbb{R}^D)$ is dense in $L^p_{\mu}(\mathbb{R}^d,\mathbb{R}^D)$ for any suitable finite Borel measure $\mu$ on $\mathbb{R}^d$. Unfortunately, this means that the model's approximation quality can rapidly degenerate outside some compact subset of $\mathbb{R}^d$, as any such measure is largely concentrated on some bounded subset of $\mathbb{R}^d$. This paper proposes a generic solution to this approximation-theoretic problem by introducing a canonical transformation which upgrades $\mathscr{F}$'s approximation property in the following sense. The transformed model class, denoted by $\mathscr{F}\text{-tope}$, is shown to be dense in $L^p_{\mu,\text{strict}}(\mathbb{R}^d,\mathbb{R}^D)$, a topological space whose elements are locally $p$-integrable functions and whose topology is much finer than the usual norm topology on $L^p_{\mu}(\mathbb{R}^d,\mathbb{R}^D)$; here $\mu$ is any suitable $\sigma$-finite Borel measure on $\mathbb{R}^d$. Next, we show that if $\mathscr{F}$ is any family of analytic functions then there is always a strict gap between $\mathscr{F}\text{-tope}$'s expressibility and that of $\mathscr{F}$, since we find that $\mathscr{F}$ can never be dense in $L^p_{\mu,\text{strict}}(\mathbb{R}^d,\mathbb{R}^D)$. In the general case, where $\mathscr{F}$ may contain non-analytic functions, we provide an abstract form of these results guaranteeing that there always exists some function space in which $\mathscr{F}\text{-tope}$ is dense but $\mathscr{F}$ is not, while the converse is never possible. Applications to feedforward networks, convolutional neural networks, and polynomial bases are explored.
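To illustrate the failure mode that motivates the transformation (our example, not one taken from the paper): let $d=D=1$, $p=1$, let $\mu$ be the standard Gaussian measure, and set $g_n = n\,\mathbf{1}_{[n,n+1]}$. Then
$$ \|g_n\|_{L^1_{\mu}} = n\,\mu([n,n+1]) \le n\,e^{-n^2/2} \longrightarrow 0, $$
so $g_n \to 0$ in $L^p_{\mu}(\mathbb{R},\mathbb{R})$ even though $\sup_{x\in\mathbb{R}} |g_n(x)| = n \to \infty$; convergence in the $L^p_{\mu}$-norm is blind to behaviour far from where $\mu$ concentrates, which is exactly the kind of degeneration a finer topology such as $L^p_{\mu,\text{strict}}$ is meant to detect.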
Modifications to a neural network's input and output layers are often required to accommodate the specificities of most practical learning tasks. However, the impact of such changes on an architecture's approximation capabilities is largely not understood. We present general conditions describing feature and readout maps that preserve an architecture's ability to approximate any continuous function uniformly on compact sets. As an application, we show that if an architecture is capable of universal approximation, then modifying its final layer to produce binary values creates a new architecture capable of deterministically approximating any classifier. In particular, we obtain guarantees for deep CNNs and deep feed-forward networks. Our results also have consequences within the scope of geometric deep learning. Specifically, when the input and output spaces are Cartan-Hadamard manifolds, we obtain geometrically meaningful feature and readout maps satisfying our criteria. Consequently, commonly used non-Euclidean regression models between spaces of symmetric positive definite matrices are extended to universal DNNs. The same result allows us to show that the hyperbolic feed-forward networks used for hierarchical learning are universal. Our result is also used to show that the common practice of randomizing all but the last two layers of a DNN produces a universal family of functions with probability one. We also provide conditions on a DNN's first (resp. last) few layers' connections and activation function which guarantee that these layers can have a width equal to the input (resp. output) space's dimension without negatively affecting the architecture's approximation capabilities.
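As a concrete illustration of the feature/readout pattern for the SPD case (a minimal sketch under our own choices, not the paper's construction; feature_map, readout_map, feedforward, and spd_regressor are hypothetical names): take the matrix logarithm as a feature map from SPD matrices into the Euclidean space of symmetric matrices, any universal Euclidean architecture in between, and the matrix exponential as a readout map back onto the SPD cone.

import numpy as np
from scipy.linalg import expm, logm

def feature_map(spd):
    # matrix logarithm: SPD(n) -> Sym(n), flattened into a Euclidean feature vector
    return np.real(logm(spd)).reshape(-1)

def readout_map(vec, n):
    # symmetrize, then exponentiate: Euclidean vector -> SPD(n)
    sym = vec.reshape(n, n)
    sym = 0.5 * (sym + sym.T)
    return expm(sym)

def feedforward(x, weights, biases):
    # a plain ReLU network; any Euclidean architecture with the universal
    # approximation property could be substituted here
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(W @ x + b, 0.0)
    return weights[-1] @ x + biases[-1]

def spd_regressor(spd_input, weights, biases, n_out):
    # readout o network o feature: the composition pattern discussed above
    return readout_map(feedforward(feature_map(spd_input), weights, biases), n_out)

Because the matrix exponential and logarithm are mutually inverse diffeomorphisms between the symmetric matrices and the SPD cone, they are natural candidates for maps satisfying criteria of this kind.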
A Banach space $X$ has the SHAI (surjective homomorphisms are injective) property provided that for every Banach space $Y$, every continuous surjective algebra homomorphism from the bounded linear operators on $X$ onto the bounded linear operators on $Y$ is injective. The main result gives a sufficient condition for $X$ to have the SHAI property. The condition is satisfied by $L^p(0,1)$ for $1 < p < \infty$, spaces with symmetric bases that have finite cotype, and the Schatten $p$-spaces for $1 < p < \infty$.
We propose novel first-order stochastic approximation algorithms for canonical correlation analysis (CCA). The algorithms presented are instances of inexact matrix stochastic gradient (MSG) and inexact matrix exponentiated gradient (MEG), and achieve $\epsilon$-suboptimality in the population objective in $\operatorname{poly}(\frac{1}{\epsilon})$ iterations. We also consider practical variants of the proposed algorithms and compare them with other methods for CCA both theoretically and empirically.
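For intuition only, here is a simplified streaming plug-in estimator of the top-$k$ canonical directions (our baseline sketch, not the inexact MSG/MEG iterations of the abstract; streaming_cca and its arguments are hypothetical names): running second-moment estimates are accumulated from the data stream, after which the whitened cross-covariance is decomposed.

import numpy as np

def streaming_cca(stream, dx, dy, k=1, reg=1e-3):
    # running (uncentered) second-moment estimates, updated one sample at a time
    Cxx = np.zeros((dx, dx)); Cyy = np.zeros((dy, dy)); Cxy = np.zeros((dx, dy))
    n = 0
    for x, y in stream:
        n += 1
        Cxx += (np.outer(x, x) - Cxx) / n
        Cyy += (np.outer(y, y) - Cyy) / n
        Cxy += (np.outer(x, y) - Cxy) / n
    # whiten with (regularized) Cholesky factors and take the top-k singular
    # directions of the whitened cross-covariance
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx + reg * np.eye(dx)))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy + reg * np.eye(dy)))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    # map the singular directions back to canonical directions for x and y
    return Wx.T @ U[:, :k], Wy.T @ Vt[:k].T, s[:k]

The MSG/MEG algorithms of the abstract instead update a matrix iterate from stochastic gradients of the population objective and come with the stated $\operatorname{poly}(\frac{1}{\epsilon})$ guarantee; the sketch above is only meant to fix ideas.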
We provide a relaxation result for supremal functionals, not necessarily lower semicontinuous, of the type $W^{1,\infty}(\Omega;\mathbb{R}^d) \ni u \mapsto \operatorname*{ess\,sup}_{x \in \Omega} f(\nabla u(x))$ in the vectorial case, where $\Omega \subset \mathbb{R}^N$ is a bounded open Lipschitz set and $f$ is level convex. The connection with indicator functionals is also highlighted, thus extending previous lower semicontinuity results in that framework. Finally, we discuss the $L^p$-approximation of supremal functionals with non-negative, coercive densities $f=f(x,\xi)$ which are only $L^N \otimes B_{d \times N}$-measurable.
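For context (a standard definition recalled here, with an example of our own): $f$ is level convex when every sublevel set $\{f \le t\}$ is convex, equivalently
$$ f(\lambda \xi + (1-\lambda)\eta) \le \max\{f(\xi), f(\eta)\} \qquad \text{for all } \xi, \eta \text{ and } \lambda \in [0,1]. $$
Every convex function is level convex, but not conversely: $f(\xi) = \sqrt{|\xi|}$ is level convex yet not convex.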
Log-concave distributions include several important distributions, such as the normal and exponential distributions. In this note, we show inequalities between two $L^p$-norms for log-concave distributions on Euclidean space. These inequalities generalize the upper and lower bounds on the differential entropy and can also be interpreted as an extension of the inequality between two $L^p$-norms on a measurable set of finite measure.
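As a quick illustration of how two such norms can be compared (a computation of ours for one log-concave density, not the note's inequality): for the exponential density $f(x) = \lambda e^{-\lambda x}$ on $[0,\infty)$,
$$ \|f\|_p = \Big( \int_0^{\infty} \lambda^p e^{-p\lambda x}\,dx \Big)^{1/p} = \Big( \frac{\lambda^{p-1}}{p} \Big)^{1/p} = \lambda^{1-1/p}\, p^{-1/p}, $$
so for instance $\|f\|_2 = \sqrt{\lambda/2}$ and $\|f\|_p \to \|f\|_{\infty} = \lambda$ as $p \to \infty$. Moreover, $\frac{p}{1-p}\ln\|f\|_p$ is the Rényi entropy of order $p$, which tends to the differential entropy $h(f) = 1 - \ln\lambda$ as $p \to 1$; this is one way comparisons between $L^p$-norms connect to differential-entropy bounds.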