ترغب بنشر مسار تعليمي؟ اضغط هنا

69 - Anastasis Kratsios 2021
We introduce a general framework for approximating regular conditional distributions (RCDs). Our approximations of these RCDs are implemented by a new class of geometric deep learning models with inputs in $mathbb{R}^d$ and outputs in the Wasserstein -$1$ space $mathcal{P}_1(mathbb{R}^D)$. We find that the models built using our framework can approximate any continuous functions from $mathbb{R}^d$ to $mathcal{P}_1(mathbb{R}^D)$ uniformly on compacts, and quantitative rates are obtained. We identify two methods for avoiding the curse of dimensionality; i.e.: the number of parameters determining the approximating neural network depends only polynomially on the involved dimension and the approximation error. The first solution describes functions in $C(mathbb{R}^d,mathcal{P}_1(mathbb{R}^D))$ which can be efficiently approximated on any compact subset of $mathbb{R}^d$. Conversely, the second approach describes sets in $mathbb{R}^d$, on which any function in $C(mathbb{R}^d,mathcal{P}_1(mathbb{R}^D))$ can be efficiently approximated. Our framework is used to obtain an affirmative answer to the open conjecture of Bishop (1994); namely: mixture density networks are universal regular conditional distributions. The predictive performance of the proposed models is evaluated against comparable learning models on various probabilistic predictions tasks in the context of ELMs, model uncertainty, and heteroscedastic regression. All the results are obtained for more general input and output spaces and thus apply to geometric deep learning contexts.
This paper addresses the growing need to process non-Euclidean data, by introducing a geometric deep learning (GDL) framework for building universal feedforward-type models compatible with differentiable manifold geometries. We show that our GDL mode ls can approximate any continuous target function uniformly on compacts of a controlled maximum diameter. We obtain curvature dependant lower-bounds on this maximum diameter and upper-bounds on the depth of our approximating GDL models. Conversely, we find that there is always a continuous function between any two non-degenerate compact manifolds that any locally-defined GDL model cannot uniformly approximate. Our last main result identifies data-dependent conditions guaranteeing that the GDL model implementing our approximation breaks the curse of dimensionality. We find that any real-world (i.e. finite) dataset always satisfies our condition and, conversely, any dataset satisfies our requirement if the target function is smooth. As applications, we confirm the universal approximation capabilities of the following GDL models: Ganea et al. (2018)s hyperbolic feedforward networks, the architecture implementing Krishnan et al. (2015)s deep Kalman-Filter, and deep softmax classifiers. We build universal extensions/variants of: the SPD-matrix regressor of Meyer et al. (2011), and Fletcher et al. (2009)s Procrustean regressor. In the Euclidean setting, our results imply a quantitative version of Kidger and Lyons (2020)s approximation theorem and a data-dependent version of Yarotsky and Zhevnerchuk (2020)s uncursed approximation rates.
The need for fast and robust optimization algorithms are of critical importance in all areas of machine learning. This paper treats the task of designing optimization algorithms as an optimal control problem. Using regret as a metric for an algorithm s performance, we study the existence, uniqueness and consistency of regret-optimal algorithms. By providing first-order optimality conditions for the control problem, we show that regret-optimal algorithms must satisfy a specific structure in their dynamics which we show is equivalent to performing dual-preconditioned gradient descent on the value function generated by its regret. Using these optimal dynamics, we provide bounds on their rates of convergence to solutions of convex optimization problems. Though closed-form optimal dynamics cannot be obtained in general, we present fast numerical methods for approximating them, generating optimization algorithms which directly optimize their long-term regret. Lastly, these are benchmarked against commonly used optimization algorithms to demonstrate their effectiveness.
Most stochastic gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters, which requires their activation function to exhibit a degree of continuity. However, this continuity constraint on the activatio n function prevents these neural models from uniformly approximating discontinuous functions. This paper focuses on the case where the discontinuities arise from distinct sub-patterns, each defined on different parts of the input space. We propose a new discontinuous deep neural network model trainable via a decoupled two-step procedure that avoids passing gradient updates through the networks non-differentiable unit. We provide universal approximation guarantees for our architecture in the space of bounded continuous functions and in the space of piecewise continuous functions, which we introduced herein. We present a novel semi-supervised two-step training procedure for our discontinuous deep learning model, and we provide theoretical support for its effectiveness. The performance of our architecture is evaluated experimentally on two real-world datasets and one synthetic dataset.
Most $L^p$-type universal approximation theorems guarantee that a given machine learning model class $mathscr{F}subseteq C(mathbb{R}^d,mathbb{R}^D)$ is dense in $L^p_{mu}(mathbb{R}^d,mathbb{R}^D)$ for any suitable finite Borel measure $mu$ on $mathbb {R}^d$. Unfortunately, this means that the models approximation quality can rapidly degenerate outside some compact subset of $mathbb{R}^d$, as any such measure is largely concentrated on some bounded subset of $mathbb{R}^d$. This paper proposes a generic solution to this approximation theoretic problem by introducing a canonical transformation which upgrades $mathscr{F}$s approximation property in the following sense. The transformed model class, denoted by $mathscr{F}text{-tope}$, is shown to be dense in $L^p_{mu,text{strict}}(mathbb{R}^d,mathbb{R}^D)$ which is a topological space whose elements are locally $p$-integrable functions and whose topology is much finer than usual norm topology on $L^p_{mu}(mathbb{R}^d,mathbb{R}^D)$; here $mu$ is any suitable $sigma$-finite Borel measure $mu$ on $mathbb{R}^d$. Next, we show that if $mathscr{F}$ is any family of analytic functions then there is always a strict gap between $mathscr{F}text{-tope}$s expressibility and that of $mathscr{F}$, since we find that $mathscr{F}$ can never dense in $L^p_{mu,text{strict}}(mathbb{R}^d,mathbb{R}^D)$. In the general case, where $mathscr{F}$ may contain non-analytic functions, we provide an abstract form of these results guaranteeing that there always exists some function space in which $mathscr{F}text{-tope}$ is dense but $mathscr{F}$ is not, while, the converse is never possible. Applications to feedforward networks, convolutional neural networks, and polynomial bases are explored.
Modifications to a neural networks input and output layers are often required to accommodate the specificities of most practical learning tasks. However, the impact of such changes on architectures approximation capabilities is largely not understood . We present general conditions describing feature and readout maps that preserve an architectures ability to approximate any continuous functions uniformly on compacts. As an application, we show that if an architecture is capable of universal approximation, then modifying its final layer to produce binary values creates a new architecture capable of deterministically approximating any classifier. In particular, we obtain guarantees for deep CNNs and deep feed-forward networks. Our results also have consequences within the scope of geometric deep learning. Specifically, when the input and output spaces are Cartan-Hadamard manifolds, we obtain geometrically meaningful feature and readout maps satisfying our criteria. Consequently, commonly used non-Euclidean regression models between spaces of symmetric positive definite matrices are extended to universal DNNs. The same result allows us to show that the hyperbolic feed-forward networks, used for hierarchical learning, are universal. Our result is also used to show that the common practice of randomizing all but the last two layers of a DNN produces a universal family of functions with probability one. We also provide conditions on a DNNs first (resp. last) few layers connections and activation function which guarantee that these layers can have a width equal to the input (resp. output) spaces dimension while not negatively affecting the architectures approximation capabilities.
The robust PCA of covariance matrices plays an essential role when isolating key explanatory features. The currently available methods for performing such a low-rank plus sparse decomposition are matrix specific, meaning, those algorithms must re-run for every new matrix. Since these algorithms are computationally expensive, it is preferable to learn and store a function that instantaneously performs this decomposition when evaluated. Therefore, we introduce Denise, a deep learning-based algorithm for robust PCA of covariance matrices, or more generally of symmetric positive semidefinite matrices, which learns precisely such a function. Theoretical guarantees for Denise are provided. These include a novel universal approximation theorem adapted to our geometric deep learning problem, convergence to an optimal solution of the learning problem and convergence of the training scheme. Our experiments show that Denise matches state-of-the-art performance in terms of decomposition quality, while being approximately 2000x faster than the state-of-the-art, PCP, and 200x faster than the current speed optimized method, fast PCP.
129 - Anastasis Kratsios 2019
This paper introduces an intermediary between conditional expectation and conditional sublinear expectation, called R-conditioning. The R-conditioning of a random-vector in $L^2$ is defined as the best $L^2$-estimate, given a $sigma$-subalgebra and a degree of model uncertainty. When the random vector represents the payoff of derivative security in a complete financial market, its R-conditioning with respect to the risk-neutral measure is interpreted as its risk-averse value. The optimization problem defining the optimization R-conditioning is shown to be well-posed. We show that the R-conditioning operators can be used to approximate a large class of sublinear expectations to arbitrary precision. We then introduce a novel numerical algorithm for computing the R-conditioning. This algorithm is shown to be strongly convergent. Implementations are used to compare the risk-averse value of a Vanilla option to its traditional risk-neutral value, within the Black-Scholes-Merton framework. Concrete connections to robust finance, sensitivity analysis, and high-dimensional estimation are all treated in this paper.
Effective feature representation is key to the predictive performance of any algorithm. This paper introduces a meta-procedure, called Non-Euclidean Upgrading (NEU), which learns feature maps that are expressive enough to embed the universal approxim ation property (UAP) into most model classes while only outputting feature maps that preserve any model classs UAP. We show that NEU can learn any feature map with these two properties if that feature map is asymptotically deformable into the identity. We also find that the feature-representations learned by NEU are always submanifolds of the feature space. NEUs properties are derived from a new deep neural model that is universal amongst all orientation-preserving homeomorphisms on the input space. We derive qualitative and quantitative approximation guarantees for this architecture. We quantify the number of parameters required for this new architecture to memorize any set of input-output pairs while simultaneously fixing every point of the input space lying outside some compact set, and we quantify the size of this set as a function of our models depth. Moreover, we show that no deep feed-forward network with commonly used activation function has all these properties. NEUs performance is evaluated against competing machine learning methods on various regression and dimension reduction tasks both with financial and simulated data.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا