No Arabic abstract
Real world data often exhibit low-dimensional geometric structures, and can be viewed as samples near a low-dimensional manifold. This paper studies nonparametric regression of H{o}lder functions on low-dimensional manifolds using deep ReLU networks. Suppose $n$ training data are sampled from a H{o}lder function in $mathcal{H}^{s,alpha}$ supported on a $d$-dimensional Riemannian manifold isometrically embedded in $mathbb{R}^D$, with sub-gaussian noise. A deep ReLU network architecture is designed to estimate the underlying function from the training data. The mean squared error of the empirical estimator is proved to converge in the order of $n^{-frac{2(s+alpha)}{2(s+alpha) + d}}log^3 n$. This result shows that deep ReLU networks give rise to a fast convergence rate depending on the data intrinsic dimension $d$, which is usually much smaller than the ambient dimension $D$. It therefore demonstrates the adaptivity of deep ReLU networks to low-dimensional geometric structures of data, and partially explains the power of deep ReLU networks in tackling high-dimensional data with low-dimensional geometric structures.
This article is concerned with the approximation and expressive powers of deep neural networks. This is an active research area currently producing many interesting papers. The results most commonly found in the literature prove that neural networks approximate functions with classical smoothness to the same accuracy as classical linear methods of approximation, e.g. approximation by polynomials or by piecewise polynomials on prescribed partitions. However, approximation by neural networks depending on n parameters is a form of nonlinear approximation and as such should be compared with other nonlinear methods such as variable knot splines or n-term approximation from dictionaries. The performance of neural networks in targeted applications such as machine learning indicate that they actually possess even greater approximation power than these traditional methods of nonlinear approximation. The main results of this article prove that this is indeed the case. This is done by exhibiting large classes of functions which can be efficiently captured by neural networks where classical nonlinear methods fall short of the task. The present article purposefully limits itself to studying the approximation of univariate functions by ReLU networks. Many generalizations to functions of several variables and other activation functions can be envisioned. However, even in this simplest of settings considered here, a theory that completely quantifies the approximation power of neural networks is still lacking.
This paper is devoted to establishing $L^2$ approximation properties for deep ReLU convolutional neural networks (CNNs) on two-dimensional space. The analysis is based on a decomposition theorem for convolutional kernels with large spatial size and multi-channel. Given that decomposition and the property of the ReLU activation function, a universal approximation theorem of deep ReLU CNNs with classic structure is obtained by showing its connection with ReLU deep neural networks (DNNs) with one hidden layer. Furthermore, approximation properties are also obtained for neural networks with ResNet, pre-act ResNet, and MgNet architecture based on connections between these networks.
Causal inference explores the causation between actions and the consequent rewards on a covariate set. Recently deep learning has achieved a remarkable performance in causal inference, but existing statistical theories cannot well explain such an empirical success, especially when the covariates are high-dimensional. Most theoretical results in causal inference are asymptotic, suffer from the curse of dimensionality, and only work for the finite-action scenario. To bridge such a gap between theory and practice, this paper studies doubly robust off-policy learning by deep neural networks. When the covariates lie on a low-dimensional manifold, we prove nonasymptotic regret bounds, which converge at a fast rate depending on the intrinsic dimension of the manifold. Our results cover both the finite- and continuous-action scenarios. Our theory shows that deep neural networks are adaptive to the low-dimensional geometric structures of the covariates, and partially explains the success of deep learning for causal inference.
Most of existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data. To bridge this gap, we propose to exploit low-dimensional geometric structures of the real world data sets. We establish theoretical guarantees of convolutional residual networks (ConvResNet) in terms of function approximation and statistical estimation for binary classification. Specifically, given the data lying on a $d$-dimensional manifold isometrically embedded in $mathbb{R}^D$, we prove that if the network architecture is properly chosen, ConvResNets can (1) approximate Besov functions on manifolds with arbitrary accuracy, and (2) learn a classifier by minimizing the empirical logistic risk, which gives an excess risk in the order of $n^{-frac{s}{2s+2(svee d)}}$, where $s$ is a smoothness parameter. This implies that the sample complexity depends on the intrinsic dimension $d$, instead of the data dimension $D$. Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.
It has been widely assumed that a neural network cannot be recovered from its outputs, as the network depends on its parameters in a highly nonlinear way. Here, we prove that in fact it is often possible to identify the architecture, weights, and biases of an unknown deep ReLU network by observing only its output. Every ReLU network defines a piecewise linear function, where the boundaries between linear regions correspond to inputs for which some neuron in the network switches between inactive and active ReLU states. By dissecting the set of region boundaries into components associated with particular neurons, we show both theoretically and empirically that it is possible to recover the weights of neurons and their arrangement within the network, up to isomorphism.