No Arabic abstract
Neural networks are becoming an increasingly important tool in applications. However, neural networks are not widely used in statistical genetics. In this paper, we propose a new neural networks method called expectile neural networks. When the size of parameter is too large, the standard maximum likelihood procedures may not work. We use sieve method to constrain parameter space. And we prove its consistency and normality under nonparametric regression framework.
Neural networks are one of the most popularly used methods in machine learning and artificial intelligence nowadays. Due to the universal approximation theorem (Hornik et al. (1989)), a neural network with one hidden layer can approximate any continuous function on a compact support as long as the number of hidden units is sufficiently large. Statistically, a neural network can be classified into a nonlinear regression framework. However, if we consider it parametrically, due to the unidentifiability of the parameters, it is difficult to derive its asymptotic properties. Instead, we considered the estimation problem in a nonparametric regression framework and use the results from sieve estimation to establish the consistency, the rates of convergence and the asymptotic normality of the neural network estimators. We also illustrate the validity of the theories via simulations.
Imposing orthogonal transformations between layers of a neural network has been considered for several years now. This facilitates their learning, by limiting the explosion/vanishing of the gradient; decorrelates the features; improves the robustness. In this framework, this paper studies theoretical properties of orthogonal convolutional layers. More precisely, we establish necessary and sufficient conditions on the layer architecture guaranteeing the existence of an orthogonal convolutional transform. These conditions show that orthogonal convolutional transforms exist for almost all architectures used in practice. Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed. We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and remains accurate when the size of the signals/images is large. This holds for both row and column orthogonality. Finally, we confirm these theoretical results with experiments, and also empirically study the landscape of the regularization term.
In this paper, we study the properties of robust nonparametric estimation using deep neural networks for regression models with heavy tailed error distributions. We establish the non-asymptotic error bounds for a class of robust nonparametric regression estimators using deep neural networks with ReLU activation under suitable smoothness conditions on the regression function and mild conditions on the error term. In particular, we only assume that the error distribution has a finite p-th moment with p greater than one. We also show that the deep robust regression estimators are able to circumvent the curse of dimensionality when the distribution of the predictor is supported on an approximate lower-dimensional set. An important feature of our error bound is that, for ReLU neural networks with network width and network size (number of parameters) no more than the order of the square of the dimensionality d of the predictor, our excess risk bounds depend sub-linearly on d. Our assumption relaxes the exact manifold support assumption, which could be restrictive and unrealistic in practice. We also relax several crucial assumptions on the data distribution, the target regression function and the neural networks required in the recent literature. Our simulation studies demonstrate that the robust methods can significantly outperform the least squares method when the errors have heavy-tailed distributions and illustrate that the choice of loss function is important in the context of deep nonparametric regression.
Recently it was shown in several papers that backpropagation is able to find the global minimum of the empirical risk on the training data using over-parametrized deep neural networks. In this paper a similar result is shown for deep neural networks with the sigmoidal squasher activation function in a regression setting, and a lower bound is presented which proves that these networks do not generalize well on a new data in the sense that they do not achieve the optimal minimax rate of convergence for estimation of smooth regression functions.
We derive asymptotic normality of kernel type deconvolution estimators of the density, the distribution function at a fixed point, and of the probability of an interval. We consider the so called super smooth case where the characteristic function of the known distribution decreases exponentially. It turns out that the limit behavior of the pointwise estimators of the density and distribution function is relatively straightforward while the asymptotics of the estimator of the probability of an interval depends in a complicated way on the sequence of bandwidths.