
Transportation analysis of denoising autoencoders: a novel method for analyzing deep neural networks

Published by Sho Sonoda
Publication date: 2017
Language: English





The feature map obtained from the denoising autoencoder (DAE), a cornerstone of deep learning, is investigated by determining the transportation dynamics of the DAE. Despite the rapid development of their applications, deep neural networks remain analytically unexplained, because the feature maps are nested and the parameters are not faithful. In this paper, we address the problem of formulating the nested complex of parameters by regarding the feature map as a transport map. Even when a feature map has different input and output dimensions, we can regard it as a transport map by considering both the input and output spaces to be embedded in a common high-dimensional space. In addition, the trajectory is a geometric object and is thus independent of parameterization. In this manner, transportation can be regarded as a universal characteristic of deep neural networks. By determining and analyzing the transportation dynamics, we can understand the behavior of a deep neural network. In this paper, we investigate a fundamental case of deep neural networks: the DAE. We derive the transport map of the DAE and reveal that the infinitely deep DAE transports mass to decrease a certain quantity, such as entropy, of the data distribution. These results, though analytically simple, shed light on the correspondence between deep neural networks and Wasserstein gradient flows.
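As a toy illustration of the entropy-decreasing transport described in the abstract (a sketch under stated assumptions, not the paper's derivation): assume one-dimensional Gaussian data with variance `data_var` and additive Gaussian corruption with variance `noise_var`. In that case the mean-squared-error-optimal denoiser is the posterior mean, which is a linear shrinkage toward the data mean. Stacking such denoisers, each fitted to the output of the previous one, shrinks the variance layer by layer, so the Gaussian differential entropy decreases. The function names below are hypothetical.

```python
import math


def optimal_dae_gain(data_var: float, noise_var: float) -> float:
    """Gain of the MSE-optimal denoiser for zero-mean Gaussian data.

    Assumption: x ~ N(0, data_var), corrupted input = x + N(0, noise_var).
    The posterior mean E[x | corrupted] is then gain * corrupted.
    """
    return data_var / (data_var + noise_var)


def gaussian_entropy(var: float) -> float:
    """Differential entropy (in nats) of a 1-D Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def stack_daes(data_var: float, noise_var: float, n_layers: int) -> list:
    """Variance of the data after each layer of a stack of optimal DAEs.

    Each layer is fitted to the current distribution and then applied to
    clean points, so a point x is transported to gain * x.
    """
    variances = [data_var]
    for _ in range(n_layers):
        gain = optimal_dae_gain(variances[-1], noise_var)
        # A linear map x -> gain * x scales the variance by gain ** 2.
        variances.append(gain ** 2 * variances[-1])
    return variances


variances = stack_daes(data_var=1.0, noise_var=0.1, n_layers=5)
entropies = [gaussian_entropy(v) for v in variances]
for depth, (v, h) in enumerate(zip(variances, entropies)):
    print(f"layer {depth}: variance={v:.4f}, entropy={h:.4f}")
```

The Gaussian case is chosen only because the optimal denoiser has a closed form; the general statement in the abstract concerns arbitrary data distributions and the limit of infinite depth.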


Read also

The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed $\alpha$-stable random variable. Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur `jumps', which force the SDE to transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the $\alpha$-stable assumption, we conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and admits heavy tails. We further investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
Hengyue Pan, Hui Jiang, Xin Niu (2018)
The past few years have witnessed the rapid development of regularization methods for deep learning models such as fully-connected deep neural networks (DNNs) and convolutional neural networks (CNNs). Most previous methods, such as Dropout, Cutout, and DropBlock, drop features from the input data or hidden layers. DropConnect instead drops connections between fully-connected layers. By randomly discarding some features or connections, these methods control overfitting and improve the performance of neural networks. In this paper, we propose two novel regularization methods, DropFilter and DropFilter-PLUS, for training CNNs. Unlike previous methods, DropFilter and DropFilter-PLUS modify the convolution filters. For DropFilter-PLUS, we find a suitable way to accelerate the learning process based on theoretical analysis. Experimental results on MNIST show that using DropFilter and DropFilter-PLUS may improve performance on image classification tasks.
Autoencoders have emerged as a useful framework for unsupervised learning of internal representations, and a wide variety of apparently conceptually disparate regularization techniques have been proposed to generate useful features. Here we extend existing denoising autoencoders to additionally inject noise before the nonlinearity, and at the hidden unit activations. We show that a wide variety of previous methods, including denoising, contractive, and sparse autoencoders, as well as dropout can be interpreted using this framework. This noise injection framework reaps practical benefits by providing a unified strategy to develop new internal representations by designing the nature of the injected noise. We show that noisy autoencoders outperform denoising autoencoders at the very task of denoising, and are competitive with other single-layer techniques on MNIST and CIFAR-10. We also show that types of noise other than dropout improve performance in a deep network through sparsifying, decorrelating, and spreading information across representations.
Cancer is still one of the most devastating diseases of our time. One way of automatically classifying tumor samples is by analyzing their derived molecular information (i.e., their gene expression signatures). In this work, we aim to distinguish three different types of cancer: thyroid, skin, and stomach. For that, we compare the performance of a Denoising Autoencoder (DAE) used as weight initialization of a deep neural network. Although we address a different domain problem in this work, we have adopted the same methodology as Ferreira et al. In our experiments, we assess two different approaches when training the classification model: (a) fixing the weights after pre-training the DAE, and (b) allowing fine-tuning of the entire classification network. Additionally, we apply two different strategies for embedding the DAE into the classification network: (1) importing only the encoding layers, and (2) inserting the complete autoencoder. Our best result came from the combination of unsupervised feature learning through a DAE, followed by its full import into the classification network and subsequent fine-tuning through supervised training, achieving an F1 score of 98.04% +/- 1.09 when identifying cancerous thyroid samples.
Deep neural networks are widely used for nonlinear function approximation with applications ranging from computer vision to control. Although these networks involve the composition of simple arithmetic operations, it can be very challenging to verify whether a particular network satisfies certain input-output properties. This article surveys methods that have emerged recently for soundly verifying such properties. These methods borrow insights from reachability analysis, optimization, and search. We discuss fundamental differences and connections between existing algorithms. In addition, we provide pedagogical implementations of existing methods and compare them on a set of benchmark problems.
