Transport Analysis of Infinitely Deep Neural Network

Added by Dr. Sho Sonoda
Publication date: 2016
Research language: English

We investigated the feature map inside deep neural networks (DNNs) by tracking the transport map. We are interested in the role of depth (why do DNNs perform better than shallow models?) and in the interpretation of DNNs (what do intermediate layers do?). Despite the rapid development of their applications, DNNs remain analytically unexplained because the hidden layers are nested and the parameters are not faithful. Inspired by the integral representation of shallow NNs, which is the continuum limit of the width (the number of hidden units), we developed the flow representation and transport analysis of DNNs. The flow representation is the continuum limit of the depth (the number of hidden layers), and it is specified by an ordinary differential equation (ODE) with a vector field. We interpret an ordinary DNN as a transport map, namely an Euler broken-line approximation of the flow. Technically speaking, a dynamical system is a natural model for nested feature maps. In addition, it opens a new way to treat DNNs in a coordinate-free manner, avoiding their redundant parametrization. Following Wasserstein geometry, we analyze a flow from three aspects: the dynamical system, the continuity equation, and the Wasserstein gradient flow. A key finding is the specification of a series of transport maps of the denoising autoencoder (DAE). Starting from the shallow DAE, this paper develops three topics: the transport map of the deep DAE, the equivalence between the stacked DAE and the composition of DAEs, and the double continuum limit, that is, the integral representation of the flow representation. As partial answers to the research questions, we found that deeper DAEs converge faster and extract better features; in addition, a deep Gaussian DAE transports mass so as to decrease the Shannon entropy of the data distribution.
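To make the "Euler broken line" reading and the entropy statement concrete, here is a minimal one-dimensional sketch. It rests on assumptions of our own, not a construction taken from the paper: the data are Gaussian N(0, s²), and each layer is the optimal least-squares denoiser for additive Gaussian noise of variance Δt. By Tweedie's formula that denoiser is g(x) = x + Δt · d/dx log p_Δt(x), where p_Δt is the data density smoothed by the noise, so every layer has the residual form x + Δt·v(x) and a stack of layers is a composition of such Euler-type steps. The function names (`gaussian_dae`, `deep_gaussian_dae`, `gaussian_entropy`) are hypothetical.

```python
import numpy as np


def gaussian_dae(x: np.ndarray, data_var: float, noise_var: float) -> np.ndarray:
    """Optimal least-squares denoiser for N(0, data_var) data corrupted by
    N(0, noise_var) noise, applied as a transport map to the points x.

    Tweedie's formula: g(x) = x + noise_var * d/dx log p(x), where
    p = N(0, data_var + noise_var) is the noise-smoothed data density.
    """
    score = -x / (data_var + noise_var)  # d/dx log N(0, data_var + noise_var)
    return x + noise_var * score         # residual form: x + dt * v(x)


def gaussian_entropy(var: float) -> float:
    """Differential (Shannon) entropy of a 1-D Gaussian with variance var."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)


def deep_gaussian_dae(x: np.ndarray, data_var: float, total_noise: float, n_layers: int):
    """Stack n_layers Gaussian DAEs, each using noise variance total_noise / n_layers.

    Each layer is refit to the current (still Gaussian) distribution, so the
    stack is a composition of maps of the form x + dt * v(x).
    Returns the transported points, the final variance, and the entropy
    before the first layer and after each layer.
    """
    dt = total_noise / n_layers
    var = data_var
    entropies = [gaussian_entropy(var)]
    for _ in range(n_layers):
        x = gaussian_dae(x, var, dt)
        # The map is linear, x -> c * x with c = var / (var + dt), so the
        # pushed-forward variance is c**2 * var.
        var = (var / (var + dt)) ** 2 * var
        entropies.append(gaussian_entropy(var))
    return x, var, entropies


if __name__ == "__main__":
    points = np.linspace(-2.0, 2.0, 5)
    for depth in (1, 4, 16):
        _, final_var, ent = deep_gaussian_dae(points, data_var=1.0, total_noise=1.0, n_layers=depth)
        print(depth, final_var, [round(e, 3) for e in ent])
```

Under these assumptions each layer multiplies points by a factor smaller than one, so the variance, and with it the entropy, strictly decreases layer by layer; this is consistent, in this toy setting only, with the statement that a deep Gaussian DAE decreases the Shannon entropy of the data distribution. The variance is tracked in closed form because a Gaussian stays Gaussian under a linear map; for non-Gaussian data the score d/dx log p_Δt would have to be estimated instead.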



Related research

One of the biggest issues in deep learning theory is the generalization ability of networks with huge model size. Classical learning theory suggests that overparameterized models cause overfitting. However, large deep models used in practice avoid overfitting, which classical approaches do not explain well. Several attempts have been made to resolve this issue. Among them, the compression-based bound is a promising approach. However, a compression-based bound applies only to the compressed network, not to the non-compressed original network. In this paper, we give a unified framework that converts compression-based bounds into bounds for the non-compressed original networks. By improving the bias term, the resulting bound achieves an even better rate than the bound for the compressed network. With this unified framework, we obtain a data-dependent generalization error bound that gives a tighter evaluation than data-independent ones.
Yikuan Li, Yajie Zhu (2019)
Deep Bayesian neural networks have attracted great attention in recent years because they combine the benefits of deep neural networks and probability theory. As a result, the network can make predictions and quantify the uncertainty of those predictions at the same time, which is important in many life-critical areas. However, most recent research focuses mainly on making Bayesian neural networks easier to train and on proposing methods to estimate uncertainty. I notice that very few works properly discuss how to measure the performance of Bayesian neural networks. Although accuracy and average uncertainty are commonly used for now, they are too general to provide insight into the model. In this paper, we introduce more specific criteria and propose several metrics that measure model performance from different perspectives, including model calibration measurement, data rejection ability, and uncertainty divergence for samples from the same and from different distributions.
In domains such as health care and finance, a shortage of labeled data and computational resources is a critical issue when developing machine learning algorithms. To address the scarcity of labeled data in training and deploying neural network-based systems, we propose a new technique to train deep neural networks over several data sources. Our method allows deep neural networks to be trained using data from multiple entities in a distributed fashion. We evaluate our algorithm on existing datasets and show that it obtains performance similar to that of a regular neural network trained on a single machine. We further extend it to incorporate semi-supervised learning when training with few labeled samples, and we analyze the security concerns that may arise. Our algorithm paves the way for distributed training of deep neural networks in data-sensitive applications where raw data may not be shared directly.
Zhijie Deng, Yucen Luo, Jun Zhu (2019)
Bayesian neural networks (BNNs) augment deep networks with uncertainty quantification through a Bayesian treatment of the network weights. However, such models face the challenge of Bayesian inference in a high-dimensional and usually over-parameterized space. This paper investigates a new line of Bayesian deep learning by performing Bayesian inference on the network structure. Instead of inefficiently building the structure from scratch, we draw inspiration from neural architecture search to represent the network structure. We then develop an efficient stochastic variational inference approach that unifies the learning of both network structure and weights. Empirically, our method exhibits competitive predictive performance while preserving the benefits of Bayesian principles across challenging scenarios. We also provide convincing experimental justification for our modeling choice.
Zhihui Shao, , Jianyi Yang (2020)
To increase the trustworthiness of deep neural network (DNN) classifiers, an accurate prediction confidence that represents the true likelihood of correctness is crucial. To this end, many post-hoc calibration methods have been proposed that use a lightweight model to map the target DNN's output layer to a calibrated confidence. Nonetheless, on out-of-distribution (OOD) datasets in practice, the target DNN can often misclassify samples with high confidence, which makes it hard for existing calibration methods to produce an accurate confidence. In this paper, we propose a new post-hoc confidence calibration method, called CCAC (Confidence Calibration with an Auxiliary Class), for DNN classifiers on OOD datasets. The key novelty of CCAC is an auxiliary class in the calibration model that separates misclassified samples from correctly classified ones, thus effectively mitigating the problem of the target DNN being confidently wrong. We also propose a simplified version of CCAC that reduces free parameters and facilitates transfer to a new, unseen dataset. Our experiments on different DNN models, datasets, and applications show that CCAC consistently outperforms prior post-hoc calibration methods.
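Several entries above revolve around calibration: whether a model's stated confidence matches its actual accuracy. One widely used way to measure this is the binned expected calibration error (ECE). The sketch below is illustrative only. It is not necessarily one of the metrics proposed in either paper, and the function name `expected_calibration_error` and the equal-width 10-bin scheme are our assumptions.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Binned ECE: the sample-weighted mean of |accuracy - mean confidence|
    over equal-width confidence bins on [0, 1]."""
    conf = np.asarray(confidences, dtype=float)
    hits = np.asarray(correct, dtype=float)
    # Bin index in 0..n_bins-1; a confidence of exactly 1.0 goes to the last bin.
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(hits[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in this bin
    return float(ece)


if __name__ == "__main__":
    # Hypothetical predictions: top-class confidence and whether the prediction was right.
    conf = [0.95, 0.9, 0.8, 0.75, 0.6, 0.55]
    hit = [1, 1, 0, 1, 1, 0]
    print(expected_calibration_error(conf, hit))
```

A perfectly calibrated model scores 0, while a model that is confidently wrong, the failure mode the CCAC entry targets, drives the score toward its confidence level. Equal-width bins are the simplest choice; adaptive (equal-mass) bins are a common alternative when confidences cluster near 1.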
