
Neural networks behave as hash encoders: An empirical study

Posted by Fengxiang He
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





The input space of a neural network with ReLU-like activations is partitioned into multiple linear regions, each corresponding to a specific activation pattern of the included ReLU-like activations. We demonstrate that this partition exhibits the following encoding properties across a variety of deep learning models: (1) determinism: almost every linear region contains at most one training example, so almost every training example can be represented by a unique activation pattern, which is parameterized by a neural code; and (2) categorization: given the neural code, simple algorithms such as K-Means, K-NN, and logistic regression achieve fairly good performance on both training and test data. These encoding properties suggest, surprisingly, that normal neural networks well-trained for classification behave as hash encoders without any extra effort. In addition, the encoding properties vary across scenarios. Further experiments demonstrate that model size, training time, training sample size, regularization, and label noise all contribute to shaping the encoding properties, with the first three having the dominant impact. We then define an activation hash phase chart to represent the space spanned by model size, training time, training sample size, and the encoding properties, which is divided into three canonical regions: the under-expressive regime, the critically-expressive regime, and the sufficiently-expressive regime. The source code package is available at https://github.com/LeavesLei/activation-code.
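To make the encoding concrete, here is a minimal sketch of the neural-code idea: the binary on/off pattern of a network's ReLU units identifies the linear region an input falls into, and a simple classifier such as K-NN can then be run directly on these codes. The architecture, layer sizes, and random data below are illustrative assumptions, not the paper's exact setup (see the linked repository for that).

```python
# Sketch: treat the ReLU activation pattern of each example as a hash
# code ("neural code") and classify with K-NN under Hamming distance.
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier

class MLP(nn.Module):
    def __init__(self, d_in=784, d_hidden=128, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_hidden)
        self.out = nn.Linear(d_hidden, n_classes)

    def forward(self, x):
        h1 = torch.relu(self.fc1(x))
        h2 = torch.relu(self.fc2(h1))
        # The 0/1 pattern of which ReLUs fire identifies the linear region.
        return self.out(h2), torch.cat([h1 > 0, h2 > 0], dim=1)

@torch.no_grad()
def neural_codes(model, x):
    _, code = model(x)
    return code.float().numpy()

model = MLP()  # assume this has already been trained for classification
x_train, y_train = torch.randn(1000, 784), torch.randint(0, 10, (1000,))
x_test = torch.randn(100, 784)

# K-NN over the binary codes, using Hamming distance between patterns.
knn = KNeighborsClassifier(n_neighbors=5, metric="hamming")
knn.fit(neural_codes(model, x_train), y_train.numpy())
pred = knn.predict(neural_codes(model, x_test))
```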




Read also

Generative adversarial networks (GANs) are capable of producing high quality image samples. However, unlike variational autoencoders (VAEs), GANs lack encoders that provide the inverse mapping for the generators, i.e., encode images back to the latent space. In this work, we consider adversarially learned generative models that also have encoders. We evaluate models based on their ability to produce high quality samples and reconstructions of real images. Our main contributions are twofold: First, we find that the baseline Bidirectional GAN (BiGAN) can be improved upon with the addition of an autoencoder loss, at the expense of an extra hyper-parameter to tune. Second, we show that comparable performance to BiGAN can be obtained by simply training an encoder to invert the generator of a normal GAN.
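As a concrete illustration of the second contribution above, the following is a minimal sketch of inverting a GAN generator with a separately trained encoder: latents are sampled, images are generated from them by the frozen generator, and the encoder learns to recover the latents. The architectures, latent dimension, and hyperparameters are placeholder assumptions.

```python
# Sketch: train an encoder E to invert a frozen, pre-trained generator G
# by regressing the latent z from the generated image G(z).
import torch
import torch.nn as nn

z_dim = 64
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
E = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, z_dim))

G.requires_grad_(False)  # the generator stays fixed; only E is trained
opt = torch.optim.Adam(E.parameters(), lr=1e-4)

for step in range(10_000):
    z = torch.randn(128, z_dim)       # sample latents
    x = G(z)                          # generate images from them
    loss = ((E(x) - z) ** 2).mean()   # encoder should recover the latents
    opt.zero_grad()
    loss.backward()
    opt.step()
```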
We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; centered and ensembled finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite width networks; diagonal regularization of kernels acts similarly to early stopping; floating point precision limits kernel performance beyond a critical dataset size; regularized ZCA whitening improves accuracy; finite network performance depends non-monotonically on width in ways not captured by double descent phenomena; equivariance of CNNs is only beneficial for narrow networks far from the kernel regime. Our experiments additionally motivate an improved layer-wise scaling for weight decay which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction, including a novel ensembling technique. Using these best practices we achieve state-of-the-art results on CIFAR-10 classification for kernels corresponding to each architecture class we consider.
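For context, the NNGP and NTK predictions this line of work compares can be computed in closed form; below is a minimal sketch using the open-source neural_tangents library, with a placeholder architecture and random data as assumptions. The diag_reg argument adds the diagonal regularization that the abstract notes acts similarly to early stopping.

```python
# Sketch: closed-form NNGP and NTK predictions for an infinitely wide MLP.
import numpy as np
import neural_tangents as nt
from neural_tangents import stax

# Infinite-width architecture; kernel_fn computes its NNGP/NTK kernels.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(10),
)

x_train = np.random.randn(100, 32)
y_train = np.eye(10)[np.random.randint(0, 10, size=100)]  # one-hot labels
x_test = np.random.randn(20, 32)

# Exact predictions of the infinite network trained to convergence on MSE;
# diag_reg regularizes the kernel diagonal (akin to early stopping).
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=1e-4)
nngp_mean, ntk_mean = predict_fn(x_test=x_test, get=("nngp", "ntk"))
```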
Tuning machine learning models with Bayesian optimization (BO) is a successful strategy for finding good hyperparameters. BO defines an iterative procedure in which a cross-validated metric is evaluated on promising hyperparameters. In practice, however, an improvement in the validation metric may not translate into better predictive performance on a test set, especially when tuning models trained on small datasets. In other words, contrary to conventional wisdom, BO can overfit. In this paper, we carry out the first systematic investigation of overfitting in BO and demonstrate that this issue is serious, yet often overlooked in practice. We propose a novel criterion for early-stopping BO, which aims to maintain solution quality while avoiding the unnecessary iterations that can lead to overfitting. Experiments on real-world hyperparameter optimization problems show that our approach effectively meets these goals and is more adaptive than the baselines.
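The paper's actual stopping criterion is not spelled out in this abstract, so the sketch below stands in a generic patience rule to show where such a test plugs into the BO loop; suggest and cv_score are hypothetical callables for the acquisition step and the cross-validated metric.

```python
# Sketch: a BO loop that stops early once the cross-validated metric
# plateaus, rather than running all iterations and risking overfitting.
def tune_with_early_stopping(suggest, cv_score, n_iter=100,
                             patience=10, tol=1e-3):
    best, stale, history = float("-inf"), 0, []
    for _ in range(n_iter):
        params = suggest(history)      # BO proposes hyperparameters
        score = cv_score(params)       # cross-validated validation metric
        history.append((params, score))
        if score > best + tol:         # meaningful improvement: reset
            best, stale = score, 0
        else:
            stale += 1
        if stale >= patience:          # metric has plateaued; further
            break                      # iterations mostly fit noise
    return max(history, key=lambda h: h[1])
```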
Many proposed methods for explaining machine learning predictions are in fact challenging to understand for nontechnical consumers. This paper builds upon an alternative consumer-driven approach called TED that asks for explanations to be provided in training data, along with target labels. Using semi-synthetic data from credit approval and employee retention applications, experiments are conducted to investigate some practical considerations with TED, including its performance with different classification algorithms, varying numbers of explanations, and variability in explanations. A new algorithm is proposed to handle the case where some training examples do not have explanations. Our results show that TED is robust to increasing numbers of explanations, noisy explanations, and large fractions of missing explanations, thus making advances toward its practical deployment.
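For intuition, one simple way to instantiate TED is to pair each target label with its explanation id into a composite class, train any off-the-shelf classifier on the pairs, and decode both parts at prediction time. The sketch below follows that scheme; the data and the choice of logistic regression are illustrative assumptions, and it does not implement the paper's new algorithm for missing explanations.

```python
# Sketch: TED-style training where each example carries a label y and a
# human-provided explanation id e, combined into one composite class.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.random.randn(500, 8)
y = np.random.randint(0, 2, size=500)  # e.g. approve / deny credit
e = np.random.randint(0, 4, size=500)  # id of the provided explanation

n_explanations = 4
combined = y * n_explanations + e      # composite (label, explanation) class

clf = LogisticRegression(max_iter=1000).fit(X, combined)

# Predictions decode back into a label and an explanation for the user.
pred = clf.predict(np.random.randn(3, 8))
pred_y, pred_e = pred // n_explanations, pred % n_explanations
```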
XDeep is an open-source Python package developed to interpret deep models for both practitioners and researchers. Overall, XDeep takes a trained deep neural network (DNN) as input and generates relevant interpretations as output in a post-hoc manner. From the functionality perspective, XDeep integrates a wide range of state-of-the-art interpretation algorithms covering different types of methodologies, and can provide both local and global explanations of a DNN's behaviour. With its well-documented API, end-users can easily obtain interpretations for the deep models at hand with a few lines of code and compare results across algorithms. XDeep is generally compatible with Python 3 and can be installed through the Python Package Index (PyPI). The source code is available at: https://github.com/datamllab/xdeep.
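XDeep's exact API is documented in the linked repository; as a stand-in, the sketch below runs the same post-hoc, local-explanation workflow with LIME directly, one of the algorithm families such toolkits integrate. The model and random data are illustrative assumptions.

```python
# Sketch: post-hoc local explanation of a trained model with LIME,
# the kind of workflow interpretation toolkits like XDeep wrap.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

X = np.random.randn(200, 4)
y = np.random.randint(0, 2, size=200)
clf = RandomForestClassifier().fit(X, y)  # the trained model to interpret

explainer = LimeTabularExplainer(
    X,
    feature_names=[f"f{i}" for i in range(4)],
    class_names=["negative", "positive"],
    mode="classification",
)
# Local explanation: which features drove the prediction for one instance.
exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=4)
print(exp.as_list())
```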
