Gi and Pal Scores: Deep Neural Network Generalization Statistics

70 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Brian Quanz

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yair Schiff - Brian Quanz - Payel Das

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The field of Deep Learning is rich with empirical evidence of human-like performance on a variety of regression, classification, and control tasks. However, despite these successes, the field lacks strong theoretical error bounds and consistent measures of network generalization and learned invariances. In this work, we introduce two new measures, the Gi-score and Pal-score, that capture a deep neural networks generalization capabilities. Inspired by the Gini coefficient and Palma ratio, measures of income inequality, our statistics are robust measures of a networks invariance to perturbations that accurately predict generalization gaps, i.e., the difference between accuracy on training and test sets.

قيم البحث

173 - Yair Schiff , Brian Quanz , Payel Das 2021

The field of Deep Learning is rich with empirical evidence of human-like performance on a variety of prediction tasks. However, despite these successes, the recent Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition suggests th at there is a need for more robust and efficient measures of network generalization. In this work, we propose a new framework for evaluating the generalization capabilities of trained networks. We use perturbation response (PR) curves that capture the accuracy change of a given network as a function of varying levels of training sample perturbation. From these PR curves, we derive novel statistics that capture generalization capability. Specifically, we introduce two new measures for accurately predicting generalization gaps: the Gi-score and Pal-score, that are inspired by the Gini coefficient and Palma ratio (measures of income inequality), that accurately predict generalization gaps. Using our framework applied to intra and inter class sample mixup, we attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the PGDL competition. In addition, we show that our framework and the proposed statistics can be used to capture to what extent a trained network is invariant to a given parametric input transformation, such as rotation or translation. Therefore, these generalization gap prediction statistics also provide a useful means for selecting the optimal network architectures and hyperparameters that are invariant to a certain perturbation.

التعلم الآلي الذكاء الاصطناعي

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network

72 - Taiji Suzuki , Hiroshi Abe , Tomoaki Nishimura 2019

One of the biggest issues in deep learning theory is the generalization ability of networks with huge model size. The classical learning theory suggests that overparameterized models cause overfitting. However, practically used large deep models avoi d overfitting, which is not well explained by the classical approaches. To resolve this issue, several attempts have been made. Among them, the compression based bound is one of the promising approaches. However, the compression based bound can be applied only to a compressed network, and it is not applicable to the non-compressed original network. In this paper, we give a unified frame-work that can convert compression based bounds to those for non-compressed original networks. The bound gives even better rate than the one for the compressed network by improving the bias term. By establishing the unified frame-work, we can obtain a data dependent generalization error bound which gives a tighter evaluation than the data independent ones.

التعلم الآلي التعلم الالي

Dual Precision Deep Neural Network

88 - Jae Hyun Park , Ji Sub Choi , Jong Hwan Ko 2020

On-line Precision scalability of the deep neural networks(DNNs) is a critical feature to support accuracy and complexity trade-off during the DNN inference. In this paper, we propose dual-precision DNN that includes two different precision modes in a single model, thereby supporting an on-line precision switch without re-training. The proposed two-phase training process optimizes both low- and high-precision modes.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط

Robustness to Pruning Predicts Generalization in Deep Neural Networks

161 - Lorenz Kuhn , Clare Lyle , Aidan N. Gomez 2021

Existing generalization measures that aim to capture a models simplicity based on parameter counts or norms fail to explain generalization in overparameterized deep neural networks. In this paper, we introduce a new, theoretically motivated measure o f a networks simplicity which we call prunability: the smallest emph{fraction} of the networks parameters that can be kept while pruning without adversely affecting its training loss. We show that this measure is highly predictive of a models generalization performance across a large set of convolutional networks trained on CIFAR-10, does not grow with network size unlike existing pruning-based measures, and exhibits high correlation with test set loss even in a particularly challenging double descent setting. Lastly, we show that the success of prunability cannot be explained by its relation to known complexity measures based on models margin, flatness of minima and optimization speed, finding that our new measure is similar to -- but more predictive than -- existing flatness-based measures, and that its predictions exhibit low mutual information with those of other baselines.

التعلم الآلي التعلم الالي

MaxUp: A Simple Way to Improve Generalization of Neural Network Training

79 - Chengyue Gong , Tongzheng Ren , Mao Ye 2020

We propose emph{MaxUp}, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with some random pe rturbations or transforms and minimize the maximum, or worst case loss over the augmented data. By doing so, we implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve the generation performance. For example, in the case of Gaussian perturbation, emph{MaxUp} is asymptotically equivalent to using the gradient norm of the loss as a penalty to encourage smoothness. We test emph{MaxUp} on a range of tasks, including image classification, language modeling, and adversarial certification, on which emph{MaxUp} consistently outperforms the existing best baseline methods, without introducing substantial computational overhead. In particular, we improve ImageNet classification from the state-of-the-art top-1 accuracy $85.5%$ without extra data to $85.8%$. Code will be released soon.

التعلم الآلي التعلم الالي