Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot

280 0 0.0 ( 0 )

Download Cite

Added by Tianle Cai

Publication date 2020

fields Informatics Engineering Mathematical Statistics

and research's language is English

Authors Jingtong Su - Yihang Chen - Tianle Cai

Machine Learning Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Network pruning is a method for reducing test-time computational resource requirements with minimal performance degradation. Conventional wisdom of pruning algorithms suggests that: (1) Pruning methods exploit information from training data to find good subnetworks; (2) The architecture of the pruned network is crucial for good performance. In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods and surprisingly find that: (1) A set of methods which aims to find good subnetworks of the randomly-initialized network (which we call initial tickets), hardly exploits any information from the training data; (2) For the pruned networks obtained by these methods, randomly changing the preserved weights in each layer, while keeping the total number of preserved weights unchanged per layer, does not affect the final performance. These findings inspire us to choose a series of simple emph{data-independent} prune ratios for each layer, and randomly prune each layer accordingly to get a subnetwork (which we call random tickets). Experimental results show that our zero-shot random tickets outperform or attain a similar performance compared to existing initial tickets. In addition, we identify one existing pruning method that passes our sanity checks. We hybridize the ratios in our random ticket with this method and propose a new method called hybrid tickets, which achieves further improvement. (Our code is publicly available at https://github.com/JingtongSu/sanity-checking-pruning)

rate research

Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot?

105 - Xiaolong Ma , Geng Yuan , Xuan Shen 2021

There have been long-standing controversies and inconsistencies over the experiment setup and criteria for identifying the winning ticket in literature. To reconcile such, we revisit the definition of lottery ticket hypothesis, with comprehensive and more rigorous conditions. Under our new definition, we show concrete evidence to clarify whether the winning ticket exists across the major DNN architectures and/or applications. Through extensive experiments, we perform quantitative analysis on the correlations between winning tickets and various experimental factors, and empirically study the patterns of our observations. We find that the key training hyperparameters, such as learning rate and training epochs, as well as the architecture characteristics such as capacities and residual connections, are all highly correlated with whether and when the winning tickets can be identified. Based on our analysis, we summarize a guideline for parameter settings in regards of specific architecture characteristics, which we hope to catalyze the research progress on the topic of lottery ticket hypothesis.

Machine Learning Artificial Intelligence Computer Vision and Pattern Recognition

Towards Understanding Iterative Magnitude Pruning: Why Lottery Tickets Win

142 - Jaron Maene , Mingxiao Li , Marie-Francine Moens 2021

The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in. However, the subsequent work has failed to replicate this on large-scale models and required rewinding to an early stable state instead of initialization. We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization. Our subsequent experiments on common vision tasks give strong credence to the hypothesis in Evci et al. (2020b) that lottery tickets simply retrain to the same regions (although not necessarily to the same basin). These results imply that existing lottery tickets could not have been found without the preceding dense training by iterative magnitude pruning, raising doubts about the use of the lottery ticket hypothesis.

Machine Learning

Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

153 - Utku Evci , Yani A. Ioannou , Cem Keskin 2020

Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exception of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). In this work, we attempt to answer: (1) why training unstructured sparse networks from random initialization performs poorly and; (2) what makes LTs and DST the exceptions? We show that sparse NNs have poor gradient flow at initialization and propose a modified initialization for unstructured connectivity. Furthermore, we find that DST methods significantly improve gradient flow during training over traditional sparse training methods. Finally, we show that LTs do not improve gradient flow, rather their success lies in re-learning the pruning solution they are derived from - however, this comes at the cost of learning novel solutions.

Machine Learning Computer Vision and Pattern Recognition

Slow and Stale Gradients Can Win the Race

82 - Sanghamitra Dutta , Jianyu Wang , Gauri Joshi 2020

Distributed Stochastic Gradient Descent (SGD) when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work, we present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime(wallclock time). The main novelty in our work is that our runtime analysis considers random straggling delays, which helps us design and compare distributed SGD algorithms that strike a balance between straggling and staleness. We also provide a new error convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions. Finally, based on our theoretical characterization of the error-runtime trade-off, we propose a method of gradually varying synchronicity in distributed SGD and demonstrate its performance on CIFAR10 dataset.

Machine Learning Distributed Parallel and Cluster Computing Machine Learning

Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models

278 - Jiaoda Li , Duygu Ataman , Rico Sennrich 2021

Multimodal machine translation (MMT) systems have been shown to outperform their text-only neural machine translation (NMT) counterparts when visual context is available. However, recent studies have also shown that the performance of MMT models is only marginally impacted when the associated image is replaced with an unrelated image or noise, which suggests that the visual context might not be exploited by the model at all. We hypothesize that this might be caused by the nature of the commonly used evaluation benchmark, also known as Multi30K, where the translations of image captions were prepared without actually showing the images to human translators. In this paper, we present a qualitative study that examines the role of datasets in stimulating the leverage of visual modality and we propose methods to highlight the importance of visual signals in the datasets which demonstrate improvements in reliance of models on the source images. Our findings suggest the research on effective MMT architectures is currently impaired by the lack of suitable datasets and careful consideration must be taken in creation of future MMT datasets, for which we also provide useful insights.

Computation and Language

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions