
Soft Threshold Weight Reparameterization for Learnable Sparsity

Published by Aditya Kusupati
Publication date: 2020
Research field: Informatics engineering
Paper language: English





Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets which have sub-optimal layer-wise parameter allocation resulting in a) lower prediction accuracy or b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K), and, additionally, learns non-uniform budgets that empirically reduce the FLOPs by up to 50%. Notably, STR boosts the accuracy over existing results by up to 10% in the ultra sparse (99%) regime and can also be used to induce low-rank (structured sparsity) in RNNs. In short, STR is a simple mechanism which learns effective sparsity budgets that contrast with popular heuristics. Code, pretrained models and sparsity budgets are at https://github.com/RAIVNLab/STR.
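To make the mechanism concrete, below is a minimal PyTorch-style sketch of a soft-thresholded layer with a learnable per-layer threshold, so that the sparsity budget is discovered by gradient descent rather than fixed by a heuristic. The sigmoid parameterization of the threshold, the initialization, and the class name `SoftThresholdLinear` are illustrative assumptions; the authors' implementation is in the repository linked above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftThresholdLinear(nn.Module):
    """Linear layer whose weights pass through a soft-threshold with a learnable threshold."""

    def __init__(self, in_features, out_features, s_init=-10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One learnable threshold parameter per layer; a very negative s_init
        # keeps the layer nearly dense at the start of training (assumption).
        self.s = nn.Parameter(torch.tensor(s_init))

    def sparse_weight(self):
        threshold = torch.sigmoid(self.s)  # g(s): assumed parameterization of the threshold
        # Soft-threshold operator: shrink magnitudes and zero out weights below the threshold.
        return torch.sign(self.weight) * F.relu(self.weight.abs() - threshold)

    def forward(self, x):
        # The thresholded weight is used in the forward pass, so the threshold
        # (and hence the layer-wise sparsity budget) receives gradients.
        return F.linear(x, self.sparse_weight(), self.bias)
```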


Read also

We describe a simple and general neural network weight compression approach, in which the network parameters (weights and biases) are represented in a latent space, amounting to a reparameterization. This space is equipped with a learned probability model, which is used to impose an entropy penalty on the parameter representation during training, and to compress the representation using a simple arithmetic coder after training. Classification accuracy and model compressibility are maximized jointly, with the bitrate-accuracy trade-off specified by a hyperparameter. We evaluate the method on the MNIST, CIFAR-10 and ImageNet classification benchmarks using six distinct model architectures. Our results show that state-of-the-art model compression can be achieved in a scalable and general way without requiring complex procedures such as multi-stage training.
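As a rough illustration of the training objective described above, the sketch below adds an estimated code length (in bits) of the latent weight representation to the task loss, with the bitrate-accuracy trade-off set by a hyperparameter `lam`. The factorized unit-Gaussian bit estimate is a simplifying assumption for illustration, not the paper's learned probability model.

```python
import math
import torch

def entropy_penalized_loss(task_loss, latent_params, lam=1e-4):
    """Task loss plus an estimated code length (in bits) of the latent weight parameters."""
    total_bits = 0.0
    for z in latent_params:
        # Bits to code z under a fixed unit Gaussian: -log2 N(z; 0, 1)  (illustrative stand-in
        # for a learned probability model over the latent representation).
        bits = 0.5 * z.pow(2) / math.log(2.0) + 0.5 * math.log2(2.0 * math.pi)
        total_bits = total_bits + bits.sum()
    return task_loss + lam * total_bits
```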
Most existing approaches focus on specific visual tasks while ignoring the relations between them. Estimating task relations sheds light on the learning of high-order semantic concepts, e.g., transfer learning. How to reveal the underlying relations between different visual tasks remains largely unexplored. In this paper, we propose a novel Learnable Parameter Similarity (LPS) method that learns an effective metric to measure the similarity of second-order semantics hidden in trained models. LPS is achieved by using a second-order neural network to align high-dimensional model parameters and learning second-order similarity in an end-to-end way. In addition, we create a model set called ModelSet500 as a parameter similarity learning benchmark that contains 500 trained models. Extensive experiments on ModelSet500 validate the effectiveness of the proposed method. Code will be released at https://github.com/Wanggcong/learnable-parameter-similarity.
Shikun Liu, Zhe Lin, Yilin Wang (2020)
We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers, such as pooling, bilinear sampling, and strided convolution. Whilst traditional resizing layers have fixed and deterministic reshaping factors, our module allows for a learnable reshaping factor. Our implementation enables shape adaptors to be trained end-to-end without any additional supervision, through which network architectures can be optimised for each individual task, in a fully automated way. We performed experiments across seven image classification datasets, and results show that by simply using a set of our shape adaptors instead of the original resizing layers, performance increases consistently over human-designed networks, across all datasets. Additionally, we show the effectiveness of shape adaptors on two other applications: network compression and transfer learning. The source code is available at: https://github.com/lorenmt/shape-adaptor.
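The sketch below shows one way a reshaping factor can be made learnable: two resizing branches are blended by a sigmoid-gated scalar, so the effective output size can be trained end-to-end along with the rest of the network. The blend of a pooled branch with an identity branch and the 0.5-1.0 scale range are illustrative assumptions; the paper's exact formulation is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeAdaptorSketch(nn.Module):
    """Drop-in resizer with a learnable reshaping factor (illustrative sketch)."""

    def __init__(self, alpha_init=0.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x):
        a = torch.sigmoid(self.alpha)
        # Effective reshaping factor interpolates between half and full resolution.
        r = 0.5 + 0.5 * a
        h = max(1, int(round(x.shape[-2] * float(r))))
        w = max(1, int(round(x.shape[-1] * float(r))))
        # Two branches resized to the same target size, blended by the learnable
        # gate so that gradients reach alpha through the blend weights.
        half = F.interpolate(F.avg_pool2d(x, 2), size=(h, w),
                             mode='bilinear', align_corners=False)
        full = F.interpolate(x, size=(h, w),
                             mode='bilinear', align_corners=False)
        return (1 - a) * half + a * full
```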
Quantization is spearheading the increase in performance and efficiency of neural network computing systems making headway into commodity hardware. We present SWIS - Shared Weight bIt Sparsity, a quantization framework for efficient neural network inference acceleration delivering improved performance and storage compression through an offline weight decomposition and scheduling algorithm. SWIS can achieve up to 54.3% (19.8%) point accuracy improvement compared to weight truncation when quantizing MobileNet-v2 to 4 (2) bits post-training (with retraining), showing the strength of leveraging shared bit-sparsity in weights. The SWIS accelerator gives up to 6x speedup and 1.9x energy improvement over state-of-the-art bit-serial architectures.
The study of improving the robustness of deep neural networks against adversarial examples has grown rapidly in recent years. Among existing approaches, adversarial training is the most promising one, which flattens the input loss landscape (loss change with respect to input) via training on adversarially perturbed examples. However, how the widely used weight loss landscape (loss change with respect to weight) behaves in adversarial training is rarely explored. In this paper, we investigate the weight loss landscape from a new perspective, and identify a clear correlation between the flatness of the weight loss landscape and the robust generalization gap. Several well-recognized adversarial training improvements, such as early stopping, designing new objective functions, or leveraging unlabeled data, all implicitly flatten the weight loss landscape. Based on these observations, we propose a simple yet effective Adversarial Weight Perturbation (AWP) to explicitly regularize the flatness of the weight loss landscape, forming a double-perturbation mechanism in the adversarial training framework that adversarially perturbs both inputs and weights. Extensive experiments demonstrate that AWP indeed brings a flatter weight loss landscape and can be easily incorporated into various existing adversarial training methods to further boost their adversarial robustness.
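As a concrete, single-step reading of the double-perturbation idea, the sketch below takes one ascent step in weight space on top of already-perturbed inputs, computes gradients at the perturbed weights, restores the weights, and then applies the update. The relative perturbation size `gamma` and the per-parameter scaling are assumptions; the authors' AWP procedure differs in detail.

```python
import torch

def weight_perturbation(model, loss, gamma=0.01):
    """One ascent step in weight space, scaled relative to each parameter's norm."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return [gamma * p.detach().norm() * g / (g.norm() + 1e-12)
            for p, g in zip(params, grads)]

def awp_style_step(model, x_adv, y, optimizer, criterion, gamma=0.01):
    # 1) Loss on adversarially perturbed inputs, used to find a weight perturbation.
    loss = criterion(model(x_adv), y)
    deltas = weight_perturbation(model, loss, gamma)
    params = [p for p in model.parameters() if p.requires_grad]
    # 2) Perturb the weights, compute gradients there, then restore and update.
    with torch.no_grad():
        for p, d in zip(params, deltas):
            p.add_(d)
    optimizer.zero_grad()
    criterion(model(x_adv), y).backward()
    with torch.no_grad():
        for p, d in zip(params, deltas):
            p.sub_(d)
    optimizer.step()
```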
