We propose pruning ternary quantization (PTQ), a simple yet effective symmetric ternary quantization method. The method significantly compresses neural network weights to sparse ternary values in {-1, 0, 1}, reducing the computational, storage, and memory footprint. We show that PTQ can convert regular weights to ternary orthonormal bases using only pruning and L2 projection. In addition, we introduce a refined straight-through estimator to finalize and stabilize the quantized weights. Our method achieves up to a 46x compression ratio on the ResNet-18 architecture with an acceptable accuracy of 65.36%, outperforming leading methods. Furthermore, PTQ compresses a ResNet-18 model from 46 MB to 955 KB (~48x) and a ResNet-50 model from 99 MB to 3.3 MB (~30x), while top-1 accuracy on ImageNet drops only slightly, from 69.7% to 65.3% and from 76.15% to 74.47%, respectively. Our method unifies pruning and quantization and thus provides a range of size-accuracy trade-offs.
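The abstract above combines magnitude pruning, an L2 projection onto a scaled ternary codebook, and a refined straight-through estimator. The PyTorch sketch below illustrates only the generic mechanics under stated assumptions: a magnitude-based pruning threshold, a per-tensor L2-optimal scale, and a plain clipped straight-through backward pass. The `sparsity` fraction and the clipping rule are illustrative choices, not the paper's refined estimator or exact projection.

```python
# Minimal sketch of symmetric ternary quantization with a plain
# straight-through estimator (STE). Threshold and scale choices are
# illustrative assumptions, not PTQ's exact rules.
import torch


class TernaryQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, sparsity=0.5):
        # Prune: magnitudes below the `sparsity` quantile become 0;
        # the survivors keep only their sign.
        thresh = torch.quantile(w.abs().flatten(), sparsity)
        mask = (w.abs() > thresh).to(w.dtype)
        ternary = torch.sign(w) * mask                    # values in {-1, 0, +1}
        # Per-tensor scale minimizing ||w - alpha * ternary||_2.
        alpha = (w * ternary).sum() / ternary.pow(2).sum().clamp(min=1)
        ctx.save_for_backward(w)
        return alpha * ternary

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through: pass the gradient unchanged, clipped to
        # zero outside [-1, 1] to keep the latent weights bounded.
        grad_w = grad_out * (w.abs() <= 1).to(grad_out.dtype)
        return grad_w, None


if __name__ == "__main__":
    w = torch.randn(64, 64, requires_grad=True)
    q = TernaryQuant.apply(w, 0.5)                        # half the weights pruned
    q.sum().backward()
    print(q.unique(), w.grad.abs().max())
```

Because the forward pass emits only a single scale times {-1, 0, 1}, the quantized tensor can be stored as a 2-bit sparse code plus one float, which is where the reported ~30-48x size reductions come from.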
In the traditional deep compression framework, iteratively performing network pruning and quantization can reduce the model size and computation cost to meet the deployment requirements. However, such a step-wise application of pruning and quantization …
We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity for quantized networks to enable implementation on low-cost and low-power accelerator hardware. In a practical scenario, there are particular …
We propose the position-based scaled gradient (PSG) that scales the gradient depending on the position of a weight vector to make it more compression-friendly. First, we theoretically show that applying PSG to the standard gradient descent (GD), which is called PSGD, is equivalent to GD in a warped weight space …
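The snippet cuts off, but the core mechanic PSG describes is elementwise gradient scaling as a function of where each weight sits. A minimal sketch of that mechanic follows; the uniform grid, the square-root shaping, and the `step` size are placeholder assumptions, not the scaling function derived in the paper.

```python
# Hedged sketch of position-dependent gradient scaling: each weight's
# gradient is rescaled by a factor computed from its distance to the
# nearest point on an assumed uniform quantization grid.
import torch

def position_scaled_grad(w, grad, step=0.05, eps=1e-8):
    nearest = torch.round(w / step) * step        # closest grid point
    dist = (w - nearest).abs() / (step / 2)       # normalized to [0, 1]
    scale = (dist + eps).sqrt()                   # damp updates near the grid
    return grad * scale

w = torch.nn.Parameter(torch.randn(256))
(w ** 2).sum().backward()                         # toy loss
with torch.no_grad():
    w.grad = position_scaled_grad(w, w.grad)      # apply before optimizer.step()
```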
Network pruning is widely used to compress Deep Neural Networks (DNNs). The Soft Filter Pruning (SFP) method zeroizes the pruned filters during training while updating them in the next training epoch. Thus the trained information of the pruned filters …
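As a concrete illustration of the zeroize-then-update behavior described above, here is a minimal PyTorch sketch of soft filter pruning: at the end of each epoch the weakest filters (by L2 norm) are set to zero, but they remain trainable and can be updated again in the next epoch. The 30% pruning rate is an assumed example value.

```python
import torch

@torch.no_grad()
def soft_prune(conv: torch.nn.Conv2d, rate: float = 0.3):
    # L2 norm of each output filter: shape (out_channels,).
    norms = conv.weight.flatten(1).norm(dim=1)
    k = int(rate * norms.numel())
    if k > 0:
        weakest = norms.argsort()[:k]
        conv.weight[weakest] = 0.0   # zeroize now; still trainable, so
                                     # these filters may recover next epoch

conv = torch.nn.Conv2d(16, 32, 3)
soft_prune(conv, rate=0.3)           # call at the end of every epoch
```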
Regularization has long been utilized to learn sparsity in deep neural network pruning. However, its role is mainly explored in the small penalty strength regime. In this work, we extend its application to a new scenario where the regularization grows large gradually to tackle two central problems of pruning: pruning schedule and weight importance scoring.
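A minimal sketch of the growing-penalty idea: an L2 penalty applied to the weights slated for removal is raised by a small increment on a fixed schedule, pushing them smoothly toward zero rather than cutting them at once. The increment, interval, and magnitude-based mask below are illustrative assumptions, not the paper's schedule or scoring rule.

```python
import torch

class GrowingReg:
    """L2 penalty whose strength grows by `delta` every `interval` steps."""
    def __init__(self, delta=1e-4, interval=10):
        self.delta, self.interval = delta, interval
        self.lmbda, self.step = 0.0, 0

    def penalty(self, w, mask):
        self.step += 1
        if self.step % self.interval == 0:
            self.lmbda += self.delta              # penalty grows gradually
        return self.lmbda * (w * mask).pow(2).sum()

w = torch.nn.Parameter(torch.randn(100))
mask = (w.abs() < w.abs().median()).float()       # weakest half, by magnitude
reg = GrowingReg()
loss = (w ** 2).mean() + reg.penalty(w, mask)     # add to the task loss each step
loss.backward()
```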