8-bit quantization has been widely applied to accelerate network inference in various deep learning applications. Quantization methods fall into two families: training-based quantization and post-training quantization. The training-based approach suffers from a cumbersome training process, while post-training quantization may lead to an unacceptable accuracy drop. In this paper, we present an efficient and simple post-training method via scale optimization, named EasyQuant (EQ), that achieves accuracy comparable to training-based methods. Specifically, we first alternately optimize the scales of weights and activations for all layers, targeting the convolutional outputs, to obtain high quantization precision. We then lower the bit width to INT7 for both weights and activations, and adopt INT16 intermediate storage together with an integer Winograd convolution implementation to accelerate inference. Experimental results on various computer vision tasks show that EQ outperforms the TensorRT method and can achieve near-INT8 accuracy with 7-bit post-training quantization.
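To make the alternating scale search concrete, the following is a minimal NumPy sketch for a single layer, using a matrix multiplication as a stand-in for convolution and a cosine-similarity objective on the layer output. The grid-search range, number of rounds, and helper names are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def quantize(x, scale, bits=7):
    # Symmetric linear quantization onto a signed integer grid of `bits` width.
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax)

def cosine_sim(a, b):
    a, b = a.ravel(), b.ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def alternating_scale_search(A, W, bits=7, rounds=3, grid=100):
    # FP32 reference output that the quantized layer should reproduce.
    target = A @ W
    qmax = 2 ** (bits - 1) - 1
    s_a = np.abs(A).max() / qmax   # initial activation scale (max-abs)
    s_w = np.abs(W).max() / qmax   # initial weight scale (max-abs)
    for _ in range(rounds):
        # Fix the activation scale and grid-search the weight scale.
        s_w = max(
            np.linspace(0.5 * s_w, 1.5 * s_w, grid),
            key=lambda s: cosine_sim(
                (quantize(A, s_a, bits) * s_a) @ (quantize(W, s, bits) * s),
                target))
        # Fix the weight scale and grid-search the activation scale.
        s_a = max(
            np.linspace(0.5 * s_a, 1.5 * s_a, grid),
            key=lambda s: cosine_sim(
                (quantize(A, s, bits) * s) @ (quantize(W, s_w, bits) * s_w),
                target))
    return s_a, s_w

# Usage: calibrate scales for one random layer on a small batch.
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 64)).astype(np.float32)
W = rng.standard_normal((64, 128)).astype(np.float32)
s_a, s_w = alternating_scale_search(A, W)
```

In the paper's setting this search would run layer by layer over a small calibration set, with each layer's optimized scales fixed before moving to the next.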
Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged as one of the key ingredients to reduce the size of neural networks for their deployment on resource-limited devices. In order to overcome …
Recently, transformers have achieved remarkable performance on a variety of computer vision applications. Compared with mainstream convolutional neural networks, vision transformers often have sophisticated architectures for extracting powerful features …
Quantization is a key technique for reducing the resource requirements and improving the performance of neural network deployment. However, different hardware backends such as x86 CPU, NVIDIA GPU, ARM CPU, and accelerators may demand different implementations …
While post-training quantization is popular mostly because it avoids access to the original complete training dataset, its poor performance also stems from this limitation. To alleviate this limitation, in this paper, we leverage the syn…
Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In …
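As a side note on the INT8-versus-INT4 gap described above, the short, self-contained sketch below (not taken from any of the papers listed here) shows how the round-trip error of plain max-abs symmetric uniform quantization grows as the bit width shrinks; the tensor, bit widths, and helper name are illustrative.

```python
import numpy as np

def quant_dequant(x, bits):
    # Max-abs symmetric uniform quantization followed by dequantization.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000).astype(np.float32)  # stand-in weight tensor
for bits in (8, 6, 4):
    err = np.mean((w - quant_dequant(w, bits)) ** 2)
    print(f"INT{bits}: mean squared round-trip error = {err:.2e}")
```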