Online precision scalability of deep neural networks (DNNs) is a critical feature for supporting the trade-off between accuracy and complexity during DNN inference. In this paper, we propose a dual-precision DNN that includes two different precision modes in a single model, thereby supporting an online precision switch without re-training. The proposed two-phase training process optimizes both the low- and high-precision modes.
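A minimal NumPy sketch of the dual-precision idea, assuming uniform symmetric quantization: one set of shared weights is quantized on the fly at either of two bitwidths, so inference can switch precision at run time without re-training. The names (DualPrecisionLinear, set_mode) and the 4/8-bit mode pair are illustrative assumptions, not details from the paper.

```python
import numpy as np

def quantize(w, bits):
    """Uniform symmetric quantization of w to the given bitwidth."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

class DualPrecisionLinear:
    def __init__(self, weight, low_bits=4, high_bits=8):
        self.weight = weight                 # shared full-precision weights
        self.modes = {"low": low_bits, "high": high_bits}
        self.bits = self.modes["high"]

    def set_mode(self, mode):
        # Online precision switch: no re-training, just a different bitwidth.
        self.bits = self.modes[mode]

    def __call__(self, x):
        return x @ quantize(self.weight, self.bits).T

layer = DualPrecisionLinear(np.random.randn(16, 32))
x = np.random.randn(1, 32)
layer.set_mode("low")    # faster, lower-accuracy mode
y_low = layer(x)
layer.set_mode("high")   # more accurate mode, same weights
y_high = layer(x)
```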
We present PrecisionBatching, a quantized inference algorithm for speeding up neural network execution at low bitwidths on traditional hardware platforms, without the need for retraining or recalibration. PrecisionBatching decomposes a neural network …
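The abstract is cut off here, so the following is only a guess at the decomposition it refers to: a NumPy sketch of bit-layer (bitplane) matrix multiplication, in which a b-bit integer weight matrix is split into b binary planes whose products with full-precision activations are accumulated with power-of-two weights. Each 1-bit plane product is the part that maps to fast binary hardware operations; the function name bitplane_matmul and all parameters are hypothetical.

```python
import numpy as np

def bitplane_matmul(x, w_int, bits, scale):
    """x: activations; w_int: signed integer weights in [-2^(bits-1), 2^(bits-1)-1]."""
    w_unsigned = w_int + (1 << (bits - 1))      # shift to unsigned for bit slicing
    acc = np.zeros((x.shape[0], w_int.shape[0]))
    for b in range(bits):
        plane = (w_unsigned >> b) & 1           # one 1-bit weight layer
        acc += (x @ plane.T) * (1 << b)         # accumulate with power-of-two weight
    acc -= (x @ np.ones_like(w_int).T) * (1 << (bits - 1))  # undo the unsigned shift
    return acc * scale

# Sanity check: the bitplane form matches the plain quantized matmul.
bits = 4
w = np.random.randn(8, 16)
scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
w_int = np.round(w / scale).clip(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1).astype(int)
x = np.random.randn(2, 16)
assert np.allclose(bitplane_matmul(x, w_int, bits, scale), x @ (w_int * scale).T)
```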
Low-precision deep neural network (DNN) training has gained tremendous attention, as reducing precision is one of the most effective knobs for improving the time and energy efficiency of DNN training. In this paper, we attempt to explore low-precision training …
Mixed-precision quantization can potentially achieve the optimal trade-off between the performance and compression rate of deep neural networks, and has therefore been widely investigated. However, it lacks a systematic method to determine the exact quantization …
Recent advancements in the area of deep learning have shown the effectiveness of very large neural networks in several applications. However, as these deep neural networks continue to grow in size, it becomes increasingly difficult to configure their …
While there exists a plethora of deep learning tools and frameworks, the fast-growing complexity of the field brings new demands and challenges, such as more flexible network design, speedy computation in distributed settings, and compatibility between …