ReCU: Reviving the Dead Weights in Binary Neural Networks

86 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zihan Xu

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Zihan Xu - Mingbao Lin - Jianzhuang Liu

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Binary neural networks (BNNs) have received increasing attention due to their superior reductions of computation and memory. Most existing works focus on either lessening the quantization error by minimizing the gap between the full-precision weights and their binarization or designing a gradient approximation to mitigate the gradient mismatch, while leaving the dead weights untouched. This leads to slow convergence when training BNNs. In this paper, for the first time, we explore the influence of dead weights which refer to a group of weights that are barely updated during the training of BNNs, and then introduce rectified clamp unit (ReCU) to revive the dead weights for updating. We prove that reviving the dead weights by ReCU can result in a smaller quantization error. Besides, we also take into account the information entropy of the weights, and then mathematically analyze why the weight standardization can benefit BNNs. We demonstrate the inherent contradiction between minimizing the quantization error and maximizing the information entropy, and then propose an adaptive exponential scheduler to identify the range of the dead weights. By considering the dead weights, our method offers not only faster BNN training, but also state-of-the-art performance on CIFAR-10 and ImageNet, compared with recent methods. Code can be available at https://github.com/z-hXu/ReCU.

قيم البحث

121 - Hyeryung Jang , Nicolas Skatchkovsky , Osvaldo Simeone 2020

Artificial Neural Network (ANN)-based inference on battery-powered devices can be made more energy-efficient by restricting the synaptic weights to be binary, hence eliminating the need to perform multiplications. An alternative, emerging, approach r elies on the use of Spiking Neural Networks (SNNs), biologically inspired, dynamic, event-driven models that enhance energy efficiency via the use of binary, sparse, activations. In this paper, an SNN model is introduced that combines the benefits of temporally sparse binary activations and of binary weights. Two learning rules are derived, the first based on the combination of straight-through and surrogate gradient techniques, and the second based on a Bayesian paradigm. Experiments validate the performance loss with respect to full-precision implementations, and demonstrate the advantage of the Bayesian paradigm in terms of accuracy and calibration.

التعلم الآلي الحوسبة العصبية والتطورية معالجة الإشارات

Training Competitive Binary Neural Networks from Scratch

87 - Joseph Bethge , Marvin Bornstein , Adrian Loy 2018

Convolutional neural networks have achieved astonishing results in different application areas. Various methods that allow us to use these models on mobile and embedded devices have been proposed. Especially binary neural networks are a promising app roach for devices with low computational power. However, training accurate binary models from scratch remains a challenge. Previous work often uses prior knowledge from full-precision models and complex training strategies. In our work, we focus on increasing the performance of binary neural networks without such prior knowledge and a much simpler training strategy. In our experiments we show that we are able to achieve state-of-the-art results on standard benchmark datasets. Further, to the best of our knowledge, we are the first to successfully adopt a network architecture with dense connections for binary networks, which lets us improve the state-of-the-art even further.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي

MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy?

56 - Joseph Bethge , Christian Bartz , Haojin Yang 2020

Binary Neural Networks (BNNs) are neural networks which use binary weights and activations instead of the typical 32-bit floating point values. They have reduced model sizes and allow for efficient inference on mobile or embedded devices with limited power and computational resources. However, the binarization of weights and activations leads to feature maps of lower quality and lower capacity and thus a drop in accuracy compared to traditional networks. Previous work has increased the number of channels or used multiple binary bases to alleviate these problems. In this paper, we instead present an architectural approach: MeliusNet. It consists of alternating a DenseBlock, which increases the feature capacity, and our proposed ImprovementBlock, which increases the feature quality. Experiments on the ImageNet dataset demonstrate the superior performance of our MeliusNet over a variety of popular binary architectures with regards to both computation savings and accuracy. Furthermore, with our method we trained BNN models, which for the first time can match the accuracy of the popular compact network MobileNet-v1 in terms of model size, number of operations and accuracy. Our code is published online at https://github.com/hpi-xnor/BMXNet-v2

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي

BNN - BN = ?: Training Binary Neural Networks without Batch Normalization

151 - Tianlong Chen , Zhenyu Zhang , Xu Ouyang 2021

Batch normalization (BN) is a key facilitator and considered essential for state-of-the-art binary neural networks (BNN). However, the BN layer is costly to calculate and is typically implemented with non-binary parameters, leaving a hurdle for the e fficient implementation of BNN training. It also introduces undesirable dependence between samples within each batch. Inspired by the latest advance on Batch Normalization Free (BN-Free) training, we extend their framework to training BNNs, and for the first time demonstrate that BNs can be completed removed from BNN training and inference regimes. By plugging in and customizing techniques including adaptive gradient clipping, scale weight standardization, and specialized bottleneck block, a BN-free BNN is capable of maintaining competitive accuracy compared to its BN-based counterpart. Extensive experiments validate the effectiveness of our proposal across diverse BNN backbones and datasets. For example, after removing BNs from the state-of-the-art ReActNets, it can still be trained with our proposed methodology to achieve 92.08%, 68.34%, and 68.0% accuracy on CIFAR-10, CIFAR-100, and ImageNet respectively, with marginal performance drop (0.23%~0.44% on CIFAR and 1.40% on ImageNet). Codes and pre-trained models are available at: https://github.com/VITA-Group/BNN_NoBN.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط

Do Neural Network Weights account for Classes Centers?

562 - Ioannis Kansizoglou , Loukas Bampis , 2021

The exploitation of Deep Neural Networks (DNNs) as descriptors in feature learning challenges enjoys apparent popularity over the past few years. The above tendency focuses on the development of effective loss functions that ensure both high feature discrimination among different classes, as well as low geodesic distance between the feature vectors of a given class. The vast majority of the contemporary works rely their formulation on an empirical assumption about the feature space of a networks last hidden layer, claiming that the weight vector of a class accounts for its geometrical center in the studied space. The paper at hand follows a theoretical approach and indicates that the aforementioned hypothesis is not exclusively met. This fact raises stability issues regarding the training procedure of a DNN, as shown in our experimental study. Consequently, a specific symmetry is proposed and studied both analytically and empirically that satisfies the above assumption, addressing the established convergence issues.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط