ترغب بنشر مسار تعليمي؟ اضغط هنا

Exploring the Connection Between Binary and Spiking Neural Networks

117   0   0.0 ( 0 )
 نشر من قبل Abhronil Sengupta
 تاريخ النشر 2020
والبحث باللغة English




اسأل ChatGPT حول البحث

On-chip edge intelligence has necessitated the exploration of algorithmic techniques to reduce the compute requirements of current machine learning frameworks. This work aims to bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks - both of which are driven by the same motivation and yet synergies between the two have not been fully explored. We show that training Spiking Neural Networks in the extreme quantization regime results in near full precision accuracies on large-scale datasets like CIFAR-$100$ and ImageNet. An important implication of this work is that Binary Spiking Neural Networks can be enabled by In-Memory hardware accelerators catered for Binary Neural Networks without suffering any accuracy degradation due to binarization. We utilize standard training techniques for non-spiking networks to generate our spiking networks by conversion process and also perform an extensive empirical analysis and explore simple design-time and run-time optimization techniques for reducing inference latency of spiking networks (both for binary and full-precision models) by an order of magnitude over prior work.



قيم البحث

اقرأ أيضاً

Several recent works have shown separation results between deep neural networks, and hypothesis classes with inferior approximation capacity such as shallow networks or kernel classes. On the other hand, the fact that deep networks can efficiently ex press a target function does not mean that this target function can be learned efficiently by deep neural networks. In this work we study the intricate connection between learnability and approximation capacity. We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target. Specifically, we show that a necessary condition for a function to be learnable by gradient descent on deep neural networks is to be able to approximate the function, at least in a weak sense, with shallow neural networks. We also show that a class of functions can be learned by an efficient statistical query algorithm if and only if it can be approximated in a weak sense by some kernel class. We give several examples of functions which demonstrate depth separation, and conclude that they cannot be efficiently learned, even by a hypothesis class that can efficiently approximate them.
151 - Defang Chen , Can Wang , Yan Feng 2021
Knowledge distillation is a generalized logits matching technique for model compression. Their equivalence is previously established on the condition of $textit{infinity temperature}$ and $textit{zero-mean normalization}$. In this paper, we prove tha t with only $textit{infinity temperature}$, the effect of knowledge distillation equals to logits matching with an extra regularization. Furthermore, we reveal that an additional weaker condition -- $textit{equal-mean initialization}$ rather than the original $textit{zero-mean normalization}$ already suffices to set up the equivalence. The key to our proof is we realize that in modern neural networks with the cross-entropy loss and softmax activation, the mean of back-propagated gradient on logits always keeps zero.
Artificial Neural Network (ANN)-based inference on battery-powered devices can be made more energy-efficient by restricting the synaptic weights to be binary, hence eliminating the need to perform multiplications. An alternative, emerging, approach r elies on the use of Spiking Neural Networks (SNNs), biologically inspired, dynamic, event-driven models that enhance energy efficiency via the use of binary, sparse, activations. In this paper, an SNN model is introduced that combines the benefits of temporally sparse binary activations and of binary weights. Two learning rules are derived, the first based on the combination of straight-through and surrogate gradient techniques, and the second based on a Bayesian paradigm. Experiments validate the performance loss with respect to full-precision implementations, and demonstrate the advantage of the Bayesian paradigm in terms of accuracy and calibration.
We establish connections between the problem of learning a two-layer neural network and tensor decomposition. We consider a model with feature vectors $boldsymbol x in mathbb R^d$, $r$ hidden units with weights ${boldsymbol w_i}_{1le i le r}$ and out put $yin mathbb R$, i.e., $y=sum_{i=1}^r sigma( boldsymbol w_i^{mathsf T}boldsymbol x)$, with activation functions given by low-degree polynomials. In particular, if $sigma(x) = a_0+a_1x+a_3x^3$, we prove that no polynomial-time learning algorithm can outperform the trivial predictor that assigns to each example the response variable $mathbb E(y)$, when $d^{3/2}ll rll d^2$. Our conclusion holds for a `natural data distribution, namely standard Gaussian feature vectors $boldsymbol x$, and output distributed according to a two-layer neural network with random isotropic weights, and under a certain complexity-theoretic assumption on tensor decomposition. Roughly speaking, we assume that no polynomial-time algorithm can substantially outperform current methods for tensor decomposition based on the sum-of-squares hierarchy. We also prove generalizations of this statement for higher degree polynomial activations, and non-random weight vectors. Remarkably, several existing algorithms for learning two-layer networks with rigorous guarantees are based on tensor decomposition. Our results support the idea that this is indeed the core computational difficulty in learning such networks, under the stated generative model for the data. As a side result, we show that under this model learning the network requires accurate learning of its weights, a property that does not hold in a more general setting.
As neural networks get widespread adoption in resource-constrained embedded devices, there is a growing need for low-power neural systems. Spiking Neural Networks (SNNs)are emerging to be an energy-efficient alternative to the traditional Artificial Neural Networks (ANNs) which are known to be computationally intensive. From an application perspective, as federated learning involves multiple energy-constrained devices, there is a huge scope to leverage energy efficiency provided by SNNs. Despite its importance, there has been little attention on training SNNs on a large-scale distributed system like federated learning. In this paper, we bring SNNs to a more realistic federated learning scenario. Specifically, we propose a federated learning framework for decentralized and privacy-preserving training of SNNs. To validate the proposed federated learning framework, we experimentally evaluate the advantages of SNNs on various aspects of federated learning with CIFAR10 and CIFAR100 benchmarks. We observe that SNNs outperform ANNs in terms of overall accuracy by over 15% when the data is distributed across a large number of clients in the federation while providing up to5.3x energy efficiency. In addition to efficiency, we also analyze the sensitivity of the proposed federated SNN framework to data distribution among the clients, stragglers, and gradient noise and perform a comprehensive comparison with ANNs.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا