Do you want to publish a course? Click here

Stochastic Computing for Hardware Implementation of Binarized Neural Networks

163   0   0.0 ( 0 )
 Added by Damien Querlioz
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

Binarized Neural Networks, a recently discovered class of neural networks with minimal memory requirements and no reliance on multiplication, are a fantastic opportunity for the realization of compact and energy efficient inference hardware. However, such neural networks are generally not entirely binarized: their first layer remains with fixed point input. In this work, we propose a stochastic computing version of Binarized Neural Networks, where the input is also binarized. Simulations on the example of the Fashion-MNIST and CIFAR-10 datasets show that such networks can approach the performance of conventional Binarized Neural Networks. We evidence that the training procedure should be adapted for use with stochastic computing. Finally, the ASIC implementation of our scheme is investigated, in a system that closely associates logic and memory, implemented by Spin Torque Magnetoresistive Random Access Memory. This analysis shows that the stochastic computing approach can allow considerable savings with regards to conventional Binarized Neural networks in terms of area (62% area reduction on the Fashion-MNIST task). It can also allow important savings in terms of energy consumption, if we accept reasonable reduction of accuracy: for example a factor 2.1 can be saved, with the cost of 1.4% in Fashion-MNIST test accuracy. These results highlight the high potential of Binarized Neural Networks for hardware implementation, and that adapting them to hardware constrains can provide important benefits.

rate research

Read More

RRAM-based in-Memory Computing is an exciting road for implementing highly energy efficient neural networks. This vision is however challenged by RRAM variability, as the efficient implementation of in-memory computing does not allow error correction. In this work, we fabricated and tested a differential HfO2-based memory structure and its associated sense circuitry, which are ideal for in-memory computing. For the first time, we show that our approach achieves the same reliability benefits as error correction, but without any CMOS overhead. We show, also for the first time, that it can naturally implement Binarized Deep Neural Networks, a very recent development of Artificial Intelligence, with extreme energy efficiency, and that the system is fully satisfactory for image recognition applications. Finally, we evidence how the extra reliability provided by the differential memory allows programming the devices in low voltage conditions, where they feature high endurance of billions of cycles.
The brain performs intelligent tasks with extremely low energy consumption. This work takes inspiration from two strategies used by the brain to achieve this energy efficiency: the absence of separation between computing and memory functions, and the reliance on low precision computation. The emergence of resistive memory technologies indeed provides an opportunity to co-integrate tightly logic and memory in hardware. In parallel, the recently proposed concept of Binarized Neural Network, where multiplications are replaced by exclusive NOR (XNOR) logic gates, offers a way to implement artificial intelligence using very low precision computation. In this work, we therefore propose a strategy to implement low energy Binarized Neural Networks, which employs brain-inspired concepts, while retaining energy benefits from digital electronics. We design, fabricate and test a memory array, including periphery and sensing circuits, optimized for this in-memory computing scheme. Our circuit employs hafnium oxide resistive memory integrated in the back end of line of a 130 nanometer CMOS process, in a two transistors - two resistors cell, which allows performing the exclusive NOR operations of the neural network directly within the sense amplifiers. We show, based on extensive electrical measurements, that our design allows reducing the amount of bit errors on the synaptic weights, without the use of formal error correcting codes. We design a whole system using this memory array. We show on standard machine learning tasks (MNIST, CIFAR-10, ImageNet and an ECG task) that the system has an inherent resilience to bit errors. We evidence that its energy consumption is attractive compared to more standard approaches, and that it can use the memory devices in regimes where they exhibit particularly low programming energy and high endurance.
Resistive random access memories (RRAM) are novel nonvolatile memory technologies, which can be embedded at the core of CMOS, and which could be ideal for the in-memory implementation of deep neural networks. A particularly exciting vision is using them for implementing Binarized Neural Networks (BNNs), a class of deep neural networks with a highly reduced memory footprint. The challenge of resistive memory, however, is that they are prone to device variation, which can lead to bit errors. In this work we show that BNNs can tolerate these bit errors to an outstanding level, through simulations of networks on the MNIST and CIFAR10 tasks. If a standard BNN is used, up to 10^-4 bit error rate can be tolerated with little impact on recognition performance on both MNIST and CIFAR10. We then show that by adapting the training procedure to the fact that the BNN will be operated on error-prone hardware, this tolerance can be extended to a bit error rate of 4x10^-2. The requirements for RRAM are therefore a lot less stringent for BNNs than more traditional applications. We show, based on experimental measurements on a RRAM HfO2 technology, that this result can allow reduce RRAM programming energy by a factor 30.
One of the most exciting applications of Spin Torque Magnetoresistive Random Access Memory (ST-MRAM) is the in-memory implementation of deep neural networks, which could allow improving the energy efficiency of Artificial Intelligence by orders of magnitude with regards to its implementation on computers and graphics cards. In particular, ST-MRAM could be ideal for implementing Binarized Neural Networks (BNNs), a type of deep neural networks discovered in 2016, which can achieve state-of-the-art performance with a highly reduced memory footprint with regards to conventional artificial intelligence approaches. The challenge of ST-MRAM, however, is that it is prone to write errors and usually requires the use of error correction. In this work, we show that these bit errors can be tolerated by BNNs to an outstanding level, based on examples of image recognition tasks (MNIST, CIFAR-10 and ImageNet): bit error rates of ST-MRAM up to 0.1% have little impact on recognition accuracy. The requirements for ST-MRAM are therefore considerably relaxed for BNNs with regards to traditional applications. By consequence, we show that for BNNs, ST-MRAMs can be programmed with weak (low-energy) programming conditions, without error correcting codes. We show that this result can allow the use of low energy and low area ST-MRAM cells, and show that the energy savings at the system level can reach a factor two.
Neuromorphic hardware platforms implement biological neurons and synapses to execute spiking neural networks (SNNs) in an energy-efficient manner. We present SpiNeMap, a design methodology to map SNNs to crossbar-based neuromorphic hardware, minimizing spike latency and energy consumption. SpiNeMap operates in two steps: SpiNeCluster and SpiNePlacer. SpiNeCluster is a heuristic-based clustering technique to partition SNNs into clusters of synapses, where intracluster local synapses are mapped within crossbars of the hardware and inter-cluster global synapses are mapped to the shared interconnect. SpiNeCluster minimizes the number of spikes on global synapses, which reduces spike congestion on the shared interconnect, improving application performance. SpiNePlacer then finds the best placement of local and global synapses on the hardware using a meta-heuristic-based approach to minimize energy consumption and spike latency. We evaluate SpiNeMap using synthetic and realistic SNNs on the DynapSE neuromorphic hardware. We show that SpiNeMap reduces average energy consumption by 45% and average spike latency by 21%, compared to state-of-the-art techniques.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا