In-memory computing is being widely explored as a novel computing paradigm to mitigate the well-known memory bottleneck. This emerging paradigm aims at embedding some aspects of computation inside the memory array, thereby avoiding frequent and expensive movement of data between the compute unit and the storage memory. In-memory computing with silicon memories has been widely explored on various memory bit-cells. Embedding computation inside the six-transistor (6T) SRAM array is of special interest since it is the most widely used on-chip memory. In this paper, we present a novel in-memory multiplication followed by accumulation operation capable of performing parallel dot products within the 6T SRAM array without any changes to the standard bit-cell. We further study the effect of circuit non-idealities and process variations on the accuracy of the LeNet-5 and VGG neural network architectures on the MNIST and CIFAR-10 datasets, respectively. The proposed in-memory dot-product mechanism achieves 88.8% and 99% accuracy on CIFAR-10 and MNIST, respectively. Compared to a standard von Neumann system, the proposed system achieves 6.24x lower energy consumption and 9.42x lower delay.
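As a rough illustration of how such non-idealities can be folded into an accuracy study, the sketch below models a bitline-accumulated dot product with multiplicative variation and a low-precision ADC. This is a behavioral sketch only, not the paper's circuit model; the noise level sigma_rel and the ADC resolution adc_bits are hypothetical illustration parameters.

```python
import numpy as np

def in_memory_dot_product(weights, inputs, sigma_rel=0.05, adc_bits=5):
    """Behavioral sketch of an analog in-SRAM dot product.

    Each product term accumulates charge on a shared bitline; process
    variation is modeled as multiplicative Gaussian noise on every term,
    and the accumulated value is then quantized by a low-precision ADC.
    sigma_rel and adc_bits are illustrative, not values from the paper.
    """
    ideal_terms = weights * inputs
    noisy_terms = ideal_terms * (1.0 + sigma_rel * np.random.randn(*ideal_terms.shape))
    analog_sum = noisy_terms.sum()

    # Low-precision ADC: uniform quantization over the expected output range
    full_scale = np.abs(weights).sum()  # worst-case magnitude of the sum
    step = 2 * full_scale / (2 ** adc_bits)
    return np.clip(np.round(analog_sum / step) * step, -full_scale, full_scale)

# Example: compare the non-ideal dot product with the exact one
w = np.random.choice([-1, 1], size=64).astype(float)
x = np.random.rand(64)
print(in_memory_dot_product(w, x), "vs exact", np.dot(w, x))
```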
Deep neural networks are a biologically inspired class of algorithms that have recently demonstrated state-of-the-art accuracy on large-scale classification and recognition tasks. Indeed, a major enabler of efficient hardware accelerators for deep networks is the recent advance from the machine learning community demonstrating aggressively scaled deep binary networks with state-of-the-art accuracy. In this paper, we demonstrate how deep binary networks can be accelerated in modified von Neumann machines by enabling binary convolutions within the SRAM array. In general, a binary convolution consists of a bit-wise XNOR followed by a population count (popcount). We present charge-sharing XNOR and popcount operations in 10-transistor (10T) SRAM cells. We employ multiple circuit techniques, including dual read-wordlines (Dual-RWL) along with a dual-stage ADC that overcomes the inaccuracies of a low-precision ADC, to achieve a fairly accurate popcount. In addition, a key highlight of the present work is the proposed sectioning of the SRAM array by adding switches onto the read-bitlines, thereby achieving improved parallelism. This is beneficial for deep networks, where the kernel size grows and must be stored across multiple sub-banks; one then needs to evaluate the partial popcounts from multiple sub-banks and sum them up to obtain the final popcount. With n sections per sub-array, n convolutions can be performed within one sub-bank, thereby improving overall system throughput as well as energy efficiency. Our results at the array level show that the energy consumption and delay per operation were 1.914 pJ and 45 ns, respectively. Moreover, an energy improvement of 2.5x and a performance improvement of 4x were achieved by using the proposed sectioned SRAM, compared to a non-sectioned SRAM design.
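The XNOR-plus-popcount formulation, and the summing of partial popcounts from sectioned sub-arrays, can be sanity-checked functionally with a short sketch; the section count n_sections is an arbitrary illustrative choice, not a value from the design.

```python
import numpy as np

def xnor_popcount_dot(a_bits, b_bits, n_sections=4):
    """Sketch of a binary dot product via XNOR + popcount.

    a_bits and b_bits hold {0,1} encodings of {-1,+1} weights/activations.
    The XNOR result is popcounted per section (mimicking partial popcounts
    from sectioned sub-arrays) and the partial counts are summed.
    """
    xnor = np.logical_not(np.logical_xor(a_bits, b_bits)).astype(int)
    partial = [seg.sum() for seg in np.array_split(xnor, n_sections)]
    popcount = sum(partial)
    n = a_bits.size
    return 2 * popcount - n   # equivalent signed {-1,+1} dot product

# Sanity check against the signed dot product
a = np.random.randint(0, 2, 128)
b = np.random.randint(0, 2, 128)
assert xnor_popcount_dot(a, b) == np.dot(2 * a - 1, 2 * b - 1)
```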
We propose a capacitively driven low-swing global interconnect circuit using a receiver that exploits magnetoelectric (ME) effect induced magnetization switching to reduce energy consumption. Capacitively driven wires have recently been shown to be effective in improving the performance of global interconnects. Such techniques reduce the signal swing in the interconnect by using a capacitive divider network and do not require an additional voltage supply. However, the large reduction in signal swing makes it necessary to use differential signaling and amplification for successful regeneration at the receiver, which adds area and static power. ME-effect-induced magnetization reversal has recently been proposed, showing the possibility of using a low voltage to switch a nanomagnet adjacent to a multiferroic oxide. Here, we propose an ME-effect-based receiver that uses the low voltage at the receiving end of the global wire to switch a nanomagnet. The nanomagnet also serves as the free layer of a magnetic tunnel junction (MTJ), whose resistance is tuned through the ME effect. This change in MTJ resistance is converted to full-swing binary signals using simple digital components. This approach allows capacitive low-swing interconnection without differential signaling or amplification, leading to significant energy savings. Our simulation results indicate that for 5-10 mm long global wires in the IBM 45 nm technology, the capacitive ME design consumes 3x lower energy than a full-swing CMOS design and 2x lower energy than a differential-amplifier-based low-swing capacitive CMOS design.
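A first-order, textbook capacitive-divider estimate (not the paper's extracted circuit parameters) illustrates why the wire sees only a fraction of the supply swing and why the switching energy drops accordingly; the capacitance values in the example are hypothetical.

```python
def low_swing_estimate(vdd, c_couple, c_wire):
    """First-order estimate of swing and switching energy for a
    capacitively driven wire (simple capacitive-divider model).

    The series coupling capacitor and the wire capacitance form a divider,
    so the wire sees only a fraction of the full supply swing, and the
    charge drawn from the supply is set by the series combination.
    """
    v_swing = vdd * c_couple / (c_couple + c_wire)      # divider output swing
    c_series = c_couple * c_wire / (c_couple + c_wire)  # effective charging cap
    e_switch = c_series * vdd * vdd                     # per-transition energy
    return v_swing, e_switch

# Example with illustrative numbers: 1 V supply, 50 fF coupling capacitor,
# 500 fF of global-wire capacitance
v, e = low_swing_estimate(1.0, 50e-15, 500e-15)
print(f"swing = {v*1e3:.0f} mV, energy/transition = {e*1e15:.1f} fJ")
```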
Silicon-based static random access memories (SRAM) and digital Boolean logic have been the workhorses of state-of-the-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying von Neumann computing architecture has remained unchanged. The limited throughput and energy efficiency of state-of-the-art computing systems, to a large extent, result from the well-known von Neumann bottleneck. The energy and throughput inefficiency of von Neumann machines has been accentuated in recent times by the present emphasis on data-intensive applications such as artificial intelligence and machine learning. A possible approach towards mitigating the overhead associated with the von Neumann bottleneck is to enable in-memory Boolean computations. In this manuscript, we present an augmented version of the conventional SRAM bit-cell, called the X-SRAM, with the ability to perform in-memory, vector Boolean computations in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations, including NAND, NOR, IMP (implication), and XOR logic gates, with respect to different bit-cell topologies: the 8T cell and the 8+T differential cell. In addition, we also present a novel 'read-compute-store' scheme, wherein the computed Boolean function can be directly stored in the memory without the need of latching the data and carrying out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and Monte Carlo variation analysis.
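Functionally, the proposed vector operations act column-wise on two stored words. The following sketch emulates that behavior in software only (it does not model the bitline circuits or the cell topologies), with IMP taken as material implication.

```python
import numpy as np

def in_memory_boolean(word_a, word_b, op):
    """Bit-parallel emulation of the vector Boolean operations: activating
    two rows together lets the bitline periphery evaluate a logic function
    of the two stored words, one result bit per column. Functional sketch
    only; no bitline or sense-amplifier behavior is modeled.
    """
    a = np.asarray(word_a, dtype=bool)
    b = np.asarray(word_b, dtype=bool)
    ops = {
        "NAND": ~(a & b),
        "NOR":  ~(a | b),
        "XOR":  a ^ b,
        "IMP":  ~a | b,   # material implication a -> b
    }
    return ops[op].astype(int)

row0 = [1, 0, 1, 1, 0, 0, 1, 0]
row1 = [1, 1, 0, 1, 0, 1, 0, 0]
for op in ("NAND", "NOR", "XOR", "IMP"):
    # In the read-compute-store scheme the result would be written back
    # to another row of the same array instead of being read out.
    print(op, in_memory_boolean(row0, row1, op))
```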
Stochastic spiking neural networks based on nanoelectronic spin devices can be a possible pathway to achieving brain-like compact and energy-efficient cognitive intelligence. Such computational models attempt to exploit the intrinsic device stochasticity of nanoelectronic synaptic or neural components to perform learning or inference. However, there has been limited analysis of the scaling effects of stochastic spin devices and their impact on the operation of such stochastic networks at the system level. This work attempts to explore the design space and analyze the performance of nanomagnet-based stochastic neuromorphic computing architectures for magnets with different barrier heights. We illustrate how the underlying network architecture must be modified to account for the random telegraphic switching behavior displayed by magnets with low barrier heights as they are scaled into the superparamagnetic regime. We perform a device-to-system-level analysis on a deep neural network architecture for a digit-recognition problem on the MNIST dataset.
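A minimal behavioral sketch, assuming a sigmoid-like switching probability and an Arrhenius-style flip rate, illustrates the two regimes discussed above: a biased stochastic neuron and the random telegraphic fluctuation of a low-barrier magnet. All parameters are illustrative and not fitted to any device.

```python
import numpy as np

def stochastic_magnet_spike(inputs, weights, beta=2.0):
    """Sketch of a stochastic nanomagnet neuron: the weighted input biases
    the switching probability, and the binary spike is a Bernoulli sample.
    The sigmoid transfer and the gain 'beta' are illustrative stand-ins
    for the device's current-dependent switching probability.
    """
    drive = np.dot(weights, inputs)
    p_switch = 1.0 / (1.0 + np.exp(-beta * drive))
    return int(np.random.rand() < p_switch)

def telegraph_trace(barrier_kT, steps=1000):
    """Random telegraphic switching of a low-barrier (superparamagnetic)
    magnet: the per-step flip probability falls off exponentially with the
    barrier height, so scaled-down barriers flip frequently even with no
    input. Purely illustrative; attempt rate and time step are uncalibrated.
    """
    state, trace = 1, []
    p_flip = np.exp(-barrier_kT)   # per-step flip probability
    for _ in range(steps):
        if np.random.rand() < p_flip:
            state = -state
        trace.append(state)
    return np.array(trace)

# A 2 kT barrier magnet fluctuates far more often than a 20 kT one, which is
# why the network architecture must average over many samples in that regime.
print((np.diff(telegraph_trace(2.0)) != 0).sum(), "flips vs",
      (np.diff(telegraph_trace(20.0)) != 0).sum(), "flips")
```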