No Arabic abstract
Silicon-based Static Random Access Memories (SRAM) and digital Boolean logic have been the workhorse of the state-of-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying textit{von-Neumann} computing architecture has remained unchanged. The limited throughput and energy-efficiency of the state-of-art computing systems, to a large extent, results from the well-known textit{von-Neumann bottleneck}. The energy and throughput inefficiency of the von-Neumann machines have been accentuated in recent times due to the present emphasis on data-intensive applications like artificial intelligence, machine learning textit{etc}. A possible approach towards mitigating the overhead associated with the von-Neumann bottleneck is to enable textit{in-memory} Boolean computations. In this manuscript, we present an augmented version of the conventional SRAM bit-cells, called textit{the X-SRAM}, with the ability to perform in-memory, vector Boolean computations, in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations including NAND, NOR, IMP (implication), XOR logic gates with respect to different bit-cell topologies $-$ the 8T cell and the 8$^+$T Differential cell. In addition, we also present a novel textit{`read-compute-store} scheme, wherein the computed Boolean function can be directly stored in the memory without the need of latching the data and carrying out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and Monte-Carlo variation analysis.
`In-memory computing is being widely explored as a novel computing paradigm to mitigate the well known memory bottleneck. This emerging paradigm aims at embedding some aspects of computations inside the memory array, thereby avoiding frequent and expensive movement of data between the compute unit and the storage memory. In-memory computing with respect to Silicon memories has been widely explored on various memory bit-cells. Embedding computation inside the 6 transistor (6T) SRAM array is of special interest since it is the most widely used on-chip memory. In this paper, we present a novel in-memory multiplication followed by accumulation operation capable of performing parallel dot products within 6T SRAM without any changes to the standard bitcell. We, further, study the effect of circuit non-idealities and process variations on the accuracy of the LeNet-5 and VGG neural network architectures against the MNIST and CIFAR-10 datasets, respectively. The proposed in-memory dot-product mechanism achieves 88.8% and 99% accuracy for the CIFAR-10 and MNIST, respectively. Compared to the standard von Neumann system, the proposed system is 6.24x better in energy consumption and 9.42x better in delay.
Nearest neighbor (NN) search is an essential operation in many applications, such as one/few-shot learning and image classification. As such, fast and low-energy hardware support for accurate NN search is highly desirable. Ternary content-addressable memories (TCAMs) have been proposed to accelerate NN search for few-shot learning tasks by implementing $L_infty$ and Hamming distance metrics, but they cannot achieve software-comparable accuracies. This paper proposes a novel distance function that can be natively evaluated with multi-bit content-addressable memories (MCAMs) based on ferroelectric FETs (FeFETs) to perform a single-step, in-memory NN search. Moreover, this approach achieves accuracies comparable to floating-point precision implementations in software for NN classification and one/few-shot learning tasks. As an example, the proposed method achieves a 98.34% accuracy for a 5-way, 5-shot classification task for the Omniglot dataset (only 0.8% lower than software-based implementations) with a 3-bit MCAM. This represents a 13% accuracy improvement over state-of-the-art TCAM-based implementations at iso-energy and iso-delay. The presented distance function is resilient to the effects of FeFET device-to-device variations. Furthermore, this work experimentally demonstrates a 2-bit implementation of FeFET MCAM using AND arrays from GLOBALFOUNDRIES to further validate proof of concept.
Brain-inspired computing and neuromorphic hardware are promising approaches that offer great potential to overcome limitations faced by current computing paradigms based on traditional von-Neumann architecture. In this regard, interest in developing memristor crossbar arrays has increased due to their ability to natively perform in-memory computing and fundamental synaptic operations required for neural network implementation. For optimal efficiency, crossbar-based circuits need to be compatible with fabrication processes and materials of industrial CMOS technologies. Herein, we report a complete CMOS-compatible fabrication process of TiO2-based passive memristor crossbars with 700 nm wide electrodes. We show successful bottom electrode fabrication by a damascene process, resulting in an optimised topography and a surface roughness as low as 1.1 nm. DC sweeps and voltage pulse programming yield statistical results related to synaptic-like multilevel switching. Both cycle-to-cycle and device-to-device variability are investigated. Analogue programming of the conductance using sequences of 200 ns voltage pulses suggest that the fabricated memories have a multilevel capacity of at least 3 bits due to the cycle-to-cycle reproducibility.
Compute-in-memory (CiM) is a promising approach to alleviating the memory wall problem for domain-specific applications. Compared to current-domain CiM solutions, charge-domain CiM shows the opportunity for higher energy efficiency and resistance to device variations. However, the area occupation and standby leakage power of existing SRAMbased charge-domain CiM (CD-CiM) are high. This paper proposes the first concept and analysis of CD-CiM using nonvolatile memory (NVM) devices. The design implementation and performance evaluation are based on a proposed 2-transistor-1-capacitor (2T1C) CiM macro using ferroelectric field-effect-transistors (FeFETs), which is free from leakage power and much denser than the SRAM solution. With the supply voltage between 0.45V and 0.90V, operating frequency between 100MHz to 1.0GHz, binary neural network application simulations show over 47%, 60%, and 64% energy consumption reduction from existing SRAM-based CD-CiM, SRAM-based current-domain CiM, and RRAM-based current-domain CiM, respectively. For classifications in MNIST and CIFAR-10 data sets, the proposed FeFETbased CD-CiM achieves an accuracy over 95% and 80%, respectively.
Real-space mapping of doping concentration in semiconductor devices is of great importance for the microelectronic industry. In this work, a scanning microwave impedance microscope (MIM) is employed to resolve the local conductivity distribution of a static random access memory (SRAM) sample. The MIM electronics can also be adjusted to the scanning capacitance microscopy (SCM) mode, allowing both measurements on the same region. Interestingly, while the conventional SCM images match the nominal device structure, the MIM results display certain unexpected features, which originate from a thin layer of the dopant ions penetrating through the protective layers during the heavy implantation steps.