The Synaptic Sampling Machine (SSM) is a type of neural network model that accounts for the biological unreliability of synapses. We propose a circuit design for the SSM neural network, realized through a memristive-CMOS crossbar structure in which the synaptic sampling cell (SSC) serves as the basic stochastic unit. The growing number of edge computing devices in the Internet of Things era drives the need for hardware acceleration of data processing and computing. Considerations of processing speed and the possibility of real-time operation motivate a hardware implementation of the synaptic sampling algorithm, which has demonstrated promising results in software.
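To make the idea of an unreliable synapse concrete, the following is a minimal software sketch, not the proposed circuit. It assumes each synapse independently transmits with a fixed probability `p` on every forward pass; the function names, the value of `p`, and the layer shape are illustrative assumptions that do not come from the abstract.

```python
import random

def sample_synapses(weights, p, rng):
    """Return a copy of `weights` in which each synapse is kept with
    probability p and blanked out (set to 0) otherwise.
    Assumption: synapses fail independently of one another."""
    return [[w if rng.random() < p else 0.0 for w in row] for row in weights]

def stochastic_forward(weights, x, p, rng):
    """One forward pass through a layer using freshly sampled synapses."""
    sampled = sample_synapses(weights, p, rng)
    return [sum(w * xi for w, xi in zip(row, x)) for row in sampled]

def mean_output(weights, x, p, trials, seed=0):
    """Average the stochastic output over many passes. In expectation the
    result is p times the deterministic output of the layer."""
    rng = random.Random(seed)
    totals = [0.0] * len(weights)
    for _ in range(trials):
        out = stochastic_forward(weights, x, p, rng)
        totals = [t + o for t, o in zip(totals, out)]
    return [t / trials for t in totals]
```

With `p = 1.0` the layer reduces to an ordinary deterministic dot product. Averaging many passes with `p < 1` approaches the deterministic output scaled by `p`.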
Brain-inspired neuromorphic computing, which consists of neurons and synapses and is able to perform complex information processing, has opened a new computing paradigm for overcoming the von Neumann bottleneck. Electronic synaptic memristor devices that can compete with biological synapses are therefore significant for neuromorphic computing. In this work, we demonstrate our efforts to develop and realize a graphene oxide (GO) based memristor as a synaptic device that mimics a biological synapse. The device exhibits essential synaptic learning behavior, including analog memory characteristics, potentiation, and depression. Furthermore, the spike-timing-dependent plasticity learning rule is mimicked by engineering the pre- and post-synaptic spikes. In addition, non-volatile properties of the device such as endurance, retention, and multilevel switching are explored. These results suggest that the Ag/GO/FTO memristor device is a potential candidate for future neuromorphic computing applications. Keywords: RRAM, graphene oxide, neuromorphic computing, synaptic device, potentiation, depression
Memristors have recently received significant attention as ubiquitous device-level components for building a novel generation of computing systems. These devices have many promising features, such as non-volatility, low power consumption, high density, and excellent scalability. The ability to control and modify biasing voltages at the two terminals of memristors makes them promising candidates to perform matrix-vector multiplications and solve systems of linear equations. In this article, we discuss how networks of memristors arranged in crossbar arrays can be used for efficiently solving optimization and machine learning problems. We introduce a new memristor-based optimization framework that combines the computational merit of memristor crossbars with the advantages of an operator splitting method, the alternating direction method of multipliers (ADMM). Here, ADMM helps in splitting a complex optimization problem into subproblems that involve the solution of systems of linear equations. The capability of this framework is shown by applying it to linear programming, quadratic programming, and sparse optimization. In addition to ADMM, implementation of a customized power iteration (PI) method for eigenvalue/eigenvector computation using memristor crossbars is discussed. The memristor-based PI method can further be applied to principal component analysis (PCA). The use of memristor crossbars yields a significant speed-up in computation, and thus, we believe, has the potential to advance optimization and machine learning research in artificial intelligence (AI).
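As a rough illustration of how power iteration maps onto a crossbar, the sketch below models an ideal crossbar read as a plain matrix-vector product (Ohm's law per device, Kirchhoff's current law per output line) and calls it repeatedly. It is a textbook power iteration, not the customized PI method of the article. The function names are hypothetical, and the model ignores wire resistance, device noise, and the fact that physical conductances are nonnegative.

```python
import math

def crossbar_mvm(G, v):
    """Idealized crossbar read: output current r is sum_c G[r][c] * v[c],
    where G holds conductances and v holds the applied voltages."""
    return [sum(g * x for g, x in zip(row, v)) for row in G]

def power_iteration(G, iters=100):
    """Textbook power iteration in which every matrix-vector product is
    delegated to the (simulated) crossbar.
    Assumption: G has a unique dominant eigenvalue and the uniform start
    vector is not orthogonal to its eigenvector."""
    n = len(G)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = crossbar_mvm(G, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T G v gives the eigenvalue estimate (v has unit norm).
    w = crossbar_mvm(G, v)
    eig = sum(a * b for a, b in zip(v, w))
    return eig, v

# Example: dominant eigenpair of a small symmetric, nonnegative matrix.
eigenvalue, eigenvector = power_iteration([[4.0, 1.0], [1.0, 3.0]])
```

In this arrangement the only operation performed on the array is the matrix-vector product. Normalization and the Rayleigh quotient are cheap vector operations that would run in peripheral digital logic under this sketch's assumptions.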
We present new computational building blocks based on memristive devices. These blocks can be used to implement either supervised or unsupervised learning modules. This is achieved using a crosspoint architecture, which is an efficient array implementation for nanoscale two-terminal memristive devices. Based on these blocks and an experimentally verified SPICE macromodel for the memristor, we demonstrate, firstly, that Spike-Timing-Dependent Plasticity (STDP) can be implemented by a single memristor device and, secondly, that memristor-based competitive Hebbian learning through STDP can be realized using a $1\times 1000$ synaptic network. This is achieved by adjusting the memristors' conductance values (weights) as a function of the timing difference between presynaptic and postsynaptic spikes. These implementations have a number of shortcomings due to the memristor's characteristics, such as memory decay, highly nonlinear switching behaviour as a function of applied voltage/current, and functional uniformity. These shortcomings can be addressed by utilising mixed gates that can be used in conjunction with the analogue behaviour for biomimetic computation. The digital implementations in this paper use the in-situ computational capability of the memristor.
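The dependence of the weight change on spike timing can be sketched with the commonly used pair-based exponential STDP window. This is a generic software model, not the SPICE macromodel referred to above. The amplitudes, time constants, and conductance bounds are illustrative assumptions, and conductance is normalized to the range 0 to 1.

```python
import math

def stdp_delta(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair, with dt = t_post - t_pre in ms.
    Pre-before-post (dt > 0) potentiates; post-before-pre (dt < 0) depresses.
    All parameter values are hypothetical."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def update_conductance(g, dt, g_min=0.0, g_max=1.0):
    """Apply the STDP change to a normalized conductance and clip it to the
    device's assumed programmable range [g_min, g_max]."""
    return min(g_max, max(g_min, g + stdp_delta(dt)))

# Example: a synapse that repeatedly sees pre-before-post pairs strengthens.
g = 0.5
for _ in range(10):
    g = update_conductance(g, dt=5.0)
```

The clipping step stands in for the finite conductance range of a real device. It does not capture the memory decay or nonlinear switching mentioned above.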
In this reply, we provide our impersonal, point-by-point responses to the major criticisms (in bold and underlined) in arXiv:1909.12464. Firstly, we identify a number of (imperceptibly hidden) mistakes in the Comment in understanding and interpreting our physical model. Secondly, we use a third-party experiment carried out in 1961 (plus other third-party experiments thereafter) to further support our claim that our invented Phi memristor is memristive in spite of the existence of a parasitic inductor effect. Thirdly, we analyse this parasitic effect mathematically, introduce our work in progress at the nanoscale, and point out that this parasitic inductor effect should not be a major concern, since it can be completely removed in macro-scale devices and safely neglected in nano-scale devices.
Memristor crossbars are circuits capable of performing analog matrix-vector multiplications, overcoming the fundamental energy efficiency limitations of digital logic. They have been shown to be effective in special-purpose accelerators for a limited set of neural network applications. We present the Programmable Ultra-efficient Memristor-based Accelerator (PUMA), which enhances memristor crossbars with general-purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads. PUMA's microarchitecture techniques, exposed through a specialized Instruction Set Architecture (ISA), retain the efficiency of in-memory computing and analog circuitry without compromising programmability. We also present the PUMA compiler, which translates high-level code to the PUMA ISA. The compiler partitions the computational graph and optimizes instruction scheduling and register allocation to generate code for large and complex workloads to run on thousands of spatial cores. We have developed a detailed architecture simulator that incorporates the functionality, timing, and power models of PUMA's components to evaluate performance and energy consumption. A PUMA accelerator running at 1 GHz can reach area and power efficiency of $577~GOPS/s/mm^2$ and $837~GOPS/s/W$, respectively. Our evaluation of diverse ML applications from image recognition, machine translation, and language modelling (5M-800M synapses) shows that PUMA achieves up to $2{,}446\times$ energy and $66\times$ latency improvement for inference compared to state-of-the-art GPUs. Compared to an application-specific memristor-based accelerator, PUMA incurs small energy overheads at similar inference latency and added programmability.
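One reason large workloads need partitioning is that a weight matrix usually exceeds the size of a single crossbar. The sketch below shows one simple way to split a matrix into fixed-size tiles and accumulate the partial products. It is an illustrative assumption about how such partitioning might look, not the PUMA compiler's actual algorithm, and `tile_matrix`, `tiled_mvm`, and the tile size are hypothetical.

```python
def tile_matrix(W, tile):
    """Split W (a list of rows) into tile x tile blocks, zero-padding at the
    edges. Returns a dict mapping (block_row, block_col) to a block."""
    rows, cols = len(W), len(W[0])
    blocks = {}
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            block = [[W[r][c] if r < rows and c < cols else 0.0
                      for c in range(c0, c0 + tile)]
                     for r in range(r0, r0 + tile)]
            blocks[(r0 // tile, c0 // tile)] = block
    return blocks

def tiled_mvm(W, x, tile):
    """Compute W @ x by running each tile as an independent (simulated)
    crossbar multiply and summing partial results per output row."""
    rows, cols = len(W), len(W[0])
    y = [0.0] * rows
    for (br, bc), block in tile_matrix(W, tile).items():
        # Input slice for this block column, zero-padded past the last column.
        x_slice = [x[c] if c < cols else 0.0
                   for c in range(bc * tile, (bc + 1) * tile)]
        for i, row in enumerate(block):
            r = br * tile + i
            if r < rows:  # skip rows that exist only as padding
                y[r] += sum(w * xi for w, xi in zip(row, x_slice))
    return y
```

Each tile's multiply is independent of the others, which is what would allow the tiles to be mapped to separate cores. The accumulation across block columns is the part that requires communication between them.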