No Arabic abstract
Recent breakthroughs in recurrent deep neural networks with long short-term memory (LSTM) units has led to major advances in artificial intelligence. State-of-the-art LSTM models with significantly increased complexity and a large number of parameters, however, have a bottleneck in computing power resulting from limited memory capacity and data communication bandwidth. Here we demonstrate experimentally that LSTM can be implemented with a memristor crossbar, which has a small circuit footprint to store a large number of parameters and in-memory computing capability that circumvents the von Neumann bottleneck. We illustrate the capability of our system by solving real-world problems in regression and classification, which shows that memristor LSTM is a promising low-power and low-latency hardware platform for edge inference.
Brain-inspired computing and neuromorphic hardware are promising approaches that offer great potential to overcome limitations faced by current computing paradigms based on traditional von-Neumann architecture. In this regard, interest in developing memristor crossbar arrays has increased due to their ability to natively perform in-memory computing and fundamental synaptic operations required for neural network implementation. For optimal efficiency, crossbar-based circuits need to be compatible with fabrication processes and materials of industrial CMOS technologies. Herein, we report a complete CMOS-compatible fabrication process of TiO2-based passive memristor crossbars with 700 nm wide electrodes. We show successful bottom electrode fabrication by a damascene process, resulting in an optimised topography and a surface roughness as low as 1.1 nm. DC sweeps and voltage pulse programming yield statistical results related to synaptic-like multilevel switching. Both cycle-to-cycle and device-to-device variability are investigated. Analogue programming of the conductance using sequences of 200 ns voltage pulses suggest that the fabricated memories have a multilevel capacity of at least 3 bits due to the cycle-to-cycle reproducibility.
We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each retrieval becomes noisier due to interference. Our system in contrast creates redundant copies of stored information, which enables retrieval with reduced noise. Experiments demonstrate faster learning on multiple memorization tasks.
Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training due to the fact that the computations are dependent on discrete operations. In this paper, we tackle this problem by utilizing state access patterns of StackLSTM to homogenize computations with regard to different discrete operations. Our parsing experiments show that the method scales up almost linearly with increasing batch size, and our parallelized PyTorch implementation trains significantly faster compared to the Dynet C++ implementation.
Associative memory using fast weights is a short-term memory mechanism that substantially improves the memory capacity and time scale of recurrent neural networks (RNNs). As recent studies introduced fast weights only to regular RNNs, it is unknown whether fast weight memory is beneficial to gated RNNs. In this work, we report a significant synergy between long short-term memory (LSTM) networks and fast weight associative memories. We show that this combination, in learning associative retrieval tasks, results in much faster training and lower test error, a performance boost most prominent at high memory task difficulties.
In this work we propose a neuromorphic hardware based signal equalizer by based on the deep learning implementation. The proposed neural equalizer is plasticity trainable equalizer which is different from traditional model designed based DFE. A trainable Long Short-Term memory neural network based DFE architecture is proposed for signal recovering and digital implementation is evaluated through FPGA implementation. Constructing with modelling based equalization methods, the proposed approach is compatible to multiple frequency signal equalization instead of single type signal equalization. We shows quantitatively that the neuronmorphic equalizer which is amenable both analog and digital implementation outperforms in different metrics in comparison with benchmarks approaches. The proposed method is adaptable both for general neuromorphic computing or ASIC instruments.