No Arabic abstract
As one of the most promising emerging non-volatile memory (NVM) technologies, spin-transfer torque magnetic random access memory (STT-MRAM) has attracted significant research attention due to several features such as high density, zero standby leakage, and nearly unlimited endurance. However, a high-quality test solution is required prior to the commercialization of STT-MRAM. In this paper, we present all STT-MRAM failure mechanisms: manufacturing defects, extreme process variations, magnetic coupling, STT-switching stochasticity, and thermal fluctuation. The resultant fault models including permanent faults and transient faults are classified and discussed. Moreover, the limited test algorithms and design-for-testability (DfT) designs proposed in the literature are also covered. It is clear that test solutions for STT-MRAMs are far from well established yet, especially when considering a defective part per billion (DPPB) level requirement. We present the main challenges on the STT-MRAM testing topic at three levels: failure mechanisms, fault modeling, and test/DfT designs.
As a unique mechanism for MRAMs, magnetic coupling needs to be accounted for when designing memory arrays. This paper models both intra- and inter-cell magnetic coupling analytically for STT-MRAMs and investigates their impact on the write performance and retention of MTJ devices, which are the data-storing elements of STT-MRAMs. We present magnetic measurement data of MTJ devices with diameters ranging from 35nm to 175nm, which we use to calibrate our intra-cell magnetic coupling model. Subsequently, we extrapolate this model to study inter-cell magnetic coupling in memory arrays. We propose the inter-cell magnetic coupling factor Psi to indicate coupling strength. Our simulation results show that Psi=2% maximizes the array density under the constraint that the magnetic coupling has negligible impact on the devices performance. Higher array densities show significant variations in average switching time, especially at low switching voltages, caused by inter-cell magnetic coupling, and dependent on the data pattern in the cells neighborhood. We also observe a marginal degradation of the data retention time under the influence of inter-cell magnetic coupling.
In this work, we propose FUSE, a novel GPU cache system that integrates spin-transfer torque magnetic random-access memory (STT-MRAM) into the on-chip L1D cache. FUSE can minimize the number of outgoing memory accesses over the interconnection network of GPUs multiprocessors, which in turn can considerably improve the level of massive computing parallelism in GPUs. Specifically, FUSE predicts a read-level of GPU memory accesses by extracting GPU runtime information and places write-once-read-multiple (WORM) data blocks into the STT-MRAM, while accommodating write-multiple data blocks over a small portion of SRAM in the L1D cache. To further reduce the off-chip memory accesses, FUSE also allows WORM data blocks to be allocated anywhere in the STT-MRAM by approximating the associativity with the limited number of tag comparators and I/O peripherals. Our evaluation results show that, in comparison to a traditional GPU cache, our proposed heterogeneous cache reduces the number of outgoing memory references by 32% across the interconnection network, thereby improving the overall performance by 217% and reducing energy cost by 53%.
The realistic modeling of STT-MRAM for the simulations of hybrid CMOS/Spintronics devices in comprehensive simulation environments require a full description of stochastic switching processes in state of the art STT-MRAM. Here, we derive an analytical formulation that takes into account the spin-torque asymmetry of the spin polarization function of magnetic tunnel junctions studying. We studied its validity range by comparing the analytical formulas with results achieved numerically within a full micromagnetic framework. We also find that a reasonable fit of the probability density function (PDF) of the switching time is given by a Pearson Type IV PDF. The main results of this work underlines the need of data-driven design of STT-MRAM that uses a full micromagnetic simulation framework for the statistical proprieties PDF of switching processes.
Spin Transfer Torque MRAMs are attractive due to their non-volatility, high density and zero leakage. However, STT-MRAMs suffer from poor reliability due to shared read and write paths. Additionally, conflicting requirements for data retention and write-ability (both related to the energy barrier height of the magnet) makes design more challenging. Furthermore, the energy barrier height depends on the physical dimensions of the free layer. Any variations in the dimensions of the free layer lead to variations in the energy barrier height. In order to address poor reliability of STT-MRAMs, usage of Error Correcting Codes (ECC) have been proposed. Unlike traditional CMOS memory technologies, ECC is expected to correct both soft and hard errors in STT_MRAMs. To achieve acceptable yield with low write power, stronger ECC is required, resulting in increased number of encoded bits and degraded memory efficiency. In this paper, we propose Failure aware ECC (FaECC), which masks permanent faults while maintaining the same correction capability for soft errors without increased encoded bits. Furthermore, we investigate the impact of process variations on run-time reliability of STT-MRAMs. We provide an analysis on the impact of process variations on the life-time of the free layer and retention failures. In order to analyze the effectiveness of our methodology, we developed a cross-layer simulation framework that consists of device, circuit and array level analysis of STT-MRAM memory arrays. Our results show that using FaECC relaxes the requirements on the energy barrier height, which reduces the write energy and results in smaller access transistor size and memory array area. Keywords: STT-MRAM, reliability, Error Correcting Codes, ECC, magnetic memory
We extensively test a recent protocol to demonstrate quantum fault tolerance on three systems: (1) a real-time simulation of five spin qubits coupled to an environment with two-level defects, (2) a real-time simulation of transmon quantum computers, and (3) the 16-qubit processor of the IBM Q Experience. In the simulations, the dynamics of the full system is obtained by numerically solving the time-dependent Schrodinger equation. We find that the fault-tolerant scheme provides a systematic way to improve the results when the errors are dominated by the inherent control and measurement errors present in transmon systems. However, the scheme fails to satisfy the criterion for fault tolerance when decoherence effects are dominant.