Application-driven Design Exploration for Dense Ferroelectric Embedded Non-volatile Memories

Added by Mohammad Mehdi Sharifi

Publication date: 2021

Fields: Informatics Engineering

Language: English

Authors: Mohammad Mehdi Sharifi, Lillian Pentecost, Ramin Rajaei

Distributed Parallel and Cluster Computing

The memory wall bottleneck is a key challenge across many data-intensive applications. Multi-level FeFET-based embedded non-volatile memories are a promising solution for denser and more energy-efficient on-chip memory. However, reliable multi-level cell storage requires careful optimizations to minimize the design overhead costs. In this work, we investigate the interplay between FeFET device characteristics, programming schemes, and memory array architecture, and explore different design choices to optimize performance, energy, area, and accuracy metrics for critical data-intensive workloads. From our cross-stack design exploration, we find that we can store DNN weights and social network graphs at a density of over 8 MB/mm$^2$ with sub-2 ns read access latency and without loss in application accuracy.
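Multi-level cell (MLC) storage means one FeFET cell holds more than one bit by being programmed to one of several distinguishable levels. The sketch below illustrates only that encoding step: it packs bytes (for example, 8-bit quantized DNN weights) into per-cell levels and reads them back. The function names and the restriction to 1, 2, 4 or 8 bits per cell are simplifying assumptions for illustration. The sketch does not reproduce the paper's programming schemes or array model.

```python
from typing import List


def pack_to_levels(data: bytes, bits_per_cell: int) -> List[int]:
    """Split each byte into cell levels (0 .. 2**bits_per_cell - 1), MSBs first.

    Hypothetical helper: only bit widths that divide 8 are supported here.
    """
    if bits_per_cell not in (1, 2, 4, 8):
        raise ValueError("bits_per_cell must divide 8 in this sketch")
    mask = (1 << bits_per_cell) - 1
    levels: List[int] = []
    for byte in data:
        # Walk from the most significant group of bits to the least significant.
        for shift in range(8 - bits_per_cell, -1, -bits_per_cell):
            levels.append((byte >> shift) & mask)
    return levels


def unpack_from_levels(levels: List[int], bits_per_cell: int) -> bytes:
    """Reassemble bytes from cell levels produced by pack_to_levels."""
    cells_per_byte = 8 // bits_per_cell
    out = bytearray()
    for i in range(0, len(levels), cells_per_byte):
        byte = 0
        for level in levels[i:i + cells_per_byte]:
            byte = (byte << bits_per_cell) | level
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    weights = bytes([0b11011000, 0x7F])
    levels = pack_to_levels(weights, 2)
    print(levels)  # [3, 1, 2, 0, 1, 3, 3, 3]
    assert unpack_from_levels(levels, 2) == weights
```

With 2 bits per cell, the two bytes occupy 8 cells instead of 16 single-level cells. That halving is the density lever MLC offers. The cost is that each cell must reliably distinguish 4 levels instead of 2.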

$\alpha$-In$_2$Se$_3$ Based Ferroelectric-Semiconductor-Metal Junction for Non-Volatile Memories

Atanu K. Saha, Mengwei Si, Peide Ye (2020)

In this work, we theoretically and experimentally investigate the working principle and non-volatile memory (NVM) functionality of the 2D $\alpha$-In$_2$Se$_3$-based ferroelectric-semiconductor-metal junction (FeSMJ). First, we analyze the semiconducting and ferroelectric properties of the $\alpha$-In$_2$Se$_3$ van der Waals (vdW) stack via experimental characterization and first-principles simulations. Then, we develop an FeSMJ device simulation framework by self-consistently solving the Landau-Ginzburg-Devonshire (LGD) equation, Poisson's equation, and the charge-transport equations. Using the extracted FeS parameters, our simulation results show good agreement with the experimental characteristics of our fabricated $\alpha$-In$_2$Se$_3$-based FeSMJ. Our analysis suggests that the vdW gap between the metal and the FeS plays a key role in providing FeS polarization-dependent modulation of the Schottky barrier heights. Further, we show that thickness scaling of the FeS leads to a reduction in read/write voltage and an increase in distinguishability. Array-level analysis of FeSMJ NVM suggests a 5.47x increase in sense margin, an 18.18x reduction in area, and lower read/write power with respect to the ferroelectric insulator tunnel junction (FTJ).
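The ferroelectric part of such a framework rests on Landau-type polarization dynamics. The following is a minimal single-domain Landau-Khalatnikov sketch that shows how a double-well free energy produces hysteresis and remanent polarization. It is an assumption-laden toy model. It uses $F(P) = aP^2 + bP^4 - EP$ with normalized, made-up coefficients ($a < 0$, $b > 0$), omits the gradient term, Poisson's equation and charge transport, and does not use the paper's extracted FeS parameters.

```python
from typing import List, Tuple


def lk_step(p: float, e: float, dt: float,
            a: float = -1.0, b: float = 1.0, rho: float = 1.0) -> float:
    """One explicit-Euler step of rho * dP/dt = -dF/dP.

    Toy free energy (normalized, hypothetical units): F(P) = a*P^2 + b*P^4 - E*P,
    so dP/dt = (E - 2*a*P - 4*b*P^3) / rho.
    """
    return p + dt * (e - 2.0 * a * p - 4.0 * b * p ** 3) / rho


def sweep(p0: float, fields: List[float], dt: float = 0.01,
          steps_per_point: int = 200) -> Tuple[float, List[Tuple[float, float]]]:
    """Hold each applied field long enough to relax, recording (E, P) pairs."""
    p = p0
    trace: List[Tuple[float, float]] = []
    for e in fields:
        for _ in range(steps_per_point):
            p = lk_step(p, e, dt)
        trace.append((e, p))
    return p, trace


if __name__ == "__main__":
    # Write "1": a positive pulse, then remove the field.
    p, _ = sweep(0.0, [1.0, 0.0])
    print(round(p, 3))  # remanent polarization near +sqrt(0.5)
    # Write "0": a negative pulse switches the polarization sign.
    p, _ = sweep(p, [-1.0, 0.0])
    print(round(p, 3))  # near -sqrt(0.5)
```

With these toy coefficients the coercive field is roughly 0.54 in normalized units. A pulse weaker than that leaves the stored state unchanged, which is the non-volatile behavior a memory cell relies on.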

Applied Physics Materials Science

Read Mapping Near Non-Volatile Memory

S. Karen Khatamifard, Zamshed Chowdhury, Nakul Pande (2017)

DNA sequencing is the physical/biochemical process of identifying the location of the four bases (Adenine, Guanine, Cytosine, Thymine) in a DNA strand. Just as semiconductor technology revolutionized computing, modern DNA sequencing technology (termed Next Generation Sequencing, NGS) revolutionized genomic research. As a result, modern NGS platforms can sequence hundreds of millions of short DNA fragments in parallel. The sequenced DNA fragments, representing the output of NGS platforms, are termed reads. Besides genomic variations, NGS imperfections induce noise in reads. Mapping each read to (the most similar portion of) a reference genome of the same species, i.e., read mapping, is a common critical first step in a diverse set of emerging bioinformatics applications. Mapping is a search-heavy, memory-intensive similarity-matching problem and can therefore benefit greatly from near-memory processing. Intuition suggests using the fast associative search that Ternary Content Addressable Memory (TCAM) provides by construction. However, TCAM's excessive energy consumption and its lack of support for similarity matching (under noise induced by NGS and genomic variation) make its direct application infeasible, irrespective of volatility, even though only non-volatile TCAM can accommodate the large memory footprint in an area-efficient way. This paper introduces GeNVoM, a scalable, energy-efficient and high-throughput solution. Instead of optimizing an algorithm developed for general-purpose computers or GPUs, GeNVoM rethinks the algorithm and the non-volatile TCAM-based accelerator design together from the ground up. As a result, GeNVoM improves throughput by up to 113.5x (3.6x) and reduces energy consumption by up to 210.9x (1.36x) when compared to a GPU (accelerator) baseline, which represents one of the highest-throughput implementations known.
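The gap between exact TCAM lookup and the similarity matching that read mapping needs can be made concrete in a few lines. The sketch below is a software stand-in, not GeNVoM's design. It stores reference rows over {A, C, G, T, X}, where X is an assumed "don't care" symbol. It then reports every row whose mismatch count against a read stays within a tolerance. A hardware TCAM would compare all rows in parallel, whereas this loop scans them one by one.

```python
from typing import List, Tuple

DONT_CARE = "X"  # hypothetical ternary wildcard symbol for this sketch


def mismatches(pattern: str, query: str) -> int:
    """Count positions where a non-wildcard pattern symbol differs from the query."""
    if len(pattern) != len(query):
        raise ValueError("pattern and query must have equal length")
    return sum(1 for p, q in zip(pattern, query) if p != DONT_CARE and p != q)


def approximate_search(rows: List[str], query: str,
                       max_mismatches: int) -> List[Tuple[int, int]]:
    """Return (row index, mismatch count) for every row within tolerance.

    max_mismatches = 0 reduces this to ordinary exact ternary matching.
    """
    hits: List[Tuple[int, int]] = []
    for idx, row in enumerate(rows):
        d = mismatches(row, query)
        if d <= max_mismatches:
            hits.append((idx, d))
    return hits


if __name__ == "__main__":
    reference_rows = ["ACGTACGT", "ACGXACGT", "TTTTCCCC"]
    read = "ACGAACGT"  # differs from row 0 in one base
    print(approximate_search(reference_rows, read, 0))  # [(1, 0)]
    print(approximate_search(reference_rows, read, 1))  # [(0, 1), (1, 0)]
```

With a tolerance of zero, the noisy read matches only the wildcard row. This illustrates why exact-match TCAM alone misses reads that differ from the reference by sequencing errors or genomic variation.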

Distributed Parallel and Cluster Computing Hardware Architecture

A Flat-Combining-Based Persistent Stack for Non-Volatile Memory

Matan Rusanovsky, Ohad Ben-Baruch, Danny Hendler (2020)

Flat combining (FC) is a synchronization paradigm in which a single thread, holding a global lock, collects requests from multiple threads for accessing a concurrent data structure and applies their combined requests to it. Although FC is sequential, it significantly reduces synchronization overheads and cache invalidations and thus often outperforms lock-free implementations. The recent emergence of non-volatile memory (NVM) technologies has increased interest in developing persistent (a.k.a. durable or recoverable) objects. These objects can recover from system failures and ensure consistency by retaining their state in NVM and fixing it, if required, upon recovery. Of particular interest are detectable objects, which, in addition to ensuring consistency, allow recovery code to infer whether a failed operation took effect before the crash and, if it did, to obtain its response. In this work, we present the first FC-based persistent object. Specifically, we introduce a detectable FC-based implementation of a concurrent LIFO stack object. Our empirical evaluation establishes that, thanks to flat combining, the novel stack algorithm requires far fewer costly persistence instructions than competing algorithms and is therefore able to significantly outperform them.
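Flat combining works as follows. Each thread publishes its request in its own slot. Whichever thread manages to grab the global lock becomes the combiner and applies every pending request in one pass. The sketch below shows only that volatile FC mechanism for a LIFO stack. It is not the paper's detectable persistent algorithm and performs no persistence (flush/fence) steps. The class name, the per-thread `tid` slots, and the choice of returning `None` for a pop from an empty stack are illustrative assumptions.

```python
import threading
from typing import Any, List, Optional, Tuple


class FlatCombiningStack:
    """Volatile flat-combining LIFO stack (hypothetical sketch, no persistence)."""

    def __init__(self, num_threads: int) -> None:
        self._items: List[Any] = []
        self._lock = threading.Lock()  # the single global combiner lock
        # One publication slot per thread: a pending (op, arg) request or None.
        self._requests: List[Optional[Tuple[str, Any]]] = [None] * num_threads
        self._responses: List[Any] = [None] * num_threads
        self._done = [threading.Event() for _ in range(num_threads)]

    def _combine(self) -> None:
        # Runs only while holding the lock: apply every published request.
        for tid, req in enumerate(self._requests):
            if req is None:
                continue
            op, arg = req
            if op == "push":
                self._items.append(arg)
                result = None
            else:  # "pop"; an empty stack yields None in this sketch
                result = self._items.pop() if self._items else None
            self._responses[tid] = result
            self._requests[tid] = None
            self._done[tid].set()  # wake the thread that published this request

    def _apply(self, tid: int, op: str, arg: Any = None) -> Any:
        self._done[tid].clear()
        self._requests[tid] = (op, arg)  # publish the request
        while not self._done[tid].is_set():
            if self._lock.acquire(blocking=False):
                try:
                    self._combine()  # become the combiner for everyone
                finally:
                    self._lock.release()
            else:
                # Another thread is combining; wait briefly, then retry.
                self._done[tid].wait(timeout=0.001)
        return self._responses[tid]

    def push(self, tid: int, value: Any) -> None:
        self._apply(tid, "push", value)

    def pop(self, tid: int) -> Any:
        return self._apply(tid, "pop")
```

Each `tid` must be used by at most one thread at a time, since it identifies a private publication slot. A persistent, detectable variant would additionally have to record requests and responses in NVM so that recovery code can tell whether an interrupted operation took effect.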

Distributed Parallel and Cluster Computing Operating Systems

Non-volatile ferroelectric memory effect in ultrathin $\alpha$-In$_2$Se$_3$

Siyuan Wan, Yue Li, Wei Li (2018)

Recent experiments on layered $\alpha$-In$_2$Se$_3$ have confirmed its room-temperature ferroelectricity under ambient conditions. This observation makes $\alpha$-In$_2$Se$_3$ an excellent platform for developing two-dimensional (2D) layered-material-based electronics with non-volatile functionality. In this letter, we demonstrate a non-volatile memory effect in a hybrid 2D ferroelectric field-effect transistor (FeFET) made of ultrathin $\alpha$-In$_2$Se$_3$ and graphene. The resistance of the graphene channel in the FeFET is tunable and retentive due to electrostatic doping, which stems from the electric polarization of the ferroelectric $\alpha$-In$_2$Se$_3$. A logic bit can be represented and stored by the orientation of the electric dipoles in the top-gate ferroelectric. The 2D FeFET can be randomly rewritten over more than $10^5$ cycles without losing its non-volatility. Our approach demonstrates a prototype of rewritable non-volatile memory based on ferroelectricity in van der Waals 2D materials.

Materials Science

Comparative Design Space Exploration of Dense and Semi-Dense SLAM

M. Zeeshan Zia, Luigi Nardi, Andrew Jack (2015)

SLAM has matured significantly over the past few years and is beginning to appear in serious commercial products. While new SLAM systems are being proposed at every conference, evaluation is often restricted to qualitative visualizations or accuracy estimation against a ground truth. This is due to the lack of benchmarking methodologies that can holistically and quantitatively evaluate these systems. Investigation at the level of individual kernels and parameter spaces of SLAM pipelines, which is essential for systems research and integration, has so far been missing. We extend the recently introduced SLAMBench framework to compare two state-of-the-art SLAM pipelines, namely KinectFusion and LSD-SLAM, in terms of accuracy, energy consumption, and processing frame rate on two different hardware platforms, namely a desktop and an embedded device. We also analyze the pipelines at the level of individual kernels and, for the first time, explore their algorithmic and hardware design spaces, yielding valuable insights.
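When a design space is scored on several competing metrics, such as accuracy, energy consumption and frame rate here, a common way to summarize it is to keep only the Pareto-optimal configurations. The sketch below is a generic Pareto filter, not SLAMBench's API. The `Config` fields and the sample values are hypothetical and carry no measured results. Lower error and energy are treated as better, and a higher frame rate is treated as better.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    """One hypothetical design point; the numbers below are illustrative only."""
    name: str
    error: float   # lower is better (e.g. trajectory error)
    energy: float  # lower is better (e.g. joules per frame)
    fps: float     # higher is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    no_worse = a.error <= b.error and a.energy <= b.energy and a.fps >= b.fps
    strictly_better = a.error < b.error or a.energy < b.energy or a.fps > b.fps
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration dominates (input order kept)."""
    return [c for c in configs
            if not any(dominates(o, c) for o in configs if o is not c)]


if __name__ == "__main__":
    space = [
        Config("A", error=0.05, energy=2.0, fps=30.0),
        Config("B", error=0.04, energy=3.0, fps=20.0),
        Config("C", error=0.06, energy=2.5, fps=25.0),
        Config("D", error=0.05, energy=1.5, fps=30.0),
    ]
    print([c.name for c in pareto_front(space)])  # ['B', 'D']
```

This quadratic scan is fine for small sweeps. Larger explorations typically rely on sorting-based or incremental Pareto algorithms instead.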

Robotics Computer Vision and Pattern Recognition