
Cost Modeling and Projection for Stacked Nanowire Fabric

 Added by Naveen Kumar Macha
Publication date: 2017
Language: English





To continue scaling beyond 2-D CMOS with 3-D integration, any new 3-D IC technology has to be comparable to or better than 2-D CMOS in terms of scalability, enhanced functionality, density, power, performance, cost, and reliability. Transistor-level 3-D integration carries the most potential in this regard. Recently, we proposed a stacked horizontal nanowire based transistor-level 3-D integration approach, called SN3D [1][2], which addresses scaling challenges and achieves substantial benefits over 2-D CMOS while keeping a manageable thermal profile. In this paper, we present a cost analysis of SN3D and compare it with 2-D CMOS (2D), conventional TSV-based 3-D (T3D), and monolithic 3-D (M3D) integration. Our cost model captures the implications of manufacturing, circuit density, interconnects, bonding, and heat in determining die cost, and evaluates how cost scales as transistor count increases. Since SN3D is a new 3-D IC fabric, we model the complexity of its fabrication steps as proportionality constants in the cost estimation, based on our proposed manufacturing pathway [1]. Our analysis revealed reductions for SN3D of 86%, 72%, and 74% in area and 55%, 43%, and 43% in interconnect distribution and total interconnect length, which largely contributed to cost reductions of 70%, 67%, and 68% relative to 2D, T3D, and M3D, respectively.
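
The paper's exact cost equations are not reproduced on this page, but the structure described above (die area from transistor density, yield, and fabrication-step complexity treated as proportionality constants) can be sketched numerically. The Python sketch below is a minimal illustration under assumed parameter names and values (wafer_cost, step_complexity, defect_density, a negative-binomial yield model); it is not the authors' actual SN3D cost model.

```python
# Illustrative die-cost projection sketch -- NOT the authors' actual SN3D model.
# Assumed structure: die area from transistor count / density, yield from a
# negative-binomial model, and wafer cost scaled by a fabrication-step
# complexity constant (capturing bonding, interconnect, and heat overheads).

def die_cost(transistors, density_per_mm2, wafer_cost, step_complexity,
             wafer_area_mm2=70_000, defect_density=0.1, alpha=3.0):
    """Estimated cost per good die; all parameter values are illustrative."""
    area_mm2 = transistors / density_per_mm2              # die area
    dies_per_wafer = wafer_area_mm2 / area_mm2            # ignores edge losses
    yield_frac = (1 + defect_density * area_mm2 / alpha) ** (-alpha)
    return (wafer_cost * step_complexity) / (dies_per_wafer * yield_frac)

# How cost scales with transistor count for two hypothetical fabrics:
for n in (1e8, 1e9, 1e10):
    cost_2d = die_cost(n, density_per_mm2=1.0e7, wafer_cost=5000, step_complexity=1.0)
    cost_3d = die_cost(n, density_per_mm2=7.0e7, wafer_cost=5000, step_complexity=1.4)
    print(f"{n:.0e} transistors: 2-D ${cost_2d:,.2f}   stacked 3-D ${cost_3d:,.2f}")
```
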



Related research

Low-latency, high-throughput inference on Convolutional Neural Networks (CNNs) remains a challenge, especially for applications requiring large input or large kernel sizes. 4F optics provides a solution to accelerate CNNs by converting convolutions into Fourier-domain point-wise multiplications that are computationally free in the optical domain. However, existing 4F CNN systems suffer from the all-positive sensor readout issue, which makes the implementation of a multi-channel, multi-layer CNN not scalable or even impractical. In this paper we propose a simple channel tiling scheme for 4F CNN systems that utilizes the high resolution of the 4F system to perform channel summation inherently in the optical domain before sensor detection, so the outputs of different channels can be correctly accumulated. Compared to the state of the art, channel tiling gives similar accuracy, significantly better robustness to sensing quantization error (33% improvement in required sensing precision) and noise (10 dB reduction in tolerable sensing noise), 0.5X the total filters required, 10-50X+ throughput improvement, and as much as a 3X reduction in required output camera resolution/bandwidth. Not requiring any additional optical hardware, the proposed channel tiling approach addresses an important throughput and precision bottleneck of high-speed, massively parallel optical 4F computing systems.
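
As a purely numerical analogy of the principle described above, the sketch below computes each channel's convolution as a Fourier-domain point-wise product and then sums the channels, the accumulation step that channel tiling lets the optics perform before sensor readout. The shapes and random data are illustrative; this does not model the optical hardware or the paper's exact tiling layout.

```python
# NumPy analogy of the 4F idea: a spatial convolution becomes a point-wise
# product in the Fourier domain, and multiple channels are accumulated by
# summation (the step channel tiling performs optically before the sensor).
import numpy as np

def conv_via_fourier(image, kernel):
    """Circular convolution of a 2-D image with a kernel via FFT."""
    H, W = image.shape
    K = np.fft.fft2(kernel, s=(H, W))          # zero-pad kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * K))

rng = np.random.default_rng(0)
channels = rng.random((4, 32, 32))             # 4 input channels (illustrative)
kernels  = rng.random((4, 3, 3))               # one kernel per channel

# Per-channel Fourier-domain products, then channel summation.
out = sum(conv_via_fourier(c, k) for c, k in zip(channels, kernels))
print(out.shape)                               # (32, 32) single output map
```
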
In this work, we introduce the concept of an entirely new circuit architecture based on a novel, physics-inspired computing paradigm: memcomputing. In particular, we focus on digital memcomputing machines (DMMs) that can be designed by leveraging properties of non-linear dynamical systems, the ultimate descriptors of electronic circuits. The working principle of these systems relies on the ability of the circuit's currents and voltages to self-organize in order to satisfy mathematical relations. For this work, we discuss self-organizing gates, namely Self-Organizing Algebraic Gates (SOAGs), designed to solve linear inequalities and therefore used to solve optimization problems in Integer Linear Programming (ILP) format. Unlike conventional I/O gates, SOAGs are terminal-agnostic, meaning each terminal handles a superposition of input and output signals. When appropriately assembled to represent a given ILP problem, the corresponding self-organizing circuit converges to the equilibria that express the solutions to the problem at hand. Because DMM components are non-quantum, the ordinary differential equations describing them can be efficiently simulated in software on modern computers, as well as built in hardware with off-the-shelf technology. As an example, we show the performance of this novel approach implemented as Software as a Service (MemCPU XPC) on an ILP problem. Compared to today's best solution found using a world-renowned commercial solver, MemCPU XPC brings the time to solution down from 23 hours to less than 2 minutes.
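
The self-organizing behaviour can be caricatured by a toy dynamical system whose state relaxes toward satisfying a set of linear inequalities; the gradient flow below, with an arbitrary small A and b, is only an analogy for how equilibria encode solutions, not the actual DMM/SOAG circuit equations.

```python
# Toy dynamical system relaxing toward a feasible point of A x <= b.
# An analogy for "voltages self-organizing to satisfy relations", not the
# actual memcomputing (DMM/SOAG) dynamics.
import numpy as np

A = np.array([[1.0, 2.0], [-1.0, 1.0], [3.0, -1.0]])
b = np.array([4.0, 1.0, 2.0])
x = np.array([5.0, 5.0])                       # start far from feasibility

dt = 0.01
for _ in range(5000):
    violation = np.maximum(A @ x - b, 0.0)     # only violated inequalities pull
    x -= dt * (A.T @ violation)                # gradient flow on the violation
print(x, A @ x <= b + 1e-6)                    # equilibrium is a feasible point
```
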
For decades, advances in electronics were directly driven by the scaling of CMOS transistors according to Moore's law. However, both CMOS scaling and the classical computer architecture are approaching fundamental and practical limits, and new computing architectures based on emerging devices, such as resistive random-access memory (RRAM) devices, are expected to sustain the exponential growth of computing capability. Here we propose a novel memory-centric, reconfigurable, general-purpose computing platform that is capable of handling the explosive amount of data in a fast and energy-efficient manner. The proposed computing architecture is based on a uniform, physical, resistive, memory-centric fabric that can be optimally reconfigured and utilized to perform different computing and data storage tasks in a massively parallel approach. The system can be tailored to achieve maximal energy efficiency based on the data flow by dynamically allocating the basic computing fabric for storage, arithmetic, and analog computing, including neuromorphic computing tasks.
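
One way to picture the in-memory computing primitive behind such a fabric is a resistive crossbar performing matrix-vector multiplication in place: programmed conductances and applied row voltages yield column currents by Ohm's and Kirchhoff's laws. The sketch below is a plain numerical analogy with illustrative values, not a device-level model.

```python
# Numerical analogy of an RRAM crossbar doing matrix-vector multiplication:
# each cross-point stores a conductance G[i, j]; row voltages V produce
# column currents I = G.T @ V, i.e. the compute happens where the data lives.
import numpy as np

G = np.array([[1.0e-6, 5.0e-6, 2.0e-6],        # programmed conductances (S)
              [3.0e-6, 1.0e-6, 4.0e-6]])
V = np.array([0.2, 0.5])                       # input voltages on the rows (V)

I = G.T @ V                                    # column currents = analog MVM
print(I)                                       # amperes, one value per column
```
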
There is increasing demand to bring machine learning capabilities to low power devices. By integrating the computational power of machine learning with the deployment capabilities of low power devices, a number of new applications become possible. In some applications, such devices will not even have a battery, and must rely solely on energy harvesting techniques. This puts extreme constraints on the hardware, which must be energy efficient and capable of tolerating interruptions due to power outages. Here, as a representative example, we propose an in-memory support vector machine learning accelerator utilizing non-volatile spintronic memory. The combination of processing-in-memory and non-volatility provides a key advantage in that progress is effectively saved after every operation. This enables instant shut down and restart capabilities with minimal overhead. Additionally, the operations are highly energy efficient leading to low power consumption.
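
A toy software analogy of the claimed shutdown/restart property: if the intermediate state of a linear-SVM evaluation is persisted after every operation (standing in for non-volatile in-memory state), the computation can be interrupted and resumed at any step. The weights, input, and checkpoint format below are purely illustrative, not the accelerator's actual organization.

```python
# Toy illustration of intermittent, checkpointed inference: a linear-SVM score
# accumulated term by term, with the partial sum persisted after every step
# (standing in for non-volatile in-memory state that survives power loss).
import numpy as np

w = np.array([0.7, -1.2, 0.3])                 # trained weights (illustrative)
x = np.array([1.0, 0.5, -2.0])                 # input sample
state = {"i": 0, "acc": 0.0}                   # would live in non-volatile memory

def step(state):
    """One multiply-accumulate; state after return is a valid restart point."""
    i = state["i"]
    state["acc"] += w[i] * x[i]
    state["i"] = i + 1
    return state

while state["i"] < len(w):                     # can be interrupted anywhere
    state = step(state)
print("class:", 1 if state["acc"] >= 0 else -1)
```
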
In this work we propose an effective preconditioning technique to accelerate the steady-state simulation of large-scale memristor crossbar arrays (MCAs). We exploit the structural regularity of MCAs to develop a specially-crafted preconditioner that can be efficiently evaluated utilizing tensor products and block matrix inversion. Numerical experiments demonstrate the efficacy of the proposed technique compared to mainstream preconditioners.
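
The tensor-product idea can be illustrated in a few lines: if a preconditioner has Kronecker structure M = kron(A, B), applying M^{-1} to a residual never requires forming M, only small solves with A and B. The matrices below are arbitrary illustrative ones, not the paper's actual MCA preconditioner.

```python
# Tensor-product preconditioner application sketch -- illustrative matrices,
# not the paper's actual MCA preconditioner. If M = kron(A, B), applying
# M^{-1} to a residual r needs only small solves with A and B.
import numpy as np

n, m = 4, 3
rng = np.random.default_rng(1)
A = np.eye(n) + 0.1 * rng.random((n, n))
B = np.eye(m) + 0.1 * rng.random((m, m))
r = rng.random(n * m)                          # residual vector, length n*m

R = r.reshape(n, m)                            # un-vectorize (row-major)
Z = np.linalg.solve(A, R)                      # left solve:  A^{-1} R
Z = np.linalg.solve(B, Z.T).T                  # right solve: (A^{-1} R) B^{-T}
z_fast = Z.reshape(-1)

z_ref = np.linalg.solve(np.kron(A, B), r)      # dense check, small sizes only
print(np.allclose(z_fast, z_ref))              # True
```
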