
Cost Modeling and Projection for Stacked Nanowire Fabric

 Added by Naveen Kumar Macha
Publication date: 2017
Language: English





To continue scaling beyond 2-D CMOS with 3-D integration, any new 3-D IC technology has to be comparable to or better than 2-D CMOS in terms of scalability, enhanced functionality, density, power, performance, cost, and reliability. Transistor-level 3-D integration carries the most potential in this regard. Recently, we proposed a stacked horizontal nanowire based transistor-level 3-D integration approach, called SN3D [1][2], which addresses scaling challenges and achieves substantial benefits over 2-D CMOS while keeping a manageable thermal profile. In this paper, we present a cost analysis of SN3D and compare it with 2-D CMOS (2D), conventional TSV-based 3-D (T3D), and monolithic 3-D (M3D) integration. Our cost model captures the implications of manufacturing, circuit density, interconnects, bonding, and heat in determining die cost, and evaluates how cost scales as transistor count increases. Since SN3D is a new 3-D IC fabric, we model the complexity of its fabrication steps as proportionality constants in the cost estimation, based on our proposed manufacturing pathway [1]. Our analysis revealed reductions for SN3D of 86%, 72%, and 74% in area and 55%, 43%, and 43% in interconnect distribution and total interconnect length, which largely contributed to cost reductions of 70%, 67%, and 68% relative to 2D, T3D, and M3D, respectively.
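
The paper's exact cost equations are not reproduced on this page, but the structure described above (die area from transistor density, yield, and fabrication-step complexity treated as proportionality constants) can be sketched numerically. The Python sketch below is a minimal illustration under assumed parameter names and values (wafer_cost, step_complexity, defect_density, a negative-binomial yield model); it is not the authors' actual SN3D cost model.

```python
# Illustrative die-cost projection sketch -- NOT the authors' actual SN3D model.
# Assumed structure: die area from transistor count / density, yield from a
# negative-binomial model, and wafer cost scaled by a fabrication-step
# complexity constant (capturing bonding, interconnect, and heat overheads).

def die_cost(transistors, density_per_mm2, wafer_cost, step_complexity,
             wafer_area_mm2=70_000, defect_density=0.1, alpha=3.0):
    """Estimated cost per good die; all parameter values are illustrative."""
    area_mm2 = transistors / density_per_mm2              # die area
    dies_per_wafer = wafer_area_mm2 / area_mm2            # ignores edge losses
    yield_frac = (1 + defect_density * area_mm2 / alpha) ** (-alpha)
    return (wafer_cost * step_complexity) / (dies_per_wafer * yield_frac)

# How cost scales with transistor count for two hypothetical fabrics:
for n in (1e8, 1e9, 1e10):
    cost_2d = die_cost(n, density_per_mm2=1.0e7, wafer_cost=5000, step_complexity=1.0)
    cost_3d = die_cost(n, density_per_mm2=7.0e7, wafer_cost=5000, step_complexity=1.4)
    print(f"{n:.0e} transistors: 2-D ${cost_2d:,.2f}   stacked 3-D ${cost_3d:,.2f}")
```
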



Related research

Low-latency, high-throughput inference on Convolutional Neural Networks (CNNs) remains a challenge, especially for applications requiring large input or large kernel sizes. 4F optics provides a solution to accelerate CNNs by converting convolutions into Fourier-domain point-wise multiplications that are computationally free in the optical domain. However, existing 4F CNN systems suffer from the all-positive sensor readout issue, which makes the implementation of a multi-channel, multi-layer CNN not scalable or even impractical. In this paper we propose a simple channel tiling scheme for 4F CNN systems that utilizes the high resolution of the 4F system to perform channel summation inherently in the optical domain before sensor detection, so the outputs of different channels can be correctly accumulated. Compared to the state of the art, channel tiling gives similar accuracy, significantly better robustness to sensing quantization error (33% improvement in required sensing precision) and noise (10 dB reduction in tolerable sensing noise), 0.5X the total filters required, 10-50X+ throughput improvement, and as much as a 3X reduction in required output camera resolution/bandwidth. Not requiring any additional optical hardware, the proposed channel tiling approach addresses an important throughput and precision bottleneck of high-speed, massively parallel optical 4F computing systems.
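
As a purely numerical analogy of the principle described above, the sketch below computes each channel's convolution as a Fourier-domain point-wise product and then sums the channels, the accumulation step that channel tiling lets the optics perform before sensor readout. The shapes and random data are illustrative; this does not model the optical hardware or the paper's exact tiling layout.

```python
# NumPy analogy of the 4F idea: a spatial convolution becomes a point-wise
# product in the Fourier domain, and multiple channels are accumulated by
# summation (the step channel tiling performs optically before the sensor).
import numpy as np

def conv_via_fourier(image, kernel):
    """Circular convolution of a 2-D image with a kernel via FFT."""
    H, W = image.shape
    K = np.fft.fft2(kernel, s=(H, W))          # zero-pad kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * K))

rng = np.random.default_rng(0)
channels = rng.random((4, 32, 32))             # 4 input channels (illustrative)
kernels  = rng.random((4, 3, 3))               # one kernel per channel

# Per-channel Fourier-domain products, then channel summation.
out = sum(conv_via_fourier(c, k) for c, k in zip(channels, kernels))
print(out.shape)                               # (32, 32) single output map
```
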
In this work, we introduce the concept of an entirely new circuit architecture based on a novel, physics-inspired computing paradigm: memcomputing. In particular, we focus on digital memcomputing machines (DMMs) that can be designed by leveraging properties of non-linear dynamical systems, the ultimate descriptors of electronic circuits. The working principle of these systems relies on the ability of the circuit's currents and voltages to self-organize in order to satisfy mathematical relations. For this work, we discuss self-organizing gates, namely Self-Organizing Algebraic Gates (SOAGs), designed to solve linear inequalities and therefore used to solve optimization problems in Integer Linear Programming (ILP) format. Unlike conventional I/O gates, SOAGs are terminal-agnostic, meaning each terminal handles a superposition of input and output signals. When appropriately assembled to represent a given ILP problem, the corresponding self-organizing circuit converges to the equilibria that express the solutions to the problem at hand. Because DMM components are non-quantum, the ordinary differential equations describing them can be efficiently simulated in software on modern computers, as well as built in hardware with off-the-shelf technology. As an example, we show the performance of this novel approach implemented as Software as a Service (MemCPU XPC) on an ILP problem. Compared to today's best solution found using a world-renowned commercial solver, MemCPU XPC brings the time to solution down from 23 hours to less than 2 minutes.
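
The self-organizing behaviour can be caricatured by a toy dynamical system whose state relaxes toward satisfying a set of linear inequalities; the gradient flow below, with an arbitrary small A and b, is only an analogy for how equilibria encode solutions, not the actual DMM/SOAG circuit equations.

```python
# Toy dynamical system relaxing toward a feasible point of A x <= b.
# An analogy for "voltages self-organizing to satisfy relations", not the
# actual memcomputing (DMM/SOAG) dynamics.
import numpy as np

A = np.array([[1.0, 2.0], [-1.0, 1.0], [3.0, -1.0]])
b = np.array([4.0, 1.0, 2.0])
x = np.array([5.0, 5.0])                       # start far from feasibility

dt = 0.01
for _ in range(5000):
    violation = np.maximum(A @ x - b, 0.0)     # only violated inequalities pull
    x -= dt * (A.T @ violation)                # gradient flow on the violation
print(x, A @ x <= b + 1e-6)                    # equilibrium is a feasible point
```
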
For decades, advances in electronics were directly driven by the scaling of CMOS transistors according to Moore's law. However, both CMOS scaling and the classical computer architecture are approaching fundamental and practical limits, and new computing architectures based on emerging devices, such as resistive random-access memory (RRAM) devices, are expected to sustain the exponential growth of computing capability. Here we propose a novel memory-centric, reconfigurable, general-purpose computing platform that is capable of handling the explosive amount of data in a fast and energy-efficient manner. The proposed computing architecture is based on a uniform, physical, resistive, memory-centric fabric that can be optimally reconfigured and utilized to perform different computing and data storage tasks in a massively parallel approach. The system can be tailored to achieve maximal energy efficiency based on the data flow by dynamically allocating the basic computing fabric for storage, arithmetic, and analog computing, including neuromorphic computing tasks.
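
One way to picture the in-memory computing primitive behind such a fabric is a resistive crossbar performing matrix-vector multiplication in place: programmed conductances and applied row voltages yield column currents by Ohm's and Kirchhoff's laws. The sketch below is a plain numerical analogy with illustrative values, not a device-level model.

```python
# Numerical analogy of an RRAM crossbar doing matrix-vector multiplication:
# each cross-point stores a conductance G[i, j]; row voltages V produce
# column currents I = G.T @ V, i.e. the compute happens where the data lives.
import numpy as np

G = np.array([[1.0e-6, 5.0e-6, 2.0e-6],        # programmed conductances (S)
              [3.0e-6, 1.0e-6, 4.0e-6]])
V = np.array([0.2, 0.5])                       # input voltages on the rows (V)

I = G.T @ V                                    # column currents = analog MVM
print(I)                                       # amperes, one value per column
```
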
There is increasing demand to bring machine learning capabilities to low power devices. By integrating the computational power of machine learning with the deployment capabilities of low power devices, a number of new applications become possible. In some applications, such devices will not even have a battery, and must rely solely on energy harvesting techniques. This puts extreme constraints on the hardware, which must be energy efficient and capable of tolerating interruptions due to power outages. Here, as a representative example, we propose an in-memory support vector machine learning accelerator utilizing non-volatile spintronic memory. The combination of processing-in-memory and non-volatility provides a key advantage in that progress is effectively saved after every operation. This enables instant shut down and restart capabilities with minimal overhead. Additionally, the operations are highly energy efficient leading to low power consumption.
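
A toy software analogy of the claimed shutdown/restart property: if the intermediate state of a linear-SVM evaluation is persisted after every operation (standing in for non-volatile in-memory state), the computation can be interrupted and resumed at any step. The weights, input, and checkpoint format below are purely illustrative, not the accelerator's actual organization.

```python
# Toy illustration of intermittent, checkpointed inference: a linear-SVM score
# accumulated term by term, with the partial sum persisted after every step
# (standing in for non-volatile in-memory state that survives power loss).
import numpy as np

w = np.array([0.7, -1.2, 0.3])                 # trained weights (illustrative)
x = np.array([1.0, 0.5, -2.0])                 # input sample
state = {"i": 0, "acc": 0.0}                   # would live in non-volatile memory

def step(state):
    """One multiply-accumulate; state after return is a valid restart point."""
    i = state["i"]
    state["acc"] += w[i] * x[i]
    state["i"] = i + 1
    return state

while state["i"] < len(w):                     # can be interrupted anywhere
    state = step(state)
print("class:", 1 if state["acc"] >= 0 else -1)
```
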
In this work we propose an effective preconditioning technique to accelerate the steady-state simulation of large-scale memristor crossbar arrays (MCAs). We exploit the structural regularity of MCAs to develop a specially-crafted preconditioner that can be efficiently evaluated utilizing tensor products and block matrix inversion. Numerical experiments demonstrate the efficacy of the proposed technique compared to mainstream preconditioners.
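
The tensor-product idea can be illustrated in a few lines: if a preconditioner has Kronecker structure M = kron(A, B), applying M^{-1} to a residual never requires forming M, only small solves with A and B. The matrices below are arbitrary illustrative ones, not the paper's actual MCA preconditioner.

```python
# Tensor-product preconditioner application sketch -- illustrative matrices,
# not the paper's actual MCA preconditioner. If M = kron(A, B), applying
# M^{-1} to a residual r needs only small solves with A and B.
import numpy as np

n, m = 4, 3
rng = np.random.default_rng(1)
A = np.eye(n) + 0.1 * rng.random((n, n))
B = np.eye(m) + 0.1 * rng.random((m, m))
r = rng.random(n * m)                          # residual vector, length n*m

R = r.reshape(n, m)                            # un-vectorize (row-major)
Z = np.linalg.solve(A, R)                      # left solve:  A^{-1} R
Z = np.linalg.solve(B, Z.T).T                  # right solve: (A^{-1} R) B^{-T}
z_fast = Z.reshape(-1)

z_ref = np.linalg.solve(np.kron(A, B), r)      # dense check, small sizes only
print(np.allclose(z_fast, z_ref))              # True
```
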