Do you want to publish a course? Click here

ERSFQ 8-bit Parallel Arithmetic Logic Unit

87   0   0.0 ( 0 )
 Added by Igor Vernik
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

We have designed and tested a parallel 8-bit ERSFQ arithmetic logic unit (ALU). The ALU design employs wave-pipelined instruction execution and features modular bit-slice architecture that is easily extendable to any number of bits and adaptable to current recycling. A carry signal synchronized with an asynchronous instruction propagation provides the wave-pipeline operation of the ALU. The ALU instruction set consists of 14 arithmetical and logical instructions. It has been designed and simulated for operation up to a 10 GHz clock rate at the 10-kA/cm2 fabrication process. The ALU is embedded into a shift-register-based high-frequency testbed with on-chip clock generator to allow for comprehensive high frequency testing for all possible operands. The 8-bit ERSFQ ALU, comprising 6840 Josephson junctions, has been fabricated with MIT Lincoln Lab 10-kA/cm2 SFQ5ee fabrication process featuring eight Nb wiring layers and a high-kinetic inductance layer needed for ERSFQ technology. We evaluated the bias margins for all instructions and various operands at both low and high frequency clock. At low frequency, clock and all instruction propagation through ALU were observed with bias margins of +/-11% and +/-9%, respectively. Also at low speed, the ALU exhibited correct functionality for all arithmetical and logical instructions with +/-6% bias margins. We tested the 8-bit ALU for all instructions up to 2.8 GHz clock frequency.



rate research

Read More

We have designed and tested a parallel 8-bit ERSFQ binary shifter that is one of the essential circuits in the design of the energy-efficient superconducting CPU. The binary shifter performs a bi-directional SHIFT instruction of an 8-bit argument. It consists of a bi-direction triple-port shift register controlled by two (left and right) shift pulse generators asynchronously generating a set number of shift pulses. At first clock cycle, an 8-bit word is loaded into the binary shifter and a 3-bit shift argument is loaded into the desired shift-pulse generator. Next, the generator produces the required number of shift SFQ pulses (from 0 to 7) asynchronously, with a repetition rate set by the internal generator delay of ~ 30 ps. These SFQ pulses are applied to the left (positive) or the right (negative) input of the binary shifter. Finally, after the shift operation is completed, the resulting 8-bit word goes to the parallel output. The complete 8-bit ERSFQ binary shifter, consisting of 820 Josephson junctions, was simulated and optimized using PSCAN2. It was fabricated in MIT Lincoln Lab 10-kA/cm2 SFQ5ee fabrication process with a high-kinetic inductance layer. We have successfully tested the binary shifter at both the LSB-to-MSB and MSB-to-LSB propagation regimes for all eight shift arguments. A single shift operation on a single input word demonstrated operational margins of +/-16% of the dc bias current. The correct functionality of the 8-bit ERSFQ binary shifter with the large, exhaustive data pattern was observed within +/-10% margins of the dc bias current. In this paper, we describe the design and present the test results for the ERSFQ 8-bit parallel binary shifter.
The rapid development of Artificial Intelligence (AI) and Internet of Things (IoT) increases the requirement for edge computing with low power and relatively high processing speed devices. The Computing-In-Memory(CIM) schemes based on emerging resistive Non-Volatile Memory(NVM) show great potential in reducing the power consumption for AI computing. However, the device inconsistency of the non-volatile memory may significantly degenerate the performance of the neural network. In this paper, we propose a low power Resistive RAM (RRAM) based CIM core to not only achieve high computing efficiency but also greatly enhance the robustness by bit line regulator and bit line weight mapping algorithm. The simulation results show that the power consumption of our proposed 8-bit CIM core is only 3.61mW (256*256). The SFDR and SNDR of the CIM core achieve 59.13 dB and 46.13 dB, respectively. The proposed bit line weight mapping scheme improves the top-1 accuracy by 2.46% and 3.47% for AlexNet and VGG16 on ImageNet Large Scale Visual Recognition Competition 2012 (ILSVRC 2012) in 8-bit mode, respectively.
Due to their very nature, Spin Waves (SWs) created in the same waveguide, but with different frequencies, can coexist while selectively interacting with their own species only. The absence of inter-frequency interferences isolates input data sets encoded in SWs with different frequencies and creates the premises for simultaneous data parallel SW based processing without hardware replication or delay overhead. In this paper we leverage this SW property by introducing a novel computation paradigm, which allows for the parallel processing of n-bit input data vectors on the same basic SW based logic gate. Subsequently, to demonstrate the proposed concept, we present 8-bit parallel 3-input Majority gate implementation and validate it by means of Object Oriented MicroMagnetic Framework (OOMMF) simulations. To evaluate the potential benefit of our proposal we compare the 8-bit data parallel gate with equivalent scalar SW gate based implementation. Our evaluation indicates that 8-bit data 3-input Majority gate implementation requires 4.16x less area than the scalar SW gate based equivalent counterpart while preserving the same delay and energy consumption figures.
As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL based schemes to tolerate up-to one/two symbol errors per DRAM beat. Multi-symbol errors arising due to faults in multiple data buses and chips may not be detected by these schemes. In this paper, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD) - a novel error handling scheme to correct single-symbol errors and detect multi-symbol errors. Our scheme makes use of a hash in combination with Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). SSCMSD can also enhance the capability of detecting errors in address bits. We employ 32-bit CRC along with Reed-Solomon code to implement SSCMSD for a x4 based DDRx system. Our simulations show that the proposed scheme effectively prevents SDCs in the presence of multiple symbol errors. Our novel design enabled us to achieve this without introducing additional READ latency. Also, we need 19 chips per rank (storage overhead of 18.75 percent), 76 data bus-lines and additional hash-logic at the memory controller.
103 - B. Cheon , E. Lee , L.-T. Wang 2007
This paper describes a flexible logic BIST scheme that features high fault coverage achieved by fault-simulation guided test point insertion, real at-speed test capability for multi-clock designs without clock frequency manipulation, and easy physical implementation due to the use of a low-speed SE signal. Application results of this scheme to two widely used IP cores are also reported.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا