
Design and Characterization of Superconducting Nanowire-Based Processors for Acceleration of Deep Neural Network Training

Posted by Murat Onen
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Training of deep neural networks (DNNs) is a computationally intensive task and requires massive volumes of data transfer. Performing these operations with conventional von Neumann architectures creates unmanageable time and power costs. Recent studies have shown that mixed-signal designs involving crossbar architectures are capable of achieving acceleration factors as high as 30,000x over state-of-the-art digital processors. These approaches involve the utilization of non-volatile memory (NVM) elements as local processors. However, no technology has been developed to date that can satisfy the strict device requirements for the unit cell. This paper presents a superconducting nanowire-based processing element as a cross-point device. The unit cell has many programmable non-volatile states that can be used to perform analog multiplication. Importantly, these states are intrinsically discrete due to the quantization of flux, which provides symmetric switching characteristics. Operation of these devices in a crossbar is described and verified with electro-thermal circuit simulations. Finally, validation of the concept in an actual DNN training task is shown using an emulator.
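
To make the crossbar idea concrete, the sketch below (not the authors' code; all names and values are illustrative assumptions) shows how a vector-matrix multiply maps onto an array of cross-point devices whose weights are restricted to a discrete, symmetric set of states, standing in for the flux-quantized levels of the nanowire cell.

```python
import numpy as np

def quantize_weights(w, n_states=64, w_max=1.0):
    """Map ideal weights onto a discrete, symmetric set of device states
    (a stand-in for the flux-quantized levels of the nanowire cell)."""
    levels = np.linspace(-w_max, w_max, n_states)
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

def crossbar_multiply(x, w_device):
    """Each output column accumulates a current i_j = sum_i x_i * g_ij,
    which is a matrix-vector product performed in the analog domain."""
    return x @ w_device

rng = np.random.default_rng(0)
w_ideal = rng.normal(scale=0.3, size=(8, 4))   # target weights from training
w_device = quantize_weights(w_ideal)           # what the array can actually store
x = rng.normal(size=(1, 8))                    # input activations driven onto rows
print(crossbar_multiply(x, w_device))          # per-column analog MAC result
```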




Read also

A resistive memory device-based computing architecture is one of the promising platforms for energy-efficient Deep Neural Network (DNN) training accelerators. The key technical challenge in realizing such accelerators is to accumulate the gradient information without a bias. Unlike digital numbers in software, which can be assigned and accessed with the desired accuracy, numbers stored in resistive memory devices can only be manipulated following the physics of the device, which can significantly limit the training performance. Therefore, additional techniques and algorithm-level remedies are required to achieve the best possible performance in resistive memory device-based accelerators. In this paper, we analyze asymmetric conductance modulation characteristics in RRAM using a soft-bound synapse model and present an in-depth analysis of the relationship between device characteristics and DNN model accuracy using a 3-layer DNN trained on the MNIST dataset. We show that the imbalance between up and down updates leads to poor network performance. We introduce the concept of a symmetry point and propose a zero-shifting technique which can compensate for the imbalance by programming the reference device and changing the zero-value point of the weight. Using this zero-shifting method, we show that network performance improves dramatically for imbalanced synapse devices.
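
As a hedged illustration of the asymmetry and zero-shifting ideas described above (parameter names such as g_max, alpha_up, and alpha_dn are assumptions, not the paper's notation), the soft-bound model below shrinks the update magnitude as a device approaches its conductance bound, and the effective weight is taken as the difference between the trained device and a reference device programmed to the symmetry point.

```python
def softbound_update(g, direction, g_max=1.0, alpha_up=0.05, alpha_dn=0.05):
    """Soft-bound conductance update: potentiation saturates near g_max,
    depression saturates near zero, so the two are generally asymmetric."""
    if direction > 0:                       # 'up' pulse (potentiation)
        return g + alpha_up * (g_max - g)
    return g - alpha_dn * g                 # 'down' pulse (depression)

def symmetry_point(g_max=1.0, alpha_up=0.05, alpha_dn=0.05):
    """Conductance at which one up and one down pulse cancel on average:
    alpha_up * (g_max - g) = alpha_dn * g."""
    return alpha_up * g_max / (alpha_up + alpha_dn)

g_ref = symmetry_point()    # reference device programmed to the symmetry point
g = 0.8                     # trained device conductance
w_effective = g - g_ref     # zero-shifted weight seen by the network
print(g_ref, w_effective)
```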
Uncertainty plays a key role in real-time machine learning. As a significant shift from standard deep networks, which do not consider any uncertainty formulation during training or inference, Bayesian deep networks are currently being investigated, where the network is envisaged as an ensemble of plausible models learnt by the Bayes formulation in response to uncertainties in sensory data. Bayesian deep networks consider each synaptic weight as a sample drawn from a probability distribution with learnt mean and variance. This paper elaborates on a hardware design that exploits the cycle-to-cycle variability of oxide-based Resistive Random Access Memories (RRAMs) as a means to realize such a probabilistic sampling function, instead of viewing it as a disadvantage.
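
The sampling view can be sketched in a few lines (an illustrative assumption about the scheme, not the paper's circuit): each weight is drawn from a distribution with a learnt mean and variance on every forward pass, and several stochastic passes are averaged to approximate the predictive distribution; in the hardware proposal the random draw would come from RRAM cycle-to-cycle variability, emulated here with a Gaussian generator.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_bayesian_layer(x, w_mean, w_logvar):
    """Draw one weight realization per forward pass (Monte Carlo inference);
    the randomness stands in for RRAM cycle-to-cycle variability."""
    w = w_mean + np.exp(0.5 * w_logvar) * rng.standard_normal(w_mean.shape)
    return x @ w

x = rng.normal(size=(1, 16))
w_mean = rng.normal(scale=0.1, size=(16, 4))    # learnt means
w_logvar = np.full((16, 4), -4.0)               # learnt (log) variances

preds = np.stack([sample_bayesian_layer(x, w_mean, w_logvar) for _ in range(10)])
print(preds.mean(axis=0), preds.std(axis=0))    # predictive mean and spread
```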
In this work, a neural network-based terramechanics model and terrain estimator are presented with an outlook toward optimal control applications such as model predictive control. Recognizing the limitations of state-of-the-art terramechanics models in terms of operating conditions, computational cost, and continuous differentiability for gradient-based optimization, an efficient and twice continuously differentiable terramechanics model is developed using neural networks for dynamic operations on deformable terrains. It is demonstrated that the neural network terramechanics model is able to predict lateral tire forces accurately and efficiently compared to the state-of-the-art Soil Contact Model. Furthermore, the neural network terramechanics model is implemented within a terrain estimator, and it is shown that using this model the estimator converges to within around 2% of the true terrain parameter. Finally, with model predictive control applications in mind, which typically rely on bicycle models for their predictions, it is demonstrated that utilizing the estimated terrain parameter can reduce prediction errors of a bicycle model by orders of magnitude. The result is an efficient, dynamic, twice continuously differentiable terramechanics model and estimator with inherent advantages for implementation in model predictive control compared to previously established models.
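
The twice-continuous-differentiability requirement can be illustrated with a minimal sketch (inputs, shapes, and weights are placeholders, not the authors' trained model): a feed-forward network with smooth activations such as tanh is infinitely differentiable, so its first and second derivatives with respect to the inputs exist everywhere, which is what gradient-based model predictive control needs.

```python
import numpy as np

def tanh_mlp(x, weights, biases):
    """Hidden tanh layers followed by a linear output layer; the composition
    is smooth (C-infinity), hence twice continuously differentiable."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)
    return h @ weights[-1] + biases[-1]

rng = np.random.default_rng(2)
weights = [rng.normal(size=(3, 16)), rng.normal(size=(16, 1))]
biases = [np.zeros(16), np.zeros(1)]

# Hypothetical inputs: slip angle, normal load, and a terrain parameter.
x = np.array([[0.05, 4000.0, 0.8]])
print(tanh_mlp(x, weights, biases))   # placeholder prediction of lateral tire force
```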
Ternary logic is the most promising and actively pursued alternative to prevailing binary logic systems, owing to the energy efficiency that follows from reduced circuit complexity and chip area. In this paper, we propose a ternary 3-Transistor Dynamic Random-Access Memory (3T-DRAM) cell that uses a single word-line for both read and write operations. For circuit simulation, we use Carbon Nanotube Field-Effect Transistors (CNTFETs). We analyze the operation of the circuit under different process variations and report results for write delay, read sensing time, and current consumption. Alongside the basic DRAM design, we propose a ternary sense circuitry for the proper read operation of the proposed DRAM. The simulation and analysis are carried out using the H-SPICE tool with the Stanford University CNTFET model.
An analog synapse circuit based on ferroelectric-metal field-effect transistors is proposed that offers 6-bit weight precision. The circuit comprises volatile least significant bits (LSBs) used solely during training, and non-volatile most significant bits (MSBs) used for both training and inference. The design works at a 1.8V logic-compatible voltage, provides 10^10 endurance cycles, and requires only 250ps update pulses. A variant of LeNet trained with the proposed synapse achieves 98.2% accuracy on MNIST, which is only 0.4% lower than an ideal implementation of the same network with the same bit precision. Furthermore, the proposed synapse offers improvements of up to 26% in area, 44.8% in leakage power, 16.7% in LSB update pulse duration, and two orders of magnitude in endurance cycles, when compared to state-of-the-art hybrid synaptic circuits. The proposed synapse can be extended to an 8-bit design, enabling a VGG-like network to achieve 88.8% accuracy on CIFAR-10 (only 0.8% lower than an ideal implementation of the same network).
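
A hedged sketch of the hybrid weight arrangement described above (the bit split and the fold-over step are illustrative assumptions, not the circuit itself): the effective weight is composed of a non-volatile MSB group and a volatile LSB group, and accumulated LSB updates are periodically folded into the MSBs so that inference can rely on the non-volatile part alone.

```python
N_MSB, N_LSB = 3, 3                       # 6-bit total weight precision (assumed split)

def compose_weight(msb, lsb):
    """Effective integer weight seen by the network: MSBs carry the higher
    significance, LSBs the finer training resolution."""
    return msb * (2 ** N_LSB) + lsb

def transfer_lsb_to_msb(msb, lsb):
    """Fold accumulated LSB updates into the non-volatile MSBs (hypothetical
    transfer step), leaving only the remainder in the volatile LSBs."""
    carry, remainder = divmod(lsb, 2 ** N_LSB)
    return msb + carry, remainder

msb, lsb = 2, 9                           # LSB counter has exceeded its 3-bit range
msb, lsb = transfer_lsb_to_msb(msb, lsb)
print(msb, lsb, compose_weight(msb, lsb)) # -> 3 1 25
```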
