أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Shurui Li

Revisiting the double-well problem by deep learning with a hybrid network

123 - Shurui Li , Jianqin Xu , Jing Qian 2021

Solving physical problems by deep learning is accurate and efficient mainly accounting for the use of an elaborate neural network. We propose a novel hybrid network which integrates two different kinds of neural networks: LSTM and ResNet, in order to overcome the difficulty met in solving strongly-oscillating dynamics of the systems time evolution. By taking the double-well model as an example we show that our new method can benefit from a pre-learning and verification of the periodicity of frequency by using the LSTM network, simultaneously making a high-fidelity prediction about the whole dynamics of system with ResNet, which is impossibly achieved in the case of single network. Such a hybrid network can be applied for solving cooperative dynamics in a system with fast spatial or temporal modulations, promising for realistic oscillation calculations under experimental conditions.

الفيزياء الحسابية التعلم الآلي فيزياء الكم

SWIS -- Shared Weight bIt Sparsity for Efficient Neural Network Acceleration

132 - Shurui Li , Wojciech Romaszkan , Alexander Graening 2021

Quantization is spearheading the increase in performance and efficiency of neural network computing systems making headway into commodity hardware. We present SWIS - Shared Weight bIt Sparsity, a quantization framework for efficient neural network in ference acceleration delivering improved performance and storage compression through an offline weight decomposition and scheduling algorithm. SWIS can achieve up to 54.3% (19.8%) point accuracy improvement compared to weight truncation when quantizing MobileNet-v2 to 4 (2) bits post-training (with retraining) showing the strength of leveraging shared bit-sparsity in weights. SWIS accelerator gives up to 6x speedup and 1.9x energy improvement overstate of the art bit-serial architectures.

التعلم الآلي

Channel Tiling for Improved Performance and Accuracy of Optical Neural Network Accelerators

94 - Shurui Li , Mario Miscuglio , Volker J. Sorger 2020

Low latency, high throughput inference on Convolution Neural Networks (CNNs) remains a challenge, especially for applications requiring large input or large kernel sizes. 4F optics provides a solution to accelerate CNNs by converting convolutions int o Fourier-domain point-wise multiplications that are computationally free in optical domain. However, existing 4F CNN systems suffer from the all-positive sensor readout issue which makes the implementation of a multi-channel, multi-layer CNN not scalable or even impractical. In this paper we propose a simple channel tiling scheme for 4F CNN systems that utilizes the high resolution of 4F system to perform channel summation inherently in optical domain before sensor detection, so the outputs of different channels can be correctly accumulated. Compared to state of the art, channel tiling gives similar accuracy, significantly better robustness to sensing quantization (33% improvement in required sensing precision) error and noise (10dB reduction in tolerable sensing noise), 0.5X total filters required, 10-50X+ throughput improvement and as much as 3X reduction in required output camera resolution/bandwidth. Not requiring any additional optical hardware, the proposed channel tiling approach addresses an important throughput and precision bottleneck of high-speed, massively-parallel optical 4F computing systems.

التقنيات الناشئة هندسة العتاد

Massively Parallel Amplitude-Only Fourier Neural Network

75 - Mario Miscuglio , Zibo Hu , Shurui Li 2020

Machine-intelligence has become a driving factor in modern society. However, its demand outpaces the underlying electronic technology due to limitations given by fundamental physics such as capacitive charging of wires, but also by system architectur e of storing and handling data, both driving recent trends towards processor heterogeneity. Here we introduce a novel amplitude-only Fourier-optical processor paradigm capable of processing large-scale ~(1,000 x 1,000) matrices in a single time-step and 100 microsecond-short latency. Conceptually, the information-flow direction is orthogonal to the two-dimensional programmable-network, which leverages 10^6-parallel channels of display technology, and enables a prototype demonstration performing convolutions as pixel-wise multiplications in the Fourier domain reaching peta operations per second throughputs. The required real-to-Fourier domain transformations are performed passively by optical lenses at zero-static power. We exemplary realize a convolutional neural network (CNN) performing classification tasks on 2-Megapixel large matrices at 10 kHz rates, which latency-outperforms current GPU and phase-based display technology by one and two orders of magnitude, respectively. Training this optical convolutional layer on image classification tasks and utilizing it in a hybrid optical-electronic CNN, shows classification accuracy of 98% (MNIST) and 54% (CIFAR-10). Interestingly, the amplitude-only CNN is inherently robust against coherence noise in contrast to phase-based paradigms and features an over 2 orders of magnitude lower delay than liquid crystal-based systems. Beyond contributing to novel accelerator technology, scientifically this amplitude-only massively-parallel optical compute-paradigm can be far-reaching as it de-validates the assumption that phase-information outweighs amplitude in optical processors for machine-intelligence.

معالجة الصور والفيديو بصريات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد