
Acceleration Method for Learning Fine-Layered Optical Neural Networks

Posted by: Kazuo Aoyama
Publication date: 2021
Paper language: English





An optical neural network (ONN) is a promising system due to its high-speed, low-power operation. Its linear unit performs multiplication of an input vector by a weight matrix in optical analog circuits. Among such units, a circuit built from a multiple-layered structure of programmable Mach-Zehnder interferometers (MZIs) can realize a specific class of unitary matrices as its weight matrix with a limited number of MZIs, and is therefore effective for balancing the number of programmable MZIs against ONN performance. However, learning the MZI parameters of the circuit takes a long time with the conventional automatic differentiation (AD) that machine learning platforms provide. To address this, we propose an acceleration method for learning MZI parameters. We derive customized complex-valued derivatives for an MZI by exploiting Wirtinger derivatives and the chain rule, and incorporate them into a newly developed function module, implemented in C++, that evaluates them collectively across the multi-layered structure. Our method is simple, fast, and versatile, and remains compatible with conventional AD. We demonstrate that it runs 20 times faster than conventional AD on a pixel-by-pixel MNIST task performed with a complex-valued recurrent neural network whose hidden unit is MZI-based.
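The paper's C++ module and its exact MZI parameterization are not given in the abstract, so the following is only a minimal NumPy sketch of the underlying idea: hand-derived Wirtinger-calculus gradients for a single 2x2 MZI, checked against finite differences. The parameterization U(theta, phi), the function names, and the loss are illustrative assumptions.

```python
# Minimal sketch (not the paper's C++ module): hand-derived Wirtinger-calculus
# gradients for one 2x2 MZI acting on a complex 2-vector, checked against
# finite differences. The parameterization U(theta, phi) is one common
# convention and is an assumption, not taken from the paper.
import numpy as np

def mzi_matrix(theta, phi):
    """U(theta, phi) = [[e^{i phi} cos t, -sin t], [e^{i phi} sin t, cos t]]."""
    c, s, e = np.cos(theta), np.sin(theta), np.exp(1j * phi)
    return np.array([[e * c, -s],
                     [e * s,  c]])

def mzi_forward(x, theta, phi):
    return mzi_matrix(theta, phi) @ x

def mzi_backward(x, theta, phi, grad_y):
    """grad_y holds the Wirtinger derivatives dL/dy_k of a real-valued loss L.

    For a real parameter p, dL/dp = 2 Re sum_k (dL/dy_k)(dy_k/dp); for the
    holomorphic input, dL/dx_j = sum_k (dL/dy_k) U_kj, i.e. U^T grad_y.
    """
    c, s, e = np.cos(theta), np.sin(theta), np.exp(1j * phi)
    dU_dtheta = np.array([[-e * s, -c],
                          [ e * c, -s]])
    dU_dphi = np.array([[1j * e * c, 0.0],
                        [1j * e * s, 0.0]])
    grad_theta = 2.0 * np.real(grad_y @ (dU_dtheta @ x))
    grad_phi = 2.0 * np.real(grad_y @ (dU_dphi @ x))
    grad_x = mzi_matrix(theta, phi).T @ grad_y
    return grad_theta, grad_phi, grad_x

# Finite-difference check on the loss L = |y_0|^2 (power in the first output
# port), whose Wirtinger derivative is dL/dy = (conj(y_0), 0).
rng = np.random.default_rng(0)
x = rng.normal(size=2) + 1j * rng.normal(size=2)
theta, phi, eps = 0.3, 1.1, 1e-6

y = mzi_forward(x, theta, phi)
grad_y = np.array([np.conj(y[0]), 0.0])
g_theta, g_phi, _ = mzi_backward(x, theta, phi, grad_y)

def loss(t, p):
    return float(np.abs(mzi_forward(x, t, p)[0]) ** 2)

print(g_theta, (loss(theta + eps, phi) - loss(theta - eps, phi)) / (2 * eps))
print(g_phi, (loss(theta, phi + eps) - loss(theta, phi - eps)) / (2 * eps))
```

In the paper's setting, such closed-form derivatives would be evaluated collectively across the many layered MZIs inside a single function module rather than one interferometer at a time, which is where the reported speed-up over conventional AD comes from.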




Read also

Hengyue Pan, Hui Jiang, Xin Niu (2018)
The past few years have witnessed the fast development of regularization methods for deep learning models such as fully-connected deep neural networks (DNNs) and convolutional neural networks (CNNs). Most previous methods drop features from the input data or hidden layers, such as Dropout, Cutout, and DropBlock, while DropConnect drops connections between fully-connected layers. By randomly discarding features or connections, these methods control overfitting and improve the performance of neural networks. In this paper, we propose two novel regularization methods, DropFilter and DropFilter-PLUS, for learning CNNs. Unlike previous methods, DropFilter and DropFilter-PLUS modify the convolution filters; for DropFilter-PLUS, we find a suitable way to accelerate the learning process based on theoretical analysis. Experimental results on MNIST show that DropFilter and DropFilter-PLUS can improve performance on image classification tasks.
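The abstract does not spell out the DropFilter rules, so the snippet below is only a hedged sketch of the general idea it describes: randomly zeroing whole convolution filters during training, with inverted-dropout rescaling. The function name, drop probability, and tensor layout are assumptions.

```python
# Sketch of filter-level dropping: a Bernoulli mask zeroes whole convolution
# filters during training and the survivors are rescaled so the expected
# output is unchanged (inverted-dropout style).
import numpy as np

def drop_filter(weight, drop_prob, rng, training=True):
    """weight: (out_channels, in_channels, kh, kw) convolution kernel."""
    if not training or drop_prob == 0.0:
        return weight
    keep = rng.random(weight.shape[0]) >= drop_prob        # one flag per filter
    scale = 1.0 / (1.0 - drop_prob)                        # keep expectation
    return weight * keep[:, None, None, None] * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w_dropped = drop_filter(w, drop_prob=0.25, rng=rng)
print((np.abs(w_dropped).sum(axis=(1, 2, 3)) == 0).sum(), "filters dropped")
```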
Tianle Cai, Ruiqi Gao, Jikai Hou (2019)
First-order methods such as stochastic gradient descent (SGD) are currently the standard algorithms for training deep neural networks. Second-order methods, despite their better convergence rate, are rarely used in practice due to the prohibitive computational cost of calculating the second-order information. In this paper, we propose a novel Gram-Gauss-Newton (GGN) algorithm to train deep neural networks for regression problems with square loss. Our method draws inspiration from the connection between neural network optimization and kernel regression with the neural tangent kernel (NTK). Unlike typical second-order methods, which incur heavy computational cost in each iteration, GGN has only minor overhead compared to first-order methods such as SGD. We also give theoretical results showing that for sufficiently wide neural networks, the convergence rate of GGN is quadratic. Furthermore, we provide a convergence guarantee for the mini-batch GGN algorithm, which is, to our knowledge, the first convergence result for the mini-batch version of a second-order method on overparameterized neural networks. Preliminary experiments on regression tasks demonstrate that for training standard networks, our GGN algorithm converges much faster and achieves better performance than SGD.
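As a rough illustration of the Gram-space idea described above (not the paper's GGN algorithm itself), the sketch below takes a damped Gauss-Newton step for square-loss regression using the n x n Gram matrix J J^T instead of the p x p matrix J^T J; the toy tanh model, damping value, and data are assumptions.

```python
# Damped Gauss-Newton step in Gram (kernel) space for square-loss regression:
# update = J^T (J J^T + lam I)^{-1} r, with J the Jacobian and r the residuals.
import numpy as np

def gram_gauss_newton_step(X, y, w, lam=1e-3):
    pre = X @ w                          # (n,)
    f = np.tanh(pre)                     # toy model outputs f(x) = tanh(x . w)
    r = f - y                            # residuals
    J = (1.0 - f ** 2)[:, None] * X      # Jacobian df/dw, shape (n, p)
    G = J @ J.T + lam * np.eye(len(y))   # n x n Gram matrix (NTK-like kernel)
    return w - J.T @ np.linalg.solve(G, r)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
w_true = rng.normal(size=5)
y = np.tanh(X @ w_true)

w = np.zeros(5)
for step in range(5):
    w = gram_gauss_newton_step(X, y, w)
    print(step, 0.5 * np.sum((np.tanh(X @ w) - y) ** 2))
```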
Xu Liu, Xiaoya Zhang, Wei Peng (2021)
Physics-informed neural networks (PINNs) have been widely used to solve various scientific computing problems. However, large training costs limit PINNs for some real-time applications. Although some works have been proposed to improve the training efficiency of PINNs, few consider the influence of initialization. To this end, we propose a New Reptile initialization based Physics-Informed Neural Network (NRPINN). The original Reptile algorithm is a meta-learning initialization method based on labeled data. PINNs can be trained with less labeled data, or even without any labeled data, by adding partial differential equations (PDEs) as a penalty term in the loss function. Inspired by this idea, we propose the new Reptile initialization, which samples more tasks from the parameterized PDEs and adapts the penalty term of the loss. The new Reptile initialization can acquire initialization parameters from related tasks by supervised, unsupervised, and semi-supervised learning; PINNs starting from these initialization parameters can then solve PDEs efficiently. Besides, the new Reptile initialization can also be used for variants of PINNs. Finally, we demonstrate and verify NRPINN on both forward problems, including solving the Poisson, Burgers, and Schrodinger equations, and inverse problems, where unknown parameters in the PDEs are estimated. Experimental results show that NRPINN training is much faster and achieves higher accuracy than PINNs with other initialization methods.
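The sketch below illustrates only the PINN ingredient this abstract builds on, namely the PDE residual added as a penalty term in the loss, for an assumed 1-D Poisson problem in PyTorch; the network, penalty weight, and training loop are illustrative, and the Reptile-style initialization itself is not reproduced here.

```python
# PDE residual as a penalty term in the loss, for the 1-D Poisson problem
# u''(x) = f(x) on [0, 1] with f chosen so that u(x) = sin(pi x) is the
# solution and u(0) = u(1) = 0 are the boundary conditions.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def pde_residual(x):
    """Residual u''(x) - f(x), computed with automatic differentiation."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    f = -(torch.pi ** 2) * torch.sin(torch.pi * x)
    return d2u - f

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x_bc = torch.tensor([[0.0], [1.0]])                 # boundary points, u = 0
for step in range(200):
    x_in = torch.rand(128, 1)                       # interior collocation points
    loss = (pde_residual(x_in) ** 2).mean() + 10.0 * (net(x_bc) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```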
Information transfer rates in optical communications may be dramatically increased by making use of spatially non-Gaussian states of light. Here we demonstrate the ability of deep neural networks to classify numerically-generated, noisy Laguerre-Gauss modes of up to 100 quanta of orbital angular momentum with near-unity fidelity. The scheme relies only on the intensity profile of the detected modes, allowing for considerable simplification of current measurement schemes required to sort the states containing increasing degrees of orbital angular momentum. We also present results that show the strength of deep neural networks in the classification of experimental superpositions of Laguerre-Gauss modes when the networks are trained solely using simulated images. It is anticipated that these results will allow for an enhancement of current optical communications technologies.
We analyze the learning dynamics of infinitely wide neural networks with a finite-sized bottleneck. Unlike the neural tangent kernel limit, a bottleneck in an otherwise infinite width network allows data dependent feature learning in its bottleneck representation. We empirically show that a single bottleneck in infinite networks dramatically accelerates training when compared to purely infinite networks, with an improved overall performance. We discuss the acceleration phenomena by drawing similarities to infinitely wide deep linear models, where the acceleration effect of a bottleneck can be understood theoretically.
