No Arabic abstract
Recently, integrated optics has gained interest as a hardware platform for implementing machine learning algorithms. Of particular interest are artificial neural networks, since matrix-vector multi- plications, which are used heavily in artificial neural networks, can be done efficiently in photonic circuits. The training of an artificial neural network is a crucial step in its application. However, currently on the integrated photonics platform there is no efficient protocol for the training of these networks. In this work, we introduce a method that enables highly efficient, in situ training of a photonic neural network. We use adjoint variable methods to derive the photonic analogue of the backpropagation algorithm, which is the standard method for computing gradients of conventional neural networks. We further show how these gradients may be obtained exactly by performing intensity measurements within the device. As an application, we demonstrate the training of a numerically simulated photonic artificial neural network. Beyond the training of photonic machine learning implementations, our method may also be of broad interest to experimental sensitivity analysis of photonic systems and the optimization of reconfigurable optics platforms.
Backpropagation through nonlinear neurons is an outstanding challenge to the field of optical neural networks and the major conceptual barrier to all-optical training schemes. Each neuron is required to exhibit a directionally dependent response to propagating optical signals, with the backwards response conditioned on the forward signal, which is highly non-trivial to implement optically. We propose a practical and surprisingly simple solution that uses saturable absorption to provide the network nonlinearity. We find that the backward propagating gradients required to train the network can be approximated in a pump-probe scheme that requires only passive optical elements. Simulations show that, with readily obtainable optical depths, our approach can achieve equivalent performance to state-of-the-art computational networks on image classification benchmarks, even in deep networks with multiple sequential gradient approximations. This scheme is compatible with leading optical neural network proposals and therefore provides a feasible path towards end-to-end optical training.
Deep spiking neural networks (SNNs) hold great potential for improving the latency and energy efficiency of deep neural networks through event-based computation. However, training such networks is difficult due to the non-differentiable nature of asynchronous spike events. In this paper, we introduce a novel technique, which treats the membrane potentials of spiking neurons as differentiable signals, where discontinuities at spike times are only considered as noise. This enables an error backpropagation mechanism for deep SNNs, which works directly on spike signals and membrane potentials. Thus, compared with previous methods relying on indirect training and conversion, our technique has the potential to capture the statics of spikes more precisely. Our novel framework outperforms all previously reported results for SNNs on the permutation invariant MNIST benchmark, as well as the N-MNIST benchmark recorded with event-based vision sensors.
Spiking neural networks (SNNs) well support spatiotemporal learning and energy-efficient event-driven hardware neuromorphic processors. As an important class of SNNs, recurrent spiking neural networks (RSNNs) possess great computational power. However, the practical application of RSNNs is severely limited by challenges in training. Biologically-inspired unsupervised learning has limited capability in boosting the performance of RSNNs. On the other hand, existing backpropagation (BP) methods suffer from high complexity of unrolling in time, vanishing and exploding gradients, and approximate differentiation of discontinuous spiking activities when applied to RSNNs. To enable supervised training of RSNNs under a well-defined loss function, we present a novel Spike-Train level RSNNs Backpropagation (ST-RSBP) algorithm for training deep RSNNs. The proposed ST-RSBP directly computes the gradient of a rated-coded loss function defined at the output layer of the network w.r.t tunable parameters. The scalability of ST-RSBP is achieved by the proposed spike-train level computation during which temporal effects of the SNN is captured in both the forward and backward pass of BP. Our ST-RSBP algorithm can be broadly applied to RSNNs with a single recurrent layer or deep RSNNs with multiple feed-forward and recurrent layers. Based upon challenging speech and image datasets including TI46, N-TIDIGITS, Fashion-MNIST and MNIST, ST-RSBP is able to train RSNNs with an accuracy surpassing that of the current state-of-art SNN BP algorithms and conventional non-spiking deep learning models.
The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradients methods -where the error gradient is only roughly approximated - have garnered interest. These methods not only better portray how biological brains are learning, but also open new computational possibilities, such as updating layers asynchronously. Even so, they have failed to scale past simple tasks like MNIST or CIFAR-10. This is in part due to a lack of standards, leading to ill-suited models and practices forbidding such methods from performing to the best of their abilities. In this work, we focus on direct feedback alignment and present a set of best practices justified by observations of the alignment angles. We characterize a bottleneck effect that prevents alignment in narrow layers, and hypothesize it may explain why feedback alignment methods have yet to scale to large convolutional networks.
Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension aimed at training discrete deep neural networks is also investigated.