ﻻ يوجد ملخص باللغة العربية
Artificial Neural Networks (ANNs) are known as state-of-the-art techniques in Machine Learning (ML) and have achieved outstanding results in data-intensive applications, such as recognition, classification, and segmentation. These networks mostly use deep layers of convolution or fully connected layers with many filters in each layer, demanding a large amount of data and tunable hyperparameters to achieve competitive accuracy. As a result, storage, communication, and computational costs of training (in particular training time) become limiting factors to scale them up. In this paper, we propose a new training methodology for ANNs that exploits the observation of improvement of accuracy shows temporal variations which allow us to skip updating weights when the variation is minuscule. During such time windows, we keep updating bias which ensures the network still trains and avoids overfitting; however, we selectively skip updating weights (and their time-consuming computations). Such a training approach virtually achieves the same accuracy with considerably less computational cost, thus lower training time. We propose two methods for updating weights and evaluate them by analyzing four state-of-the-art models, AlexNet, VGG-11, VGG-16, ResNet-18 on CIFAR datasets. On average, our two proposed methods called WUS and WUS+LR reduced the training time (compared to the baseline) by 54%, and 50%, respectively on CIFAR-10; and 43% and 35% on CIFAR-100, respectively.
Adding noises to artificial neural network(ANN) has been shown to be able to improve robustness in previous work. In this work, we propose a new technique to compute the pathwise stochastic gradient estimate with respect to the standard deviation of
The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks, initiated an ongoing effort for developing faster $mathit{second}$-$mathit{order}$ optimization algorithms beyond SGD, with
At the heart of deep learning we aim to use neural networks as function approximators - training them to produce outputs from inputs in emulation of a ground truth function or data creation process. In many cases we only have access to input-output p
As one of the most important paradigms of recurrent neural networks, the echo state network (ESN) has been applied to a wide range of fields, from robotics to medicine, finance, and language processing. A key feature of the ESN paradigm is its reserv
What makes an artificial neural network easier to train and more likely to produce desirable solutions than other comparable networks? In this paper, we provide a new angle to study such issues under the setting of a fixed number of model parameters