Deep learning frameworks leverage GPUs to perform massively parallel computations over batches of many training examples efficiently. However, for certain tasks one may be interested in performing per-example computations, for instance using per-example gradients to evaluate a quantity of interest unique to each example. One notable application comes from the field of differential privacy, where per-example gradients must be norm-bounded to limit the impact of each example on the aggregated batch gradient. In this work, we discuss how per-example gradients can be computed efficiently in convolutional neural networks (CNNs). We compare existing strategies by performing a few steps of differentially private training on CNNs of varying sizes. We also introduce a new strategy for per-example gradient calculation, which is shown to be advantageous depending on the model architecture and how the model is trained. This is a first step toward making differentially private training of CNNs practical.
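A minimal sketch of the naive per-example strategy this line of work compares against, assuming PyTorch as the framework (the toy model, batch, and clipping bound below are all illustrative, not the paper's setup): run one backward pass per example, clip each gradient to a norm bound, and accumulate, as differentially private SGD requires.

```python
import torch
import torch.nn as nn

# Hypothetical small CNN; any torch.nn.Module would do.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3), nn.ReLU(), nn.Flatten(),
    nn.Linear(8 * 26 * 26, 10),
)
loss_fn = nn.CrossEntropyLoss()
clip_norm = 1.0  # assumed per-example clipping bound C

x = torch.randn(4, 1, 28, 28)      # toy batch
y = torch.randint(0, 10, (4,))

params = [p for p in model.parameters() if p.requires_grad]
summed = [torch.zeros_like(p) for p in params]

# Naive baseline: one backward pass per example, then clip and sum.
for i in range(x.shape[0]):
    loss = loss_fn(model(x[i : i + 1]), y[i : i + 1])
    grads = torch.autograd.grad(loss, params)
    total = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (clip_norm / (total + 1e-6)).clamp(max=1.0)
    for s, g in zip(summed, grads):
        s.add_(g * scale)
# `summed` now holds the clipped, aggregated gradient to which
# calibrated noise would be added before the optimizer step.
```

This loop is the simplest correct strategy but forfeits batch parallelism, which is precisely the inefficiency that the strategies compared in the abstract aim to avoid.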
We show that implicit filter-level sparsity manifests in convolutional neural networks (CNNs) which employ Batch Normalization and ReLU activation and are trained with adaptive gradient descent techniques and L2 regularization or weight decay. Through an
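Filter-level sparsity of this kind is typically read off the learned BatchNorm scales: a filter whose scale gamma collapses to zero produces a constant pre-activation that the following ReLU renders inert, so the filter is effectively pruned. A sketch of such a measurement, assuming a PyTorch model and an arbitrary zero tolerance:

```python
import torch
import torch.nn as nn

def filter_sparsity(model: nn.Module, tol: float = 1e-3) -> float:
    """Fraction of BatchNorm-gated filters whose learned scale gamma
    is numerically zero, i.e. filters that are effectively pruned."""
    total, dead = 0, 0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            gamma = m.weight.detach().abs()
            total += gamma.numel()
            dead += int((gamma < tol).sum())
    return dead / max(total, 1)

# Toy usage on a Conv-BN-ReLU stack (the tolerance is an assumption).
net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
print(filter_sparsity(net))
```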
In this work, we propose a novel approach to utilizing convolutional neural networks for time series forecasting. The time direction of the sequential data with spatial dimensions $D=1,2$ is considered democratically as the input of a spatiotemporal $(D
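Although this abstract is cut off mid-formula, the visible idea, treating time as one more axis of a convolution over the $D$ spatial dimensions, can be sketched for $D=1$ as follows (all sizes and layer choices are our assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

# For D = 1: a window of T past time steps of a 1-D spatial series is
# stacked into a (T, space) grid, and time is treated like any other
# axis of a 2-D convolution.
T, S = 16, 32                       # history length, spatial extent
frames = torch.randn(8, 1, T, S)    # batch of (time, space) "images"

net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolves over time AND space
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)
forecast_features = net(frames)     # same (T, S) grid of features
```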
In active learning, sampling bias can pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. However, many methods for neural networks are hypothesis-space agnostic and do not address this problem. We exa
When a convolutional neural network is used for on-the-fly evaluation of continuously updating time sequences, many redundant convolution operations are performed. We propose the method of Deep Shifting, which remembers previously calculated results
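A minimal sketch of the caching idea as we read it (class and method names below are ours, not the paper's): for a streaming 1-D temporal convolution, outputs for past time steps never change, so each new sample requires computing only one new output column.

```python
import torch
import torch.nn as nn

class ShiftedConv1d(nn.Module):
    """Streaming conv layer that keeps a rolling window of recent inputs."""
    def __init__(self, channels: int, kernel_size: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size)
        self.register_buffer("window", torch.zeros(1, channels, kernel_size))

    def step(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (1, channels) newest sample; shift the cached window
        # left by one, append it, then convolve only that window.
        self.window = torch.roll(self.window, shifts=-1, dims=2)
        self.window[:, :, -1] = frame
        return self.conv(self.window)[:, :, 0]   # single new output column

layer = ShiftedConv1d(channels=4, kernel_size=5)
with torch.no_grad():
    for _ in range(10):                 # continuously updating sequence
        out = layer.step(torch.randn(1, 4))
```

Each call to `step` performs one kernel-width convolution instead of re-convolving the full history, which is the redundancy the abstract targets.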
Convolutional layers are core building blocks of neural network architectures. In general, a convolutional filter applies to the entire frequency spectrum of the input data. We explore artificially constraining the frequency spectra of these filt
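One plausible way to impose such a constraint, offered as an assumption rather than the paper's construction, is to zero out high-frequency Fourier coefficients of each kernel and transform back:

```python
import torch
import torch.nn as nn

def band_limit(weight: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    """Zero all FFT coefficients of each k x k kernel whose per-axis
    frequency magnitude exceeds `cutoff` (in cycles per sample)."""
    k = weight.shape[-1]
    freq = torch.fft.fftfreq(k).abs()
    keep = freq <= cutoff                  # symmetric low-pass mask
    mask = keep[:, None] & keep[None, :]
    spec = torch.fft.fft2(weight) * mask   # FFT over the last two dims
    return torch.fft.ifft2(spec).real

# Toy usage: band-limit the kernels of a freshly initialized layer.
conv = nn.Conv2d(3, 8, kernel_size=5)
with torch.no_grad():
    conv.weight.copy_(band_limit(conv.weight))
```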