
Convolutional neural networks with fractional order gradient method

Added by Dian Sheng
Publication date: 2019
Research language: English





This paper proposes a fractional order gradient method for the back-propagation of convolutional neural networks. To overcome the problem that the fractional order gradient method cannot converge to the real extreme point, a simplified fractional order gradient method is designed based on Caputo's definition. The parameters within each layer are updated by the designed gradient method, but the propagation between layers still uses integer order gradients, so the complicated derivatives of composite functions are avoided and the chain rule is preserved. By connecting every layer in series and adding loss functions, the proposed convolutional neural networks can be trained smoothly for various tasks. Finally, practical experiments are carried out to demonstrate fast convergence, high accuracy, and the ability to escape local optimal points.
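
For illustration, a minimal sketch of the kind of within-layer update described above, assuming the simplified Caputo-based step keeps only the first term of the fractional expansion so that it reduces to the ordinary layer gradient scaled by |w_k - w_{k-1}|^(1-alpha) / Gamma(2-alpha); the exact form, step size and order used in the paper may differ.

    import numpy as np
    from scipy.special import gamma

    def fractional_layer_update(w, w_prev, grad, lr=0.01, alpha=0.9, eps=1e-12):
        """One simplified Caputo-style fractional-order step for the parameters
        of a single layer. `grad` is the ordinary integer-order gradient from
        the usual chain rule; only the within-layer update is modified, and
        keeping only the first series term lets the iterate approach the real
        extreme point."""
        scale = (np.abs(w - w_prev) + eps) ** (1.0 - alpha) / gamma(2.0 - alpha)
        return w - lr * grad * scale

    # Toy usage on f(w) = (w - 3)^2, whose true minimum is w = 3.
    w_prev, w = 0.0, 0.1
    for _ in range(200):
        g = 2.0 * (w - 3.0)
        w_prev, w = w, fractional_layer_update(w, w_prev, g, lr=0.1, alpha=0.9)
    print(round(w, 4))  # approaches 3.0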



Related research

In this paper, we give some new thoughts on the classical gradient method (GM) and recall the proposed fractional order gradient method (FOGM). It is proven that the proposed FOGM has a super convergence capability and a faster convergence rate around the extreme point than the conventional GM. The asymptotic convergence of the conventional GM and FOGM is also discussed. To achieve both a super convergence capability and an even faster convergence rate, a novel switching FOGM is proposed. Moreover, we extend the obtained conclusions to a more general case by introducing the concepts of p-order Lipschitz continuous gradient and p-order strong convexity. Numerous simulation examples are provided to validate the effectiveness of the proposed methods.
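
A sketch of one way such a switching scheme could look, assuming the switch is triggered by a gradient-norm threshold; the paper's actual switching rule and fractional orders may differ.

    import numpy as np
    from scipy.special import gamma

    def switching_fogm(grad_fn, w0, lr=0.1, alpha=0.7, tol=1e-2, steps=300):
        """Hypothetical switching scheme: fractional-order steps while the
        gradient norm is above `tol`, conventional integer-order steps below
        it, so the final phase can converge exactly to the extreme point."""
        w = np.asarray(w0, dtype=float)
        w_prev = w + 1e-3
        for _ in range(steps):
            g = np.atleast_1d(grad_fn(w))
            if np.linalg.norm(g) > tol:   # fractional-order phase
                scale = (np.abs(w - w_prev) + 1e-12) ** (1.0 - alpha) / gamma(2.0 - alpha)
                step = g * scale
            else:                         # conventional GM phase
                step = g
            w_prev, w = w, w - lr * step
        return w

    # Example: minimize f(w) = ||w||^2 from a distant start.
    print(switching_fogm(lambda w: 2.0 * w, np.array([5.0, -4.0])))  # near [0, 0]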
Deep learning frameworks leverage GPUs to perform massively-parallel computations over batches of many training examples efficiently. However, for certain tasks, one may be interested in performing per-example computations, for instance using per-example gradients to evaluate a quantity of interest unique to each example. One notable application comes from the field of differential privacy, where per-example gradients must be norm-bounded in order to limit the impact of each example on the aggregated batch gradient. In this work, we discuss how per-example gradients can be efficiently computed in convolutional neural networks (CNNs). We compare existing strategies by performing a few steps of differentially-private training on CNNs of varying sizes. We also introduce a new strategy for per-example gradient calculation, which is shown to be advantageous depending on the model architecture and how the model is trained. This is a first step in making differentially-private training of CNNs practical.
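For context, the naive per-example baseline that more efficient strategies are compared against can be written as a loop over the batch with per-example norm clipping; the PyTorch sketch below is only that baseline, not the strategy introduced in the paper.

    import torch
    import torch.nn.functional as F

    def per_example_clipped_grads(model, x_batch, y_batch, clip_norm=1.0):
        """Baseline (micro-batch) per-example gradient computation: one
        backward pass per example, clip each example's gradient norm to
        `clip_norm` as required for differentially-private training, then
        average the clipped gradients."""
        summed = [torch.zeros_like(p) for p in model.parameters()]
        for x, y in zip(x_batch, y_batch):
            loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
            grads = torch.autograd.grad(loss, list(model.parameters()))
            norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
            factor = (clip_norm / (norm + 1e-6)).clamp(max=1.0)
            for s, g in zip(summed, grads):
                s.add_(g * factor)
        return [s / len(x_batch) for s in summed]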
Hengyue Pan, Hui Jiang, Xin Niu (2018)
The past few years have witnessed the fast development of regularization methods for deep learning models such as fully-connected deep neural networks (DNNs) and convolutional neural networks (CNNs). Most previous methods drop features from the input data and hidden layers, as in Dropout, Cutout and DropBlock, while DropConnect drops connections between fully-connected layers. By randomly discarding features or connections, these methods control overfitting and improve the performance of neural networks. In this paper, we propose two novel regularization methods, DropFilter and DropFilter-PLUS, for the learning of CNNs. Different from previous methods, DropFilter and DropFilter-PLUS modify the convolution filters. For DropFilter-PLUS, we find a suitable way to accelerate the learning process based on theoretical analysis. Experimental results on MNIST show that using DropFilter and DropFilter-PLUS may improve performance on image classification tasks.
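A rough sketch of the filter-dropping idea (the exact formulation of DropFilter and DropFilter-PLUS in the paper may differ, and DropFilter-PLUS additionally adapts the learning process): whole convolution filters are zeroed at random during training and the surviving filters rescaled.

    import torch

    def drop_filter(conv_weight, drop_prob=0.2, training=True):
        """Illustrative DropFilter-style operation: randomly zero whole
        convolution filters (output channels) of a layer's weight tensor
        during training and rescale the survivors, analogous to Dropout
        but applied to filters rather than features or connections."""
        if not training or drop_prob == 0.0:
            return conv_weight
        out_channels = conv_weight.shape[0]
        keep = (torch.rand(out_channels, device=conv_weight.device) > drop_prob)
        keep = keep.float().view(-1, 1, 1, 1)
        return conv_weight * keep / (1.0 - drop_prob)

    # Usage inside a custom forward pass, e.g.:
    #   w = drop_filter(self.conv.weight, drop_prob=0.2, training=self.training)
    #   out = torch.nn.functional.conv2d(x, w, self.conv.bias, padding=1)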
Yiheng Wei, Yu Kang, Weidi Yin (2018)
This paper focuses on the convergence problem of the emerging fractional order gradient descent method and proposes three solutions to overcome it. In fact, the general fractional gradient method cannot converge to the real extreme point of the target function, which critically hampers its application. Because of the long memory characteristic of the fractional derivative, the fixed memory principle is a natural first choice. Apart from the truncation of the memory length, two further remedies are developed to reach convergence: one is the truncation of the infinite series, and the other is the modification of the constant fractional order. Finally, six illustrative examples are presented to demonstrate the effectiveness and practicability of the proposed methods.
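As one concrete reading of the third remedy, the sketch below drives the fractional order toward 1 as iterations proceed, so the update tends to the plain integer-order gradient step near convergence; the specific schedule and step size are assumptions, not the paper's.

    import numpy as np
    from scipy.special import gamma

    def variable_order_fogd(grad_fn, w0, lr=0.1, alpha0=0.7, steps=200):
        """Fractional-order gradient descent with a varying order alpha_k
        that tends to 1, so the scaling factor tends to 1 and the iterate
        can settle at the true extreme point."""
        w = np.asarray(w0, dtype=float)
        w_prev = w + 1e-3              # small offset so the first step is defined
        for k in range(steps):
            alpha_k = 1.0 - (1.0 - alpha0) / (k + 1.0)   # alpha_k -> 1
            g = grad_fn(w)
            scale = (np.abs(w - w_prev) + 1e-12) ** (1.0 - alpha_k) / gamma(2.0 - alpha_k)
            w_prev, w = w, w - lr * g * scale
        return w

    print(variable_order_fogd(lambda w: 2.0 * (w - 1.0), 0.0))  # approaches 1.0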
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of 5 μs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
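For orientation only, a generic magnitude-pruning sketch in PyTorch (this is not the hls4ml or quantization-aware-training workflow the paper uses): zeroing the smallest-magnitude weights is the kind of compression that lets such models fit FPGA resource budgets.

    import torch

    def magnitude_prune(model, sparsity=0.9):
        """Zero the smallest-magnitude weights of `model` globally until
        the requested fraction `sparsity` of weights is zero."""
        all_weights = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
        threshold = torch.quantile(all_weights, sparsity)
        with torch.no_grad():
            for p in model.parameters():
                p.mul_((p.abs() > threshold).float())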