Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

110 0 0.0 ( 0 )

Download Cite

Added by Behnam Neyshabur

Publication date 2016

fields Informatics Engineering

and research's language is English

Authors Behnam Neyshabur - Yuhuai Wu - Ruslan Salakhutdinov

Machine Learning Neural and Evolutionary Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.

rate research

Path-SGD: Path-Normalized Optimization in Deep Neural Networks

603 - Behnam Neyshabur , Ruslan Salakhutdinov , Nathan Srebro 2015

We revisit the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights. We argue for a geometry invariant to rescaling of weights that does not affect the output of the network, and suggest Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization. Path-SGD is easy and efficient to implement and leads to empirical gains over SGD and AdaGrad.

Machine Learning Computer Vision and Pattern Recognition Neural and Evolutionary Computing

Reachable Set Computation and Safety Verification for Neural Networks with ReLU Activations

91 - Weiming Xiang , Hoang-Dung Tran , Taylor T. Johnson 2017

Neural networks have been widely used to solve complex real-world problems. Due to the complicate, nonlinear, non-convex nature of neural networks, formal safety guarantees for the output behaviors of neural networks will be crucial for their applications in safety-critical systems.In this paper, the output reachable set computation and safety verification problems for a class of neural networks consisting of Rectified Linear Unit (ReLU) activation functions are addressed. A layer-by-layer approach is developed to compute output reachable set. The computation is formulated in the form of a set of manipulations for a union of polyhedra, which can be efficiently applied with the aid of polyhedron computation tools. Based on the output reachable set computation results, the safety verification for a ReLU neural network can be performed by checking the intersections of unsafe regions and output reachable set described by a union of polyhedra. A numerical example of a randomly generated ReLU neural network is provided to show the effectiveness of the approach developed in this paper.

Machine Learning Artificial Intelligence

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations

192 - Pranjal Awasthi , Alex Tang , Aravindan Vijayaraghavan 2021

We present polynomial time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = {a}^{mathsf{T}}sigma({W}^mathsf{T}x+b)$, where $x$ is drawn from the Gaussian distribution, and $sigma(t) := max(t,0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.

Machine Learning Data Structures and Algorithms Machine Learning

Unitary Evolution Recurrent Neural Networks

487 - Martin Arjovsky , Amar Shah , Yoshua Bengio 2015

Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.

Machine Learning Neural and Evolutionary Computing Machine Learning

Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse of Dimensionality on Holder Class

142 - Yuling Jiao , Yanming Lai , Xiliang Lu 2021

In this paper, we construct neural networks with ReLU, sine and $2^x$ as activation functions. For general continuous $f$ defined on $[0,1]^d$ with continuity modulus $omega_f(cdot)$, we construct ReLU-sine-$2^x$ networks that enjoy an approximation rate $mathcal{O}(omega_f(sqrt{d})cdot2^{-M}+omega_{f}left(frac{sqrt{d}}{N}right))$, where $M,Nin mathbb{N}^{+}$ denote the hyperparameters related to widths of the networks. As a consequence, we can construct ReLU-sine-$2^x$ network with the depth $5$ and width $maxleft{leftlceil2d^{3/2}left(frac{3mu}{epsilon}right)^{1/{alpha}}rightrceil,2leftlceillog_2frac{3mu d^{alpha/2}}{2epsilon}rightrceil+2right}$ that approximates $fin mathcal{H}_{mu}^{alpha}([0,1]^d)$ within a given tolerance $epsilon >0$ measured in $L^p$ norm $pin[1,infty)$, where $mathcal{H}_{mu}^{alpha}([0,1]^d)$ denotes the Holder continuous function class defined on $[0,1]^d$ with order $alpha in (0,1]$ and constant $mu > 0$. Therefore, the ReLU-sine-$2^x$ networks overcome the curse of dimensionality on $mathcal{H}_{mu}^{alpha}([0,1]^d)$. In addition to its supper expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable, enabling us to apply SGD to train.

Machine Learning Numerical Analysis Neural and Evolutionary Computing

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions