Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Complex Evolution Recurrent Neural Networks (ceRNNs)

64 0 0.0 ( 0 )

Download Cite

Added by Izhak Shafran

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Izhak Shafran - Tom Bagby -

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Unitary Evolution Recurrent Neural Networks (uRNNs) have three attractive properties: (a) the unitary property, (b) the complex-valued nature, and (c) their efficient linear operators. The literature so far does not address -- how critical is the unitary property of the model? Furthermore, uRNNs have not been evaluated on large tasks. To study these shortcomings, we propose the complex evolution Recurrent Neural Networks (ceRNNs), which is similar to uRNNs but drops the unitary property selectively. On a simple multivariate linear regression task, we illustrate that dropping the constraints improves the learning trajectory. In copy memory task, ceRNNs and uRNNs perform identically, demonstrating that their superior performance over LSTMs is due to complex-valued nature and their linear operators. In a large scale real-world speech recognition, we find that pre-pending a uRNN degrades the performance of our baseline LSTM acoustic models, while pre-pending a ceRNN improves the performance over the baseline by 0.8% absolute WER.

rate research

Learning Compact Recurrent Neural Networks

118 - Zhiyun Lu , Vikas Sindhwani , Tara N. Sainath 2016

Recurrent neural networks (RNNs), including long short-term memory (LSTM) RNNs, have produced state-of-the-art results on a variety of speech recognition tasks. However, these models are often too large in size for deployment on mobile devices with memory and latency constraints. In this work, we study mechanisms for learning compact RNNs and LSTMs via low-rank factorizations and parameter sharing schemes. Our goal is to investigate redundancies in recurrent architectures where compression can be admitted without losing performance. A hybrid strategy of using structured matrices in the bottom layers and shared low-rank factors on the top layers is found to be particularly effective, reducing the parameters of a standard LSTM by 75%, at a small cost of 0.3% increase in WER, on a 2,000-hr English Voice Search task.

Machine Learning Computation and Language Neural and Evolutionary Computing

Unitary Evolution Recurrent Neural Networks

487 - Martin Arjovsky , Amar Shah , Yoshua Bengio 2015

Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.

Machine Learning Neural and Evolutionary Computing Machine Learning

Exploring Sparsity in Recurrent Neural Networks

143 - Sharan Narang , Erich Elsen , Gregory Diamos 2017

Recurrent Neural Networks (RNN) are widely used to solve a variety of problems and as the quantity of data and the amount of available compute have increased, so have model sizes. The number of parameters in recent state-of-the-art networks makes them hard to deploy, especially on mobile phones and embedded devices. The challenge is due to both the size of the model and the time it takes to evaluate it. In order to deploy these RNNs efficiently, we propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network. At the end of training, the parameters of the network are sparse while accuracy is still close to the original dense neural network. The network size is reduced by 8x and the time required to train the model remains constant. Additionally, we can prune a larger dense network to achieve better than baseline performance while still reducing the total number of parameters significantly. Pruning RNNs reduces the size of the model and can also help achieve significant inference time speed-up using sparse matrix multiply. Benchmarks show that using our technique model size can be reduced by 90% and speed-up is around 2x to 7x.

Machine Learning Computation and Language

Syntactically Informed Text Compression with Recurrent Neural Networks

133 - David Cox 2016

We present a self-contained system for constructing natural language models for use in text compression. Our system improves upon previous neural network based models by utilizing recent advances in syntactic parsing -- Googles SyntaxNet -- to augment character-level recurrent neural networks. RNNs have proven exceptional in modeling sequence data such as text, as their architecture allows for modeling of long-term contextual information.

Machine Learning Computation and Language Information Theory

Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

131 - Zhiyuan Tang , Ying Shi , Dong Wang 2016

Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU). However, the dynamic properties behind the remarkable performance remain unclear in many applications, e.g., automatic speech recognition (ASR). This paper employs visualization techniques to study the behavior of LSTM and GRU when performing speech recognition tasks. Our experiments show some interesting patterns in the gated memory, and some of them have inspired simple yet effective modifications on the network structure. We report two of such modifications: (1) lazy cell update in LSTM, and (2) shortcut connections for residual learning. Both modifications lead to more comprehensible and powerful networks.

Machine Learning Computation and Language Neural and Evolutionary Computing

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Complex Evolution Recurrent Neural Networks (ceRNNs)

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions