Optimizing Performance of Recurrent Neural Networks on GPUs

85 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Phil Blunsom

تاريخ النشر 2016

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jeremy Appleyard - Tomas Kocisky - Phil Blunsom

التعلم الآلي الحوسبة العصبية والتطورية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

As recurrent neural networks become larger and deeper, training times for single networks are rising into weeks or even months. As such there is a significant incentive to improve the performance and scalability of these networks. While GPUs have become the hardware of choice for training and deploying recurrent models, the implementations employed often make use of only basic optimizations for these architectures. In this article we demonstrate that by exposing parallelism between operations within the network, an order of magnitude speedup across a range of network sizes can be achieved over a naive implementation. We describe three stages of optimization that have been incorporated into the fifth release of NVIDIAs cuDNN: firstly optimizing a single cell, secondly a single layer, and thirdly the entire network.

قيم البحث

487 - Martin Arjovsky , Amar Shah , Yoshua Bengio 2015

Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gra dients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

Coverage Guided Testing for Recurrent Neural Networks

128 - Wei Huang , Youcheng Sun , Xingyu Zhao 2019

Recurrent neural networks (RNNs) have been applied to a broad range of applications, including natural language processing, drug discovery, and video recognition. Their vulnerability to input perturbation is also known. Aligning with a view from soft ware defect detection, this paper aims to develop a coverage guided testing approach to systematically exploit the internal behaviour of RNNs, with the expectation that such testing can detect defects with high possibility. Technically, the long short term memory network (LSTM), a major class of RNNs, is thoroughly studied. A family of three test metrics are designed to quantify not only the values but also the temporal relations (including both step-wise and bounded-length) exhibited when LSTM processing inputs. A genetic algorithm is applied to efficiently generate test cases. The test metrics and test case generation algorithm are implemented into a tool TestRNN, which is then evaluated on a set of LSTM benchmarks. Experiments confirm that TestRNN has advantages over the state-of-art tool DeepStellar and attack-based defect detection methods, owing to its working with finer temporal semantics and the consideration of the naturalness of input perturbation. Furthermore, TestRNN enables meaningful information to be collected and exhibited for users to understand the testing results, which is an important step towards interpretable neural network testing.

التعلم الآلي الحوسبة العصبية والتطورية

Structured Pruning of Recurrent Neural Networks through Neuron Selection

155 - Liangjian Wen , Xuanyang Zhang , Haoli Bai 2019

Recurrent neural networks (RNNs) have recently achieved remarkable successes in a number of applications. However, the huge sizes and computational burden of these models make it difficult for their deployment on edge devices. A practically effective approach is to reduce the overall storage and computation costs of RNNs by network pruning techniques. Despite their successful applications, those pruning methods based on Lasso either produce irregular sparse patterns in weight matrices, which is not helpful in practical speedup. To address these issues, we propose structured pruning method through neuron selection which can reduce the sizes of basic structures of RNNs. More specifically, we introduce two sets of binary random variables, which can be interpreted as gates or switches to the input neurons and the hidden neurons, respectively. We demonstrate that the corresponding optimization problem can be addressed by minimizing the L0 norm of the weight matrix. Finally, experimental results on language modeling and machine reading comprehension tasks have indicated the advantages of the proposed method in comparison with state-of-the-art pruning competitors. In particular, nearly 20 x practical speedup during inference was achieved without losing performance for language model on the Penn TreeBank dataset, indicating the promising performance of the proposed method

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

Dynamic Cell Structure via Recursive-Recurrent Neural Networks

203 - Xin Qian , Matthew Kennedy , 2019

In a recurrent setting, conventional approaches to neural architecture search find and fix a general model for all data samples and time steps. We propose a novel algorithm that can dynamically search for the structure of cells in a recurrent neural network model. Based on a combination of recurrent and recursive neural networks, our algorithm is able to construct customized cell structures for each data sample and time step, allowing for a more efficient architecture search than existing models. Experiments on three common datasets show that the algorithm discovers high-performance cell architectures and achieves better prediction accuracy compared to the GRU structure for language modelling and sentiment analysis.

التعلم الآلي الحوسبة العصبية والتطورية التعلم الالي

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

109 - Behnam Neyshabur , Yuhuai Wu , Ruslan Salakhutdinov 2016

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require ca pturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.

التعلم الآلي الحوسبة العصبية والتطورية