Learning Intrinsic Sparse Structures within Long Short-Term Memory


Published by Wei Wen

Publication date: 2017

Research field: Informatics Engineering

Paper language: English

Authors: Wei Wen, Yuxiong He, Samyam Rajbhandari

Machine Learning, Artificial Intelligence, Computation and Language



Model compression is significant for the wide adoption of Recurrent Neural Networks (RNNs) in both user devices possessing limited resources and business clusters requiring quick responses to large-scale service requests. This work aims to learn structurally sparse Long Short-Term Memory (LSTM) networks by reducing the sizes of basic structures within LSTM units, including input updates, gates, hidden states, cell states and outputs. Independently reducing the sizes of basic structures can result in inconsistent dimensions among them and, consequently, in invalid LSTM units. To overcome this problem, we propose Intrinsic Sparse Structures (ISS) in LSTMs. Removing a component of ISS simultaneously decreases the sizes of all basic structures by one and thereby always maintains dimension consistency. By learning ISS within LSTM units, the obtained LSTMs remain regular while having much smaller basic structures. Based on group Lasso regularization, our method achieves a 10.59x speedup without any loss in perplexity on language modeling of the Penn TreeBank dataset. It is also successfully evaluated through a compact model with only 2.69M weights for machine Question Answering on the SQuAD dataset. Our approach is successfully extended to non-LSTM RNNs, like Recurrent Highway Networks (RHNs). Our source code is publicly available at https://github.com/wenwei202/iss-rnns
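
The interplay between ISS components and group Lasso described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation (their code is at the URL above). It assumes a single LSTM layer whose input and recurrent weights are stacked in one matrix W of shape (input_size + hidden_size, 4 * hidden_size), and it assumes that ISS component k consists of column k in each of the four gate blocks plus the recurrent row that consumes hidden unit k. Weights in a following layer that read the hidden state are ignored here, and the function names `iss_group_norms`, `group_lasso_penalty` and `prune_iss` are hypothetical.

```python
import numpy as np

def iss_group_norms(W, input_size, hidden_size):
    """L2 norm of each assumed ISS group in a stacked LSTM weight matrix."""
    d, h = input_size, hidden_size
    assert W.shape == (d + h, 4 * h)
    sq = np.zeros(h)
    for g in range(4):                        # the four gate/update blocks
        block = W[:, g * h:(g + 1) * h]       # shape (d + h, h)
        sq += (block ** 2).sum(axis=0)        # column k of this block
        sq -= np.diag(block[d:, :]) ** 2      # avoid double counting the row/column overlap
    sq += (W[d:, :] ** 2).sum(axis=1)         # recurrent row fed by hidden unit k
    return np.sqrt(sq)

def group_lasso_penalty(W, input_size, hidden_size, lam):
    """Group Lasso term: lam times the sum of the group L2 norms."""
    return lam * iss_group_norms(W, input_size, hidden_size).sum()

def prune_iss(W, input_size, hidden_size, threshold=1e-8):
    """Drop every ISS group whose norm is at or below the threshold."""
    d, h = input_size, hidden_size
    keep = np.flatnonzero(iss_group_norms(W, d, h) > threshold)
    rows = np.concatenate([np.arange(d), d + keep])
    cols = np.concatenate([g * h + keep for g in range(4)])
    return W[np.ix_(rows, cols)], keep

# Usage: 2 inputs, 3 hidden units; zero out ISS component 1, then prune it.
d, h = 2, 3
W = np.ones((d + h, 4 * h))
W[:, [1, h + 1, 2 * h + 1, 3 * h + 1]] = 0.0  # columns of component 1
W[d + 1, :] = 0.0                             # recurrent row of component 1
W_small, kept = prune_iss(W, d, h)
print(W_small.shape)  # (4, 8): hidden size shrank from 3 to 2 consistently
```

Because each group spans all four blocks and the matching recurrent row, removing a group shrinks every structure by one at the same time, which is the dimension-consistency property the abstract emphasizes.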


82 - Yu Zhang, Guoguo Chen, Dong Yu 2015

In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers. These direct links, called highway connections, enable unimpeded information flow across different layers and thus alleviate the gradient vanishing problem when building deeper LSTMs. We further introduce the latency-controlled bidirectional LSTMs (BLSTMs) which can exploit the whole history while keeping the latency under control. Efficient algorithms are proposed to train these novel networks using both frame and sequence discriminative criteria. Experiments on the AMI distant speech recognition (DSR) task indicate that we can train deeper LSTMs and achieve better improvement from sequence training with highway LSTMs (HLSTMs). Our novel model obtains 43.9%/47.7% WER on AMI (SDM) dev and eval sets, outperforming all previous works. It beats the strong DNN and DLSTM baselines with 15.7% and 5.3% relative improvement respectively.

Neural and Evolutionary Computing, Artificial Intelligence, Computation and Language

Tweets Can Tell: Activity Recognition using Hybrid Long Short-Term Memory Model

76 - Renhao Cui, Gagan Agrawal, Rajiv Ramnath 2019

This paper presents techniques to detect the offline activity a person is engaged in when she is tweeting (such as dining, shopping or entertainment), in order to create a dynamic profile of the user, for uses such as better targeting of advertisements. To this end, we propose a hybrid LSTM model for rich contextual learning, along with studies on the effects of applying and combining multiple LSTM-based methods with different contextual features. The hybrid model is shown to outperform a set of baselines and state-of-the-art methods. Finally, this paper presents an orthogonal validation with a real-case application. Our model generates an offline activity analysis for the followers of several well-known accounts, which is quite representative of the expected characteristics of these accounts.

Social and Information Networks, Artificial Intelligence, Computation and Language

Intensity Prediction of Tropical Cyclones using Long Short-Term Memory Network

93 - Koushik Biswas, Sandeep Kumar, Ashish Kumar Pandey 2021

Tropical cyclones can be of varied intensity and cause a huge loss of lives and property if the intensity is high enough. Therefore, predicting the intensity of tropical cyclones in advance is of utmost importance. We propose a novel stacked bidirectional long short-term memory network (BiLSTM) based model architecture to predict the intensity of a tropical cyclone in terms of maximum sustained surface wind speed (MSWS). The proposed model can predict MSWS well in advance (up to 72 h) with very high accuracy. We have applied the model to tropical cyclones in the North Indian Ocean from 1982 to 2018 and checked its performance on two recent tropical cyclones, namely Fani and Vayu. The model predicts MSWS (in knots) for the next 3, 12, 24, 36, 48, 60, and 72 hours with a mean absolute error of 1.52, 3.66, 5.88, 7.42, 8.96, 10.15, and 11.92, respectively.

Machine Learning, Artificial Intelligence

Associative Long Short-Term Memory

302 - Ivo Danihelka, Greg Wayne, Benigno Uria 2016

We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each retrieval becomes noisier due to interference. Our system in contrast creates redundant copies of stored information, which enables retrieval with reduced noise. Experiments demonstrate faster learning on multiple memorization tasks.
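
The noise-reduction idea in this abstract can be illustrated with a small, simplified sketch. It is not the authors' model: it assumes unit-modulus complex keys, real-valued stored items, binding by element-wise multiplication, and redundant copies that each use a randomly permuted version of the key. All names (`store`, `retrieve`, the sizes and the seed) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim, n_copies = 10, 256, 8

# Assumed setup: unit-modulus complex keys and real-valued items to store.
keys = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(n_items, dim)))
values = rng.standard_normal((n_items, dim))
# One random permutation of the key elements per redundant copy.
perms = [rng.permutation(dim) for _ in range(n_copies)]

def store(keys, values, perm):
    """Superimpose all key-value bindings into one complex memory trace."""
    return (keys[:, perm] * values).sum(axis=0)

def retrieve(memory, key, perm):
    """Unbind with the conjugate key; other stored items show up as noise."""
    return (np.conj(key[perm]) * memory).real

memories = [store(keys, values, p) for p in perms]

target = 0
single = retrieve(memories[0], keys[target], perms[0])
averaged = np.mean(
    [retrieve(m, keys[target], p) for m, p in zip(memories, perms)], axis=0
)

err_single = np.mean((single - values[target]) ** 2)
err_averaged = np.mean((averaged - values[target]) ** 2)
print(err_single, err_averaged)  # the averaged retrieval should be less noisy
```

Each copy sees the interference from the other stored items under a different permutation, so the noise terms are largely uncorrelated and averaging the retrievals reduces their variance.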

Neural and Evolutionary Computing

Quantum Optical Experiments Modeled by Long Short-Term Memory

95 - Thomas Adler, Manuel Erhard, Mario Krenn 2019

We demonstrate how machine learning is able to model experiments in quantum physics. Quantum entanglement is a cornerstone for upcoming quantum technologies such as quantum computation and quantum cryptography. Of particular interest are complex quantum states with more than two particles and a large number of entangled quantum levels. Given such a multiparticle high-dimensional quantum state, it is usually impossible to reconstruct an experimental setup that produces it. To search for interesting experiments, one thus has to randomly create millions of setups on a computer and calculate the respective output states. In this work, we show that machine learning models can provide significant improvement over random search. We demonstrate that a long short-term memory (LSTM) neural network can successfully learn to model quantum experiments by correctly predicting output state characteristics for given setups without the necessity of computing the states themselves. This approach not only allows for faster search but is also an essential step towards automated design of multiparticle high-dimensional quantum experiments using generative machine learning models.

Machine Learning, Quantum Physics