Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs ($\approx 15$ years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
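Since the abstract singles out the forget gate and the output activation function, a minimal NumPy sketch of one step of a vanilla LSTM cell may help fix the terminology. The parameter layout is illustrative, and the peephole connections included in the paper's vanilla configuration are omitted for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a vanilla LSTM. W, U, b stack the parameters for the
    input (i), forget (f), and output (o) gates and the cell candidate (g)."""
    z = W @ x + U @ h_prev + b                    # gate pre-activations, shape (4 * hidden,)
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g                        # forget gate scales the old cell state
    h = o * np.tanh(c)                            # output activation (tanh) shapes the hidden state
    return h, c
```

Dropping the `np.tanh` around `c` in the last product, or fixing `f` to 1, corresponds to two of the ablations such a study can probe.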
Much sequential data exhibits a highly non-uniform information distribution, which traditional Long Short-Term Memory (LSTM) networks cannot model correctly. To address this, recent works have extended the LSTM by adding more activations between adjacent inp
In this work, we present a simple and general search space shrinking method, called Angle-Based search space Shrinking (ABS), for Neural Architecture Search (NAS). Our approach progressively simplifies the original search space by dropping unpromisin
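The abstract is cut off before the shrinking criterion is defined. The following hedged sketch assumes the "angle" is measured between a child model's weight vector at initialization and after training, with low-angle candidates treated as unpromising and dropped; all names here are illustrative:

```python
import numpy as np

def angle(w_init, w_trained):
    # Angle between the flattened weight vectors at initialization and
    # after training; a larger angle is taken to indicate a child model
    # whose weights moved further, i.e. a more promising candidate.
    cos = np.dot(w_init, w_trained) / (
        np.linalg.norm(w_init) * np.linalg.norm(w_trained))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def shrink(candidates, drop_ratio=0.2):
    # candidates: list of (name, w_init, w_trained) tuples.
    # Rank by angle and drop the lowest-scoring fraction of the space.
    scored = sorted(candidates, key=lambda c: angle(c[1], c[2]))
    n_drop = int(len(scored) * drop_ratio)
    return [name for name, _, _ in scored[n_drop:]]
```

Applied round after round, such a rule progressively narrows the search space handed to the downstream NAS algorithm.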
We present our new, advanced model for the population synthesis of close-by cooling neutron stars (NSs). A detailed treatment of the initial spatial distribution of NS progenitors and of the interstellar medium (ISM) structure up to 3 kpc gives us an opportunity to discuss the strategy to lo
Performing analytical tasks over graph data has attracted increasing interest owing to the ubiquity and wide availability of relational information. However, unlike images or sentences, networks have no inherent notion of sequence. Nodes (and edges) f
Many applications in machine learning require optimizing a function whose true gradient is unknown, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available in
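The abstract truncates before the method is described, so the following is only a minimal sketch of the stated setup: plain descent along a surrogate direction that is assumed to be positively correlated with the true gradient. The function names and the toy objective are illustrative:

```python
import numpy as np

def surrogate_descent(surrogate_grad, x0, lr=0.05, steps=200):
    """Follow a surrogate direction in place of the unknown true gradient.
    surrogate_grad(x) need only be positively correlated with the true
    gradient for the iterates to make progress on average."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * surrogate_grad(x)
    return x

# Toy example: a noisy, hence merely correlated, estimate of the gradient
# of f(x) = ||x||^2 / 2, whose true gradient is x itself.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.5 * rng.normal(size=x.shape)
print(surrogate_descent(noisy_grad, x0=np.ones(3)))  # ends near the optimum at 0
```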