Structural credit assignment for recurrent learning is challenging. Real-Time Recurrent Learning (RTRL) can compute gradients for recurrent networks online, but it is computationally intractable for large networks. Alternatives, such as backpropagation through time (BPTT), are not online. In this work, we propose a credit-assignment algorithm -- algoname{} -- that approximates the gradients for recurrent learning in real time using $O(n)$ operations and memory per step. Our method builds on the idea that, for modular recurrent networks composed of columns with scalar states, it suffices for a parameter to track only its influence on the state of its own column. We show empirically that as long as connections between columns are sparse, our method approximates the true gradient well. In the special case where there are no connections between columns, the $O(n)$ gradient estimate is exact. We demonstrate the utility of the approach for both recurrent state learning and meta-learning by comparing the estimated gradient to the true gradient on a synthetic test-bed.
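To make the columnar idea concrete, the following is a minimal NumPy sketch, not the paper's implementation: the tanh column dynamics, the names `trace_w` and `trace_U`, and the instantaneous squared-error loss are illustrative assumptions. Each column keeps one eligibility trace per parameter for its own scalar state and ignores the parameter's influence on other columns through the cross-column weights; with those weights set to zero the traces, and hence the gradient, are exact.

```python
import numpy as np

# A network of n "columns", each with a scalar recurrent state
#   h_i[t] = tanh(w_i * h_i[t-1] + sum_{j != i} V[i, j] * h_j[t-1] + U[i] @ x[t]).
# Full RTRL tracks the influence of every parameter on every state, which is
# intractable. The columnar approximation keeps, for each parameter of column i,
# only its influence on h_i, so the traces have the same size as the parameters.
# With V = 0 the traces (and the resulting gradient) are exact.

rng = np.random.default_rng(0)
n, d = 8, 3                        # number of columns, input dimension
w = rng.normal(0.0, 0.5, n)        # self-recurrent weight of each column
V = rng.normal(0.0, 0.1, (n, n))   # cross-column weights (assumed sparse/weak)
np.fill_diagonal(V, 0.0)
U = rng.normal(0.0, 0.5, (n, d))   # input weights of each column

h = np.zeros(n)                    # column states
trace_w = np.zeros(n)              # approx. dh_i / dw_i
trace_U = np.zeros((n, d))         # approx. dh_i / dU[i, :]

def step(h, trace_w, trace_U, x, target):
    """One online step: update states and traces, return instantaneous gradients."""
    h_new = np.tanh(w * h + V @ h + U @ x)
    dtanh = 1.0 - h_new ** 2
    # Trace recurrences keep only the within-column path (through w_i); the
    # cross-column path through V is dropped -- the source of approximation error.
    trace_w = dtanh * (h + w * trace_w)
    trace_U = dtanh[:, None] * (x[None, :] + w[:, None] * trace_U)
    # Gradients of the instantaneous squared error 0.5 * ||h_new - target||^2.
    err = h_new - target
    return h_new, trace_w, trace_U, err * trace_w, err[:, None] * trace_U

x_t, target_t = rng.normal(size=d), np.zeros(n)
h, trace_w, trace_U, grad_w, grad_U = step(h, trace_w, trace_U, x_t, target_t)
print(grad_w.shape, grad_U.shape)  # (8,) (8, 3)
```

Note that in this sketch the cross-column term `V @ h` still drives the forward dynamics; only its contribution to the traces is dropped, which is why sparse inter-column connectivity keeps the estimate close to the true gradient.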