Recurrent neural networks empirically generate natural language with high syntactic fidelity. However, their success is not well-understood theoretically. We provide theoretical insight into this success, proving in a finite-precision setting that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax. We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth, reflecting the bounded memory needs and long-distance dependencies of natural language syntax. The best known results use $O(k^{\frac{m}{2}})$ memory (hidden units) to generate these languages. We prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction. Finally, we show that no algorithm, even with unbounded computation, can suffice with $o(m \log k)$ hidden units.
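To make the memory accounting concrete, here is a minimal Python sketch of a Dyck-($k$,$m$) recognizer (not the paper's RNN construction): the only state carried between tokens is a depth-bounded stack of bracket-type ids, roughly $m \log_2 k$ bits in total, which is the intuition behind the $O(m \log k)$ bound. The token encoding and the helper name `make_dyck_km_recognizer` are illustrative assumptions, not from the paper.

```python
def make_dyck_km_recognizer(k: int, m: int):
    """Recognizer sketch for Dyck-(k, m): well-nested brackets of k types
    with nesting depth at most m. The only state kept between tokens is a
    stack of at most m bracket-type ids, i.e. about m * log2(k) bits,
    mirroring the O(m log k) memory bound discussed above."""
    open_ids = {f"({i}": i for i in range(k)}    # hypothetical token names,
    close_ids = {f"){i}": i for i in range(k)}   # e.g. "(0", ")0", "(1", ")1"

    def recognize(tokens):
        stack = []                               # length never exceeds m
        for tok in tokens:
            if tok in open_ids:
                if len(stack) == m:              # would exceed the depth bound
                    return False
                stack.append(open_ids[tok])
            elif tok in close_ids:
                if not stack or stack[-1] != close_ids[tok]:
                    return False                 # wrong type, or nothing open
                stack.pop()
            else:
                return False                     # not a bracket token
        return not stack                         # every bracket was closed

    return recognize

# Dyck-(2, 3): depth-3 nesting is accepted, depth-4 nesting is rejected.
recognize = make_dyck_km_recognizer(k=2, m=3)
assert recognize("(0 (1 )1 )0".split())
assert not recognize("(0 (0 (0 (0 )0 )0 )0 )0".split())
```

Because the stack never holds more than $m$ entries and each entry ranges over $k$ bracket types, the reachable state space has size $O(k^m)$, so $O(m \log k)$ bits of state suffice; the contrast with the $O(k^{\frac{m}{2}})$ hidden-unit constructions is exactly the exponential gap the abstract refers to.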
Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as $\mathsf{Dyck}_k$, the language consisting of well-nested parentheses of $k$ types.
Interleaved texts, where posts belonging to different threads occur in one sequence, are a common occurrence, e.g., in online chat conversations. To quickly obtain an overview of such texts, existing systems first disentangle the posts by threads and then …
The dialects of Madagascar belong to the Greater Barito East group of the Austronesian family, and it is widely accepted that the island was colonized by Indonesian sailors after a maritime trek which probably took place around 650 CE. The language mo…
While recurrent models have been effective in NLP tasks, their performance on context-free languages (CFLs) has been found to be quite weak. Given that CFLs are believed to capture important phenomena such as hierarchical structure in natural language, …
An implicit goal in works on deep generative models is that such models should be able to generate novel examples that were not previously seen in the training data. In this paper, we investigate to what extent this property holds for widely employed …