أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Rudolf Hanel

Understanding Zipfs law of word frequencies through sample-space collapse in sentence formation

145 - Stefan Thurner , Rudolf Hanel , Bo Liu 2014

The formation of sentences is a highly structured and history-dependent process. The probability of using a specific word in a sentence strongly depends on the history of word-usage earlier in that sentence. We study a simple history-dependent model of text generation assuming that the sample-space of word usage reduces along sentence formation, on average. We first show that the model explains the approximate Zipf law found in word frequencies as a direct consequence of sample-space reduction. We then empirically quantify the amount of sample-space reduction in the sentences of ten famous English books, by analysis of corresponding word-transition tables that capture which words can follow any given word in a text. We find a highly nested structure in these transition tables and show that this `nestedness is tightly related to the power law exponents of the observed word frequency distributions. With the proposed model it is possible to understand that the nestedness of a text can be the origin of the actual scaling exponent, and that deviations from the exact Zipf law can be understood by variations of the degree of nestedness on a book-by-book basis. On a theoretical level we are able to show that in case of weak nesting, Zipfs law breaks down in a fast transition. Unlike previous attempts to understand Zipfs law in language the sample-space reducing model is not based on assumptions of multiplicative, preferential, or self-organised critical mechanisms behind language formation, but simply used the empirically quantifiable parameter nestedness to understand the statistics of word frequencies.

الفيزياء والمجتمع الحساب واللغة

Understanding scaling through history-dependent processes with collapsing sample space

72 - Bernat Corominas-Murtra , Rudolf Hanel , Stefan Thurner 2014

History-dependent processes are ubiquitous in natural and social systems. Many such stochastic processes, especially those that are associated with complex systems, become more constrained as they unfold, meaning that their sample-space, or their set of possible outcomes, reduces as they age. We demonstrate that these sample-space reducing (SSR) processes necessarily lead to Zipfs law in the rank distributions of their outcomes. We show that by adding noise to SSR processes the corresponding rank distributions remain exact power-laws, $p(x)sim x^{-lambda}$, where the exponent directly corresponds to the mixing ratio of the SSR process and noise. This allows us to give a precise meaning to the scaling exponent in terms of the degree to how much a given process reduces its sample-space as it unfolds. Noisy SSR processes further allow us to explain a wide range of scaling exponents in frequency distributions ranging from $alpha = 2$ to $infty$. We discuss several applications showing how SSR processes can be used to understand Zipfs law in word frequencies, and how they are related to diffusion processes in directed networks, or ageing processes such as in fragmentation processes. SSR processes provide a new alternative to understand the origin of scaling in complex systems without the recourse to multiplicative, preferential, or self-organised critical processes.

الفيزياء والمجتمع الميكانيكا الإحصائية

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد