
From Boltzmann to Zipf through Shannon and Jaynes

Published by Alvaro Corral
Publication date: 2019
Research field: Physics
Language: English





The word-frequency distribution provides the fundamental building blocks that generate discourse in language. It is well known, from empirical evidence, that the word-frequency distribution of almost any text is described by Zipf's law, at least approximately. Following Stephens and Bialek [Phys. Rev. E 81, 066119, 2010], we interpret the frequency of any word as arising from the interaction potential between its constituent letters. Indeed, Jaynes' maximum-entropy principle, with the constraints given by every empirical two-letter marginal distribution, leads to a Boltzmann distribution for word probabilities, with an energy-like function given by the sum of all pairwise (two-letter) potentials. The improved iterative-scaling algorithm allows us to find the potentials from the empirical two-letter marginals. Applying this formalism to words with up to six letters from the English subset of the recently created Standardized Project Gutenberg Corpus, we find that the model is able to reproduce Zipf's law, but with some limitations: the general power-law regime of Zipf's law is obtained, but the probabilities of individual words show considerable scatter. In this way, a purely statistical-physics framework is used to describe the probabilities of words. As a by-product, we find that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
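To make the maximum-entropy construction concrete, here is a minimal sketch under loudly stated assumptions: a toy alphabet of 4 symbols, fixed-length 3-letter "words", and a random synthetic distribution standing in for the empirical word frequencies. It uses plain iterative proportional fitting as a simpler stand-in for the improved iterative-scaling algorithm named in the abstract, so it is not the authors' procedure. The function names `pairwise_marginals` and `fit_maxent_pairwise` are hypothetical.

```python
import itertools
import numpy as np

def pairwise_marginals(p):
    """Return {(i, j): 2-D marginal} for every pair of letter positions i < j."""
    n = p.ndim
    out = {}
    for i, j in itertools.combinations(range(n), 2):
        other = tuple(k for k in range(n) if k not in (i, j))
        out[(i, j)] = p.sum(axis=other)
    return out

def fit_maxent_pairwise(target, shape, n_sweeps=500):
    """Fit the maximum-entropy joint distribution matching all two-letter marginals.

    Uses iterative proportional fitting (an assumption of this sketch, not
    improved iterative scaling). Starting from the uniform distribution keeps
    log p equal to a sum of pairwise terms throughout.
    """
    n = len(shape)
    p = np.full(shape, 1.0 / np.prod(shape))
    for _ in range(n_sweeps):
        for (i, j), t in target.items():
            other = tuple(k for k in range(n) if k not in (i, j))
            # Current model marginal, kept broadcastable against p.
            m = p.sum(axis=other, keepdims=True)
            t_b = np.expand_dims(t, axis=other)
            # Rescale so that this marginal matches its target exactly.
            ratio = np.divide(t_b, m, out=np.zeros_like(m), where=m > 0)
            p = p * ratio
    return p

# Synthetic stand-in for empirical word probabilities (not corpus data).
rng = np.random.default_rng(0)
empirical = rng.random((4, 4, 4))
empirical /= empirical.sum()

target = pairwise_marginals(empirical)
model = fit_maxent_pairwise(target, empirical.shape)

# Boltzmann form: P(word) = exp(-E(word)), with the normalization absorbed into E.
energy = -np.log(model)

# Rank-frequency view of the model, the quantity compared against Zipf's law.
ranked = np.sort(model.ravel())[::-1]
```

The fitted `model` reproduces every two-letter marginal of `empirical` but generally differs from it word by word, which is the toy analogue of the scatter in individual word probabilities reported in the abstract.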



Read also

This work studies Zipf's law for cities in Brazil. Data from the censuses of 1970, 1980, 1991 and 2000 were used to select a sample containing only cities with 30,000 inhabitants or more. The results show that the population distribution in Brazilian cities does follow a power law similar to the ones found in other countries. Estimates of the power-law exponent were found to be 2.22 +/- 0.34 for the 1970 and 1980 censuses, and 2.26 +/- 0.11 for the censuses of 1991 and 2000. More accurate results were obtained with the maximum likelihood estimator, showing an exponent equal to 2.41 for 1970 and 2.36 for the other three years.
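As an illustration of a maximum likelihood estimate for a power-law exponent, the sketch below uses the standard continuous estimator alpha = 1 + n / sum(ln(x_i / x_min)). It assumes the exponent refers to the probability density p(x) ~ x^(-alpha), which the abstract does not state, and it runs on synthetic samples, not census data. The function name `powerlaw_mle_exponent` and the values of `true_alpha` and the sample size are hypothetical; only the 30,000-inhabitant cutoff comes from the abstract.

```python
import numpy as np

def powerlaw_mle_exponent(x, x_min):
    """Continuous power-law MLE for the density exponent, using values >= x_min."""
    x = np.asarray(x, dtype=float)
    tail = x[x >= x_min]
    return 1.0 + tail.size / np.log(tail / x_min).sum()

# Synthetic "city sizes" drawn by inverse-transform sampling (illustrative only).
rng = np.random.default_rng(42)
true_alpha = 2.3      # assumed value, chosen near the range quoted in the abstract
x_min = 30_000.0      # lower cutoff taken from the abstract
u = rng.random(100_000)
sizes = x_min * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))

alpha_hat = powerlaw_mle_exponent(sizes, x_min)

# Asymptotic standard error of the estimator: (alpha - 1) / sqrt(n).
std_err = (alpha_hat - 1.0) / np.sqrt(sizes.size)
```

With only a few hundred cities instead of 100,000 samples, the same formula gives a much larger standard error, which is worth keeping in mind when comparing exponents across census years.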
Inspired by the analysis of several empirical online social networks, we propose a simple reaction-diffusion-like coevolving model, in which individuals are activated to create links based on their states, influenced by local dynamics and their own intention. It is shown that the model can reproduce the remarkable properties observed in empirical online social networks; in particular, the assortative coefficients are neutral or negative, and the power law exponents are smaller than 2. Moreover, we demonstrate that, under appropriate conditions, the model network naturally makes transition(s) from assortative to disassortative, and from sparse to dense in their characteristics. The model is useful in understanding the formation and evolution of online social networks.
Frank Schweitzer, 2020
The social percolation model \citep{solomon-et-00} considers a 2-dimensional regular lattice. Each site is occupied by an agent with a preference $x_{i}$ sampled from a uniform distribution $U[0,1]$. Agents transfer the information about the quality $q$ of a movie to their neighbors only if $x_{i}\leq q$. Information percolates through the lattice if $q=q_{c}=0.593$. From a network perspective the percolating cluster can be seen as a random-regular network with $n_{c}$ nodes and a mean degree that depends on $q_{c}$. Preserving these quantities of the random-regular network, a true random network can be generated from the $G(n,p)$ model after determining the link probability $p$. I then demonstrate how this random network can be transformed into a threshold network, where agents create links dependent on their $x_{i}$ values. Assuming a dynamics of the $x_{i}$ and a mechanism of group formation, I further extend the model toward an adaptive social network model.
A detailed empirical analysis of the productivity of non-financial firms across several countries and years shows that productivity follows a non-Gaussian distribution with power-law tails. We demonstrate that these empirical findings can be interpreted as a consequence of a mechanism of exchanges in a social network where firms improve their productivity by direct innovation and/or by imitation of other firms' technological and organizational solutions. The type of network connectivity determines how fast and how efficiently information can diffuse and how quickly innovation will permeate or behaviors will be imitated. From a model for innovation flow through a complex network we obtain that the expectation values of the productivity level are proportional to the connectivity of the network of links between firms. The comparison with the empirical distributions reveals that such a network must be of a scale-free type with a power-law degree distribution in the large connectivity range.
We show how the prevailing majority opinion in a population can be rapidly reversed by a small fraction $p$ of randomly distributed committed agents who consistently proselytize the opposing opinion and are immune to influence. Specifically, we show that when the committed fraction grows beyond a critical value $p_c \approx 10\%$, there is a dramatic decrease in the time, $T_c$, taken for the entire population to adopt the committed opinion. In particular, for complete graphs we show that when $p < p_c$, $T_c \sim \exp(\alpha(p)N)$, while for $p > p_c$, $T_c \sim \ln N$. We conclude with simulation results for Erdős-Rényi random graphs and scale-free networks which show qualitatively similar behavior.