Complex systems often comprise many kinds of components which vary over many orders of magnitude in size: populations of cities in countries, individual and corporate wealth in economies, species abundance in ecologies, word frequency in natural language, and node degree in complex networks. Comparisons of component size distributions for two complex systems (or for a system with itself at two different time points) generally employ information-theoretic instruments, such as Jensen-Shannon divergence. We argue that these methods lack transparency and adjustability, and should not be applied when component probabilities are not meaningful or are problematic to estimate. Here, we introduce 'allotaxonometry' along with 'rank-turbulence divergence', a tunable instrument for comparing any two (Zipfian) ranked lists of components. We analytically develop our rank-based divergence in a series of steps, and then establish a rank-based allotaxonograph which pairs a map-like histogram for rank-rank pairs with a list of components ordered by divergence contribution. We explore the performance of rank-turbulence divergence in a series of distinct settings, including language use on Twitter and in books, species abundance, baby name popularity, market capitalization, performance in sports, mortality causes, and job titles. We provide a series of supplementary flipbooks which demonstrate the tunability and storytelling power of rank-based allotaxonometry.
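For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the standard Jensen-Shannon divergence (base 2) between two systems given as raw component counts. The function names and the toy counts are illustrative assumptions, not taken from the paper. Note that the calculation requires normalizing counts into probabilities, which is the step the abstract flags as potentially problematic.

```python
import math


def normalize(counts):
    """Turn raw component counts into a probability distribution."""
    total = sum(counts.values())
    return {component: value / total for component, value in counts.items()}


def jensen_shannon_divergence(counts1, counts2):
    """Standard Jensen-Shannon divergence in bits (between 0 and 1)."""
    p = normalize(counts1)
    q = normalize(counts2)
    jsd = 0.0
    for component in set(p) | set(q):
        p_t = p.get(component, 0.0)
        q_t = q.get(component, 0.0)
        m_t = 0.5 * (p_t + q_t)  # mixture probability, positive on the union
        if p_t > 0:
            jsd += 0.5 * p_t * math.log2(p_t / m_t)
        if q_t > 0:
            jsd += 0.5 * q_t * math.log2(q_t / m_t)
    return jsd


# Hypothetical toy word counts for two systems.
system_a = {"the": 50, "of": 30, "storm": 5}
system_b = {"the": 45, "of": 25, "calm": 10}
print(jensen_shannon_divergence(system_a, system_b))
```

The single number returned says little about which components drive the difference, which is the transparency concern raised above.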
Real-world complex systems often comprise many distinct types of elements as well as many more types of networked interactions between elements. When the relative abundances of types can be measured well, we further observe heavy-tailed categorical distributions for type frequencies. For the comparison of type frequency distributions of two systems, or of a system with itself at different points in time (a facet of allotaxonometry), a wide range of probability divergences is available. Here, we introduce and explore 'probability-turbulence divergence', a tunable, straightforward, and interpretable instrument for comparing normalizable categorical frequency distributions. We model probability-turbulence divergence (PTD) after rank-turbulence divergence (RTD). While probability-turbulence divergence is more limited in application than rank-turbulence divergence, it is more sensitive to changes in type frequency. We build allotaxonographs to display probability turbulence, incorporating a way to visually accommodate zero probabilities for 'exclusive types', which are types that appear in only one system. We explore comparisons of example distributions taken from literature, social media, and ecology. We show how probability-turbulence divergence either explicitly or functionally generalizes many existing kinds of distances and measures, including, as special cases, $L^{(p)}$ norms, the Sørensen-Dice coefficient (the $F_1$ statistic), and the Hellinger distance. We discuss similarities with the generalized entropies of Rényi and Tsallis, and the diversity indices (or Hill numbers) from ecology. We close with thoughts on open problems concerning the optimization of the tuning of rank- and probability-turbulence divergence.
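As a concrete reference point for one of the special cases named above, here is a minimal sketch of the standard Hellinger distance between two categorical distributions. It does not implement probability-turbulence divergence itself, whose formula is not given in this abstract. The function name and the toy distributions are illustrative assumptions. Exclusive types are handled by assigning probability zero in the system where they are absent.

```python
import math


def hellinger_distance(p, q):
    """Standard Hellinger distance between two categorical distributions.

    p and q map type -> probability; a type missing from one mapping is
    treated as having probability zero there (an 'exclusive type').
    """
    types = set(p) | set(q)
    squared_sum = sum(
        (math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2 for t in types
    )
    return math.sqrt(squared_sum / 2.0)


# Hypothetical toy distributions; "fern" and "moss" are exclusive types.
p = {"oak": 0.5, "pine": 0.3, "fern": 0.2}
q = {"oak": 0.4, "pine": 0.4, "moss": 0.2}
print(hellinger_distance(p, q))
```

The distance is 0 for identical distributions and 1 when the two systems share no types.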
A key measure that has been used extensively in analyzing complex networks is the degree of a node (the number of the node's neighbors). Because of its discrete nature, when the degree measure has been used in analyzing weighted networks, weights have been either ignored or thresholded in order to retain or disregard an edge. Therefore, despite its popularity, the degree measure fails to capture the disparity of interaction between a node and its neighbors. We introduce in this paper a generalization of the degree measure that addresses this limitation: the continuous node degree (C-degree). The C-degree of a node reflects how many neighbors are effectively being used, taking interaction disparity into account. More importantly, if a node interacts uniformly with its neighbors (no interaction disparity), we prove that the C-degree of the node becomes identical to the node's (discrete) degree. We analyze four real-world weighted networks using the new measure and show that the C-degree distribution follows a power law, similar to the traditional degree distribution, but with a steeper decline. We also show that the ratio between the C-degree and the (discrete) degree follows a pattern that is common to the four studied networks.
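The abstract does not state the formula for the C-degree. One quantity that has both properties described above (it counts neighbors "effectively" used, and it equals the discrete degree under uniform interaction) is the exponential of the Shannon entropy of a node's normalized edge weights. The sketch below uses that quantity purely as an assumed stand-in; the paper's actual definition may differ, and the function name is hypothetical.

```python
import math


def effective_degree(weights):
    """Entropy-based effective number of neighbors (assumed stand-in for C-degree).

    weights: non-negative interaction weights, one per neighbor.
    Returns exp(H), where H is the Shannon entropy of the normalized weights.
    """
    positive = [w for w in weights if w > 0]
    total = sum(positive)
    if total == 0:
        return 0.0  # isolated node, or all weights zero
    entropy = -sum((w / total) * math.log(w / total) for w in positive)
    return math.exp(entropy)


# Uniform interaction with four neighbors: matches the discrete degree.
print(effective_degree([2, 2, 2, 2]))
# Strong disparity: four neighbors, but mostly one is being used.
print(effective_degree([97, 1, 1, 1]))
```

Under this assumption, a node with four neighbors that interacts almost exclusively with one of them has an effective degree close to 1 rather than 4.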
For generic systems exhibiting power-law behaviors, and hence multiscale dependencies, we propose a new yet simple tool to analyze multifractality and intermittency, after noticing that these concepts are directly related to the deformation of a probability density function from Gaussian at large scales to non-Gaussian at smaller scales. Our framework is based on information theory, and uses Shannon entropy and Kullback-Leibler divergence. We propose an extensive application to three-dimensional fully developed turbulence, seen here as a paradigmatic complex system where intermittency was historically defined. Moreover, the concepts of scale invariance and multifractality have been extensively studied in this field and, most importantly, benchmarked. We compute our measure on experimental Eulerian velocity measurements, as well as on synthetic processes and a phenomenological model of fluid turbulence. Our approach is very general and does not require any underlying model of the system, although it can probe the relevance of such a model.
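To illustrate the idea of quantifying how far a distribution has deformed away from Gaussian, the sketch below estimates the Kullback-Leibler divergence between the histogram of a standardized sample and a unit Gaussian binned the same way. This is an assumed, simplified estimator for illustration only, not necessarily the one used in the paper. The function name, bin settings, and synthetic samples are all hypothetical.

```python
import math
import random


def kl_from_gaussian(samples, n_bins=60, half_width=6.0):
    """Histogram estimate of KL(sample || Gaussian) in nats.

    The sample is standardized (zero mean, unit variance) and binned on
    [-half_width, half_width]; values outside that range are ignored.
    """
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in samples) / n)
    width = 2.0 * half_width / n_bins
    counts = [0] * n_bins
    kept = 0
    for x in samples:
        idx = math.floor(((x - mean) / std + half_width) / width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
            kept += 1
    kl = 0.0
    for i, count in enumerate(counts):
        if count == 0:
            continue  # empty bins contribute nothing
        left = -half_width + i * width
        right = left + width
        p_bin = count / kept
        # Probability mass of a unit Gaussian in the same bin.
        g_bin = 0.5 * (math.erf(right / math.sqrt(2)) - math.erf(left / math.sqrt(2)))
        kl += p_bin * math.log(p_bin / g_bin)
    return kl


rng = random.Random(0)
gaussian_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Difference of two exponentials gives a heavier-tailed Laplace sample.
laplace_sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]
kl_gauss = kl_from_gaussian(gaussian_sample)
kl_laplace = kl_from_gaussian(laplace_sample)
print(kl_gauss, kl_laplace)
```

In a scale-by-scale analysis, the same estimate would be applied to increments of the signal at each scale, with the divergence expected to grow as the scale decreases and the distribution departs from Gaussian.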
We study the self-organization of consonant inventories through a complex network approach. We observe that the distributions of occurrence and co-occurrence of consonants across languages follow power-law behavior. The co-occurrence network of consonants exhibits a high clustering coefficient. We propose four novel synthesis models for these networks (each of which is a refinement of the previous one) so as to match, with successively higher accuracy, (a) the above-mentioned topological properties as well as (b) the linguistic property of feature economy exhibited by the consonant inventories. We conclude by arguing that a possible interpretation of this mechanism of network growth is the process of child language acquisition. Such models increase our understanding of the structure of languages, which is influenced by their evolutionary dynamics, and this, in turn, can be extremely useful for building future NLP applications.
Stars and cycles are basic structures in network construction. The former have been well studied in network analysis, while the latter have attracted little attention. A node together with its neighbors constitutes a neighborhood star structure, where the basic assumption is that two nodes interact through their direct connection. A cycle is a closed loop of many nodes that can influence each other even without a direct connection. Here we show the difference and relationship between the two in understanding network structure and function. We define two cycle-based node characteristics, namely cycle number and cycle ratio, which can be used to measure a node's importance. Numerical analyses on six disparate real networks suggest that nodes with a higher cycle ratio are more important to network connectivity, while cycle number quantifies a node's influence in cycle-based spreading better than the common star-based node centralities. We also find that an ordinary network can be converted into a hypernetwork by considering its basic cycles as hyperedges; in the process, a new matrix called the cycle number matrix is obtained. We hope that this paper can open a new direction for understanding both the local and global structures of networks and their function.