A taxonomy is a standardized framework to classify and organize items into categories. Hierarchical taxonomies are ubiquitous, ranging from the classification of organisms to the file system on a computer. Characterizing the typical distribution of items within taxonomic categories is an important question with applications in many disciplines. Ecologists have long sought to account for the patterns observed in species-abundance distributions (the number of individuals per species found in some sample), and computer scientists study the distribution of files per directory. Is there a universal statistical distribution describing how many items are typically found in each category in large taxonomies? Here, we analyze a wide array of large, real-world datasets -- including items lost and found on the New York City transit system, library books, and a bacterial microbiome -- and discover such an underlying commonality. A simple, non-parametric branching model that randomly categorizes items and takes as input only the total number of items and the total number of categories successfully reproduces the abundance distributions in these datasets. This result may shed light on patterns in species-abundance distributions long observed in ecology. The model also predicts the number of taxonomic categories that remain unrepresented in a finite sample.
This paper presents a comparative study of the impact of random failure or attack on the public transit networks of London and Paris. In particular, we analyze how the dysfunction or removal of sets of stations or links (rails, roads, etc.) affects the connectivity properties of these networks. We show how accumulating dysfunction leads to emergent phenomena that cause the transportation system to break down as a whole. Simulating different directed attack strategies, we find minimal strategies with high impact and identify a priori criteria that correlate with the resilience of these networks. Our quantitative analysis is performed within the framework of complex network theory, a methodological tool that has emerged recently as an interdisciplinary approach joining methods and concepts from the theory of random graphs, percolation, and statistical physics. In conclusion, we demonstrate that, when cascading effects are taken into account, the integrity of both networks is controlled by less than 0.5% of the stations, i.e., 19 for Paris and 34 for London.
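A directed attack of the kind described above can be illustrated with a minimal sketch. It assumes one particular strategy (repeatedly removing the station with the highest remaining degree) and one particular connectivity measure (the size of the largest connected component); the abstract does not say which strategies or measures the authors actually use, and the toy network and the function names `largest_component_size` and `degree_attack` are hypothetical.

```python
from collections import deque

def largest_component_size(adjacency, removed):
    """Return the size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best

def degree_attack(adjacency, steps):
    """Remove the highest-degree surviving node `steps` times; record component sizes."""
    removed = set()
    sizes = [largest_component_size(adjacency, removed)]
    for _ in range(steps):
        remaining = [n for n in adjacency if n not in removed]
        if not remaining:
            break
        # Degree is recomputed among surviving nodes before each removal.
        target = max(
            remaining,
            key=lambda n: sum(1 for m in adjacency[n] if m not in removed),
        )
        removed.add(target)
        sizes.append(largest_component_size(adjacency, removed))
    return sizes

# Hypothetical toy network: a hub "H" linking three short branches.
toy = {
    "H": ["A", "B", "C"],
    "A": ["H", "A2"],
    "B": ["H", "B2"],
    "C": ["H", "C2"],
    "A2": ["A"],
    "B2": ["B"],
    "C2": ["C"],
}
sizes = degree_attack(toy, 1)
print(sizes)
```

In this toy case, removing the single hub splits the network into three two-station fragments, which is the kind of disproportionate effect that a small set of critical stations can have.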
Research into the detection of dense communities has recently attracted increasing attention within network science, and various metrics for detecting such communities have been proposed. The most popular metric, modularity, is based on the rule that links within communities are denser than links between communities, and it has become the default. However, this default metric suffers from ambiguity and, worse, all augmentations of modularity are based on a narrow intuition of what it means to form a community. We argue that in specific but quite common systems, links within a community are not necessarily more common than links between communities. Instead, we propose that the defining characteristic of a community is that links are more predictable within a community than between communities. In this paper, we propose a novel metric for community detection based directly on this feature, that is, on the effect of communities on link prediction. We find that our metric is more robust than traditional modularity. Consequently, it allows us to evaluate the stability of the same detection algorithm across different networks. Our metric can also directly uncover false community detections and infer more statistical characteristics of detection algorithms.
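For reference, the traditional modularity that serves as the baseline above can be computed as in the following sketch. It implements the standard Newman-Girvan form for an undirected, unweighted graph, not the link-prediction metric proposed in the abstract, whose definition is not given here. The example graph and the function name `modularity` are hypothetical.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    degree = {}
    internal = {}  # number of edges with both ends inside each community
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
    total_degree = {}  # summed degree of the nodes in each community
    for node, k in degree.items():
        c = community[node]
        total_degree[c] = total_degree.get(c, 0) + k
    # Q = sum over communities c of (e_c / m) - (d_c / 2m)^2
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )

# Hypothetical example: two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(q_split, q_merged)
```

Splitting the two triangles scores higher than merging them because the partition is rewarded for dense internal links, which is exactly the density intuition the abstract questions.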
In this work we consider the topological analysis of symbolic formal systems in the framework of network theory. In particular, we analyse the network extracted from Principia Mathematica by B. Russell and A.N. Whitehead, where the vertices are the statements and two statements are connected by a directed link if one statement is used to demonstrate the other. We compare the obtained network with other directed acyclic graphs, such as a scientific citation network and a stochastic model. We also introduce a novel topological ordering for directed acyclic graphs and discuss its properties with respect to the classical one. The main result is the observation that formal systems of knowledge behave topologically like self-organised systems.
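The classical topological ordering mentioned above can be sketched with Kahn's algorithm, which places every statement after all the statements used to prove it. This is only the classical baseline; the novel ordering introduced in the paper is not described in the abstract and is not reproduced here. The statement labels and the function name `topological_order` are hypothetical.

```python
from collections import deque

def topological_order(dependencies):
    """Kahn's algorithm.

    dependencies maps each statement to the list of statements it relies on.
    Returns a list in which every statement appears after its dependencies.
    """
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)
    indegree = {n: 0 for n in nodes}
    dependents = {n: [] for n in nodes}
    for node, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(node)
            indegree[node] += 1
    # Start from statements that rely on nothing (sorted for determinism).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in dependents[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order

# Hypothetical proof structure: T3 uses T1 and T2, and T2 uses T1.
proofs = {"T3": ["T1", "T2"], "T2": ["T1"], "T1": []}
order = topological_order(proofs)
print(order)
```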
We provide a general framework to model the growth of networks consisting of different coupled layers. Our aim is to estimate the impact of one such layer on the dynamics of the others. As an application, we study a scientometric network, where one layer consists of publications as nodes and citations as links, whereas the second layer represents the authors. This allows us to address the question of how characteristics of authors, such as their number of publications or number of previous co-authors, impact the citation dynamics of a new publication. To test different hypotheses about this impact, our model combines citation constituents and social constituents in different ways. We then evaluate the performance of these model variants in reproducing the citation dynamics in nine different physics journals. For this, we develop a general method for statistical parameter estimation and model selection that is applicable to growing multi-layer networks. It takes both the parameter errors and the model complexity into account and is computationally efficient and scalable to large networks.
Identifying the spreading influence of nodes in complex networks is of great theoretical and practical significance and is a promising research direction. So far, various topology-based centrality measures have been proposed to identify node spreading influence in a network. However, node spreading influence results from the interplay between network topology and spreading dynamics. In this paper, we develop a systematic method that combines network structure and spreading dynamics to identify node spreading influence. By combining the adjacency matrix $A$ and the spreading parameter $\beta$, we theoretically derive the node spreading influence from the eigenvector associated with the largest eigenvalue. Compared against epidemic results of the Susceptible-Infected-Recovered (SIR) model on four real networks, our method identifies node spreading influence more accurately than degree, K-shell, and eigenvector centrality. This work may provide a systematic method for identifying node spreading influence.
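The step of extracting the eigenvector of the largest eigenvalue can be sketched with power iteration. The sketch below applies it to the plain adjacency matrix $A$ only; the abstract does not state how $A$ and $\beta$ are combined, so that part is deliberately left out. The toy network and the function name `leading_eigenvector` are hypothetical.

```python
def leading_eigenvector(adjacency, iterations=200, tol=1e-12):
    """Power iteration on a symmetric adjacency matrix given as a list of lists.

    Returns the leading eigenvector, normalised so its entries sum to 1.
    """
    n = len(adjacency)
    vector = [1.0 / n] * n
    for _ in range(iterations):
        # Iterating with A + I keeps the eigenvectors of A but avoids the
        # oscillation that plain power iteration shows on bipartite graphs.
        nxt = [
            vector[i] + sum(adjacency[i][j] * vector[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(abs(x) for x in nxt)
        nxt = [x / norm for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, vector)) < tol:
            return nxt
        vector = nxt
    return vector

# Hypothetical toy network: a star with centre 0 and three leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = leading_eigenvector(star)
print(scores)
```

On this star the centre receives the highest score, and the ratio of its score to a leaf's score approaches the largest eigenvalue of $A$, which is $\sqrt{3}$ for this graph.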