No Arabic abstract
Document networks are characteristic in that a document node, e.g. a webpage or an article, carries meaningful content. Properties of document networks are not only affected by topological connectivity between nodes, but also strongly influenced by the semantic relation between content of the nodes. We observe that document networks have a large number of triangles and a high value of clustering coefficient. And there is a strong correlation between the probability of formation of a triangle and the content similarity among the three nodes involved. We propose the degree-similarity product (DSP) model which well reproduces these properties. The model achieves this by using a preferential attachment mechanism which favours the linkage between nodes that are both popular and similar. This work is a step forward towards a better understanding of the structure and evolution of document networks.
Algorithms for community detection are usually stochastic, leading to different partitions for different choices of random seeds. Consensus clustering has proven to be an effective technique to derive more stable and accurate partitions than the ones obtained by the direct application of the algorithm. However, the procedure requires the calculation of the consensus matrix, which can be quite dense if (some of) the clusters of the input partitions are large. Consequently, the complexity can get dangerously close to quadratic, which makes the technique inapplicable on large graphs. Here we present a fast variant of consensus clustering, which calculates the consensus matrix only on the links of the original graph and on a comparable number of additional node pairs, suitably chosen. This brings the complexity down to linear, while the performance remains comparable as the full technique. Therefore, our fast consensus clustering procedure can be applied on networks with millions of nodes and links.
Networks with underlying metric spaces attract increasing research attention in network science, statistical physics, applied mathematics, computer science, sociology, and other fields. This attention is further amplified by the current surge of activity in graph embedding. In the vast realm of spatial network models, only a few reproduce even the most basic properties of real-world networks. Here, we focus on three such properties---sparsity, small worldness, and clustering---and identify the general subclass of spatial homogeneous and heterogeneous network models that are sparse small worlds and that have nonzero clustering in the thermodynamic limit. We rely on the maximum entropy approach where network links correspond to noninteracting fermions whose energy dependence on spatial distances determines network small worldness and clustering.
Many real-world networks display a natural bipartite structure. It is necessary and important to study the bipartite networks by using the bipartite structure of the data. Here we propose a modification of the clustering coefficient given by the fraction of cycles with size four in bipartite networks. Then we compare the two definitions in a special graph, and the results show that the modification one is better to character the network. Next we define a edge-clustering coefficient of bipartite networks to detect the community structure in original bipartite networks.
Network models with latent geometry have been used successfully in many applications in network science and other disciplines, yet it is usually impossible to tell if a given real network is geometric, meaning if it is a typical element in an ensemble of random geometric graphs. Here we identify structural properties of networks that guarantee that random graphs having these properties are geometric. Specifically we show that random graphs in which expected degree and clustering of every node are fixed to some constants are equivalent to random geometric graphs on the real line, if clustering is sufficiently strong. Large numbers of triangles, homogeneously distributed across all nodes as in real networks, are thus a consequence of network geometricity. The methods we use to prove this are quite general and applicable to other network ensembles, geometric or not, and to certain problems in quantum gravity.
A bridge in a graph is an edge whose removal disconnects the graph and increases the number of connected components. We calculate the fraction of bridges in a wide range of real-world networks and their randomized counterparts. We find that real networks typically have more bridges than their completely randomized counterparts, but very similar fraction of bridges as their degree-preserving randomizations. We define a new edge centrality measure, called bridgeness, to quantify the importance of a bridge in damaging a network. We find that certain real networks have very large average and variance of bridgeness compared to their degree-preserving randomizations and other real networks. Finally, we offer an analytical framework to calculate the bridge fraction , the average and variance of bridgeness for uncorrelated random networks with arbitrary degree distributions.