In transportation, communication, social, and other real complex networks, some critical edges play a pivotal role in controlling the flow of information and maintaining the integrity of the structure. Because critical edges matter in both theoretical studies and practical applications, their identification has gradually become a hot topic in current research. Considering the overlap of communities in the neighborhood of edges, a novel and effective metric named subgraph overlap (SO) is proposed to quantify the significance of edges. The experimental results show that SO outperforms all benchmarks in identifying critical edges, which are crucial in maintaining the integrity of the structure and functions of networks.
Community structure is a typical property of many real-world networks and has become a key to understanding the dynamics of networked systems. In these networks, most nodes apparently lie in one community, while a few nodes often straddle several communities. An ideal community detection algorithm should be able to identify the overlapping communities in such networks. To represent an overlapping division, we develop an encoding schema composed of two segments: the first represents a disjoint partition, and the second represents an extension of the partition that allows multiple memberships. We give a measure for the informativeness of a node and present an evolutionary method for detecting the overlapping communities in a network.
Social networks play a fundamental role in the diffusion of information. However, information can reach a person in a network in two different ways: through connections in our social networks, and through the influence of external out-of-network sources, like the mainstream media. While most present models of information adoption in networks assume information only passes from node to node via the edges of the underlying network, the recent availability of massive online social media data allows us to study this process in more detail. We present a model in which information can reach a node via the links of the social network or through the influence of external sources. We then develop an efficient model parameter fitting technique and apply the model to the emergence of URL mentions in the Twitter network. Using a complete one-month trace of Twitter, we study how information reaches the nodes of the network. We quantify the external influences over time and describe how these influences affect information adoption. We discover that information tends to jump across the network, which can only be explained as an effect of an unobservable external influence on the network. We find that only about 71% of the information volume in Twitter can be attributed to network diffusion, and the remaining 29% is due to external events and factors outside the network.
The full range of activity in a temporal network is captured in its edge activity data: time series encoding the tie strengths or on-off dynamics of each edge in the network. However, in many practical applications, edge-level data are unavailable, and network analyses must rely instead on node activity data, which aggregate the edge activity data and thus are less informative. This raises the question: Is it possible to use the static network to recover the richer edge activities from the node activities? Here we show that recovery is possible, often with a surprising degree of accuracy given how much information is lost, and that the recovered data are useful for subsequent network analysis tasks. Recovery is more difficult when network density increases, either topologically or dynamically, but exploiting dynamical and topological sparsity enables effective solutions to the recovery problem. We formally characterize the difficulty of the recovery problem both theoretically and empirically, proving the conditions under which recovery errors can be bounded and showing that, even when these conditions are not met, good quality solutions can still be derived. Effective recovery carries both promise and peril, as it enables deeper scientific study of complex systems but, in the context of social systems, also raises privacy concerns when social information can be aggregated across multiple data sources.
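To illustrate why aggregation loses information, the following minimal sketch assumes that a node's activity at one time step is the sum of the activities of its incident edges. That aggregation rule, the function name `node_activity`, and the toy 4-cycle are illustrative assumptions, not details taken from the abstract. Two different edge activity patterns produce the same node activity, so the static network alone does not determine the answer without further structure such as sparsity.

```python
def node_activity(edges, edge_activity, n_nodes):
    """Aggregate edge activity into node activity.

    Assumption: a node's activity is the sum of the activities
    of all edges incident to it (one time step shown).
    """
    x = [0] * n_nodes
    for (u, v), a in zip(edges, edge_activity):
        x[u] += a
        x[v] += a
    return x


# Toy static network: a cycle on four nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Two different edge activity patterns on the same network.
y1 = [1, 0, 1, 0]  # edges (0,1) and (2,3) are active
y2 = [0, 1, 0, 1]  # edges (1,2) and (3,0) are active

x1 = node_activity(edges, y1, 4)
x2 = node_activity(edges, y2, 4)

# Both patterns aggregate to the same node activity.
print(x1, x2)
```

On this toy graph the two patterns are indistinguishable from node data alone, which is one way to see why additional assumptions are needed before recovery can succeed.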
The rapid expansion of social networks provides a suitable platform for users to deliver messages. Through social networks, we can harvest resources and share messages in a very short time. The development of social networks has brought us tremendous convenience. However, the nodes that make up a network differ in spreading capability, which is constrained by many factors, the principal one being the topological structure of the network. To calculate the importance of nodes in a network more accurately, this paper defines the improved H-index centrality (IH) according to the diversity of neighboring nodes, then uses the cumulative centrality (MC) to take all neighboring nodes into consideration, and proposes the extended mixing H-index centrality (EMH). We evaluate the proposed method using the Susceptible-Infected-Recovered (SIR) model and monotonicity, which assess the accuracy and resolution of the method, respectively. Experimental results indicate that the proposed method is superior to the existing measures for identifying important nodes in different networks.
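As background for the measures named above, here is a minimal sketch of the classic H-index centrality that IH and EMH build on: a node's H-index is the largest h such that at least h of its neighbors have degree at least h. The abstract does not define IH, MC, or EMH, so none of them is implemented here; the function names and the toy graph are illustrative assumptions.

```python
def h_index(values):
    """Return the largest h such that at least h of the values are >= h."""
    h = 0
    for rank, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= rank:
            h = rank
        else:
            break
    return h


def h_index_centrality(adj):
    """Classic H-index centrality for an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    return {u: h_index(degree[v] for v in nbrs) for u, nbrs in adj.items()}


# Toy undirected graph (hypothetical example data).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

scores = h_index_centrality(adj)
print(scores)
```

Four of the five nodes tie at the same score on this small graph, which hints at the resolution problem that monotonicity is meant to measure.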
We study the effectiveness of using multiple phases for maximizing the extent of information diffusion through a social network, and present insights while considering various aspects. In particular, we focus on the independent cascade model with the possibility of adaptively selecting seed nodes in multiple phases based on the observed diffusion in preceding phases, and conduct a detailed simulation study on real-world network datasets and various values of seeding budgets. We first present a negative result: more phases do not guarantee a better spread. However, the adaptability advantage of more phases generally leads to a better spread in practice, as observed on real-world datasets. We study how diffusing in multiple phases affects the mean and standard deviation of the distribution representing the extent of diffusion. We then study how the number of phases impacts the effectiveness of multiphase diffusion, how the diffusion progresses phase by phase, and what an optimal way to split the total seeding budget across phases is. Our experiments suggest a significant gain when we move from a single phase to two phases, and an appreciable gain when we further move to three phases, but the marginal gain thereafter is usually not very significant. Our main conclusion is that, given the number of phases, an optimal way to split the budget across phases is such that the number of nodes influenced in each phase is almost the same.
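The multiphase setting described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes a uniform activation probability `p`, assumes each phase's cascade finishes before the next phase begins, and uses a hypothetical seed-selection heuristic (`pick_seeds`, which prefers inactive nodes with the most inactive neighbors). All function names are illustrative.

```python
import random


def independent_cascade(adj, seeds, p, rng, already_active=frozenset()):
    """Run one independent cascade and return the set of all active nodes.

    Each newly activated node gets a single chance to activate each
    inactive neighbor, succeeding with probability p.
    """
    active = set(already_active) | set(seeds)
    frontier = list(set(seeds) - set(already_active))
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def pick_seeds(adj, k, active):
    """Hypothetical heuristic: choose k inactive nodes with the most inactive neighbors."""
    candidates = [u for u in adj if u not in active]
    candidates.sort(key=lambda u: (-sum(v not in active for v in adj[u]), str(u)))
    return candidates[:k]


def multiphase_spread(adj, budgets, p, seed=0):
    """Seed adaptively in phases; budgets[i] is the number of seeds in phase i.

    Returns the final active set and the cumulative spread after each phase.
    """
    rng = random.Random(seed)
    active = set()
    history = []
    for k in budgets:
        seeds = pick_seeds(adj, k, active)  # adapt to the observed diffusion
        active = independent_cascade(adj, seeds, p, rng, active)
        history.append(len(active))
    return active, history


# Toy graph (hypothetical): two disconnected paths, 0-1-2 and 3-4.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}

# Compare a single phase with budget 2 against two phases with budget 1 each.
_, single = multiphase_spread(adj, [2], p=0.5, seed=1)
_, double = multiphase_spread(adj, [1, 1], p=0.5, seed=1)
print(single, double)
```

Averaging `history` over many values of `seed` would give an estimate of the mean and standard deviation of the spread for a given budget split.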