Identifying Infection Sources and Regions in Large Networks


Abstract in English

Identifying the infection sources in a network, including the index cases that introduce a contagious disease into a population network, the servers that inject a computer virus into a computer network, or the individuals who started a rumor in a social network, plays a critical role in limiting the damage caused by the infection through timely quarantine of the sources. We consider the problem of estimating the infection sources and the infection regions (subsets of nodes infected by each source) in a network, based only on knowledge of which nodes are infected and their connections, and when the number of sources is unknown a priori. We derive estimators for the infection sources and their infection regions based on approximations of the infection sequences count. We prove that if there are at most two infection sources in a geometric tree, our estimator identifies the true source or sources with probability going to one as the number of infected nodes increases. When there are more than two infection sources, and when the maximum possible number of infection sources is known, we propose an algorithm with quadratic complexity to estimate the actual number and identities of the infection sources. Simulations on various kinds of networks, including tree networks, small-world networks and real world power grid networks, and tests on two real data sets are provided to verify the performance of our estimators.

Download