No Arabic abstract
Influence maximization, defined as a problem of finding a set of seed nodes to trigger a maximized spread of influence, is crucial to viral marketing on social networks. For practical viral marketing on large scale social networks, it is required that influence maximization algorithms should have both guaranteed accuracy and high scalability. However, existing algorithms suffer a scalability-accuracy dilemma: conventional greedy algorithms guarantee the accuracy with expensive computation, while the scalable heuristic algorithms suffer from unstable accuracy. In this paper, we focus on solving this scalability-accuracy dilemma. We point out that the essential reason of the dilemma is the surprising fact that the submodularity, a key requirement of the objective function for a greedy algorithm to approximate the optimum, is not guaranteed in all conventional greedy algorithms in the literature of influence maximization. Therefore a greedy algorithm has to afford a huge number of Monte Carlo simulations to reduce the pain caused by unguaranteed submodularity. Motivated by this critical finding, we propose a static greedy algorithm, named StaticGreedy, to strictly guarantee the submodularity of influence spread function during the seed selection process. The proposed algorithm makes the computational expense dramatically reduced by two orders of magnitude without loss of accuracy. Moreover, we propose a dynamical update strategy which can speed up the StaticGreedy algorithm by 2-7 times on large scale social networks.
Social networks have been popular platforms for information propagation. An important use case is viral marketing: given a promotion budget, an advertiser can choose some influential users as the seed set and provide them free or discounted sample products; in this way, the advertiser hopes to increase the popularity of the product in the users friend circles by the world-of-mouth effect, and thus maximizes the number of users that information of the production can reach. There has been a body of literature studying the influence maximization problem. Nevertheless, the existing studies mostly investigate the problem on a one-off basis, assuming fixed known influence probabilities among users, or the knowledge of the exact social network topology. In practice, the social network topology and the influence probabilities are typically unknown to the advertiser, which can be varying over time, i.e., in cases of newly established, strengthened or weakened social ties. In this paper, we focus on a dynamic non-stationary social network and design a randomized algorithm, RSB, based on multi-armed bandit optimization, to maximize influence propagation over time. The algorithm produces a sequence of online decisions and calibrates its explore-exploit strategy utilizing outcomes of previous decisions. It is rigorously proven to achieve an upper-bounded regret in reward and applicable to large-scale social networks. Practical effectiveness of the algorithm is evaluated using both synthetic and real-world datasets, which demonstrates that our algorithm outperforms previous stationary methods under non-stationary conditions.
We consider the problem of maximizing the spread of influence in a social network by choosing a fixed number of initial seeds, formally referred to as the influence maximization problem. It admits a $(1-1/e)$-factor approximation algorithm if the influence function is submodular. Otherwise, in the worst case, the problem is NP-hard to approximate to within a factor of $N^{1-varepsilon}$. This paper studies whether this worst-case hardness result can be circumvented by making assumptions about either the underlying network topology or the cascade model. All of our assumptions are motivated by many real life social network cascades. First, we present strong inapproximability results for a very restricted class of networks called the (stochastic) hierarchical blockmodel, a special case of the well-studied (stochastic) blockmodel in which relationships between blocks admit a tree structure. We also provide a dynamic-program based polynomial time algorithm which optimally computes a directed variant of the influence maximization problem on hierarchical blockmodel networks. Our algorithm indicates that the inapproximability result is due to the bidirectionality of influence between agent-blocks. Second, we present strong inapproximability results for a class of influence functions that are almost submodular, called 2-quasi-submodular. Our inapproximability results hold even for any 2-quasi-submodular $f$ fixed in advance. This result also indicates that the threshold between submodularity and nonsubmodularity is sharp, regarding the approximability of influence maximization.
Influence Maximization is a NP-hard problem of selecting the optimal set of influencers in a network. Here, we propose two new approaches to influence maximization based on two very different metrics. The first metric, termed Balanced Index (BI), is fast to compute and assigns top values to two kinds of nodes: those with high resistance to adoption, and those with large out-degree. This is done by linearly combining three properties of a node: its degree, susceptibility to new opinions, and the impact its activation will have on its neighborhood. Controlling the weights between those three terms has a huge impact on performance. The second metric, termed Group Performance Index (GPI), measures performance of each node as an initiator when it is a part of randomly selected initiator set. In each such selection, the score assigned to each teammate is inversely proportional to the number of initiators causing the desired spread. These two metrics are applicable to various cascade models; here we test them on the Linear Threshold Model with fixed and known thresholds. Furthermore, we study the impact of network degree assortativity and threshold distribution on the cascade size for metrics including ours. The results demonstrate our two metrics deliver strong performance for influence maximization.
The steady growth of graph data from social networks has resulted in wide-spread research in finding solutions to the influence maximization problem. In this paper, we propose a holistic solution to the influence maximization (IM) problem. (1) We introduce an opinion-cum-interaction (OI) model that closely mirrors the real-world scenarios. Under the OI model, we introduce a novel problem of Maximizing the Effective Opinion (MEO) of influenced users. We prove that the MEO problem is NP-hard and cannot be approximated within a constant ratio unless P=NP. (2) We propose a heuristic algorithm OSIM to efficiently solve the MEO problem. To better explain the OSIM heuristic, we first introduce EaSyIM - the opinion-oblivious version of OSIM, a scalable algorithm capable of running within practical compute times on commodity hardware. In addition to serving as a fundamental building block for OSIM, EaSyIM is capable of addressing the scalability aspect - memory consumption and running time, of the IM problem as well. Empirically, our algorithms are capable of maintaining the deviation in the spread always within 5% of the best known methods in the literature. In addition, our experiments show that both OSIM and EaSyIM are effective, efficient, scalable and significantly enhance the ability to analyze real datasets.
Influence maximization, fundamental for word-of-mouth marketing and viral marketing, aims to find a set of seed nodes maximizing influence spread on social network. Early methods mainly fall into two paradigms with certain benefits and drawbacks: (1)Greedy algorithms, selecting seed nodes one by one, give a guaranteed accuracy relying on the accurate approximation of influence spread with high computational cost; (2)Heuristic algorithms, estimating influence spread using efficient heuristics, have low computational cost but unstable accuracy. We first point out that greedy algorithms are essentially finding a self-consistent ranking, where nodes ranks are consistent with their ranking-based marginal influence spread. This insight motivates us to develop an iterative ranking framework, i.e., IMRank, to efficiently solve influence maximization problem under independent cascade model. Starting from an initial ranking, e.g., one obtained from efficient heuristic algorithm, IMRank finds a self-consistent ranking by reordering nodes iteratively in terms of their ranking-based marginal influence spread computed according to current ranking. We also prove that IMRank definitely converges to a self-consistent ranking starting from any initial ranking. Furthermore, within this framework, a last-to-first allocating strategy and a generalization of this strategy are proposed to improve the efficiency of estimating ranking-based marginal influence spread for a given ranking. In this way, IMRank achieves both remarkable efficiency and high accuracy by leveraging simultaneously the benefits of greedy algorithms and heuristic algorithms. As demonstrated by extensive experiments on large scale real-world social networks, IMRank always achieves high accuracy comparable to greedy algorithms, with computational cost reduced dramatically, even about $10-100$ times faster than other scalable heuristics.