ترغب بنشر مسار تعليمي؟ اضغط هنا

Accurately Modeling Biased Random Walks on Weighted Graphs Using $textit{Node2vec+}$

186   0   0.0 ( 0 )
 نشر من قبل Renming Liu
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Node embedding is a powerful approach for representing the structural role of each node in a graph. $textit{Node2vec}$ is a widely used method for node embedding that works by exploring the local neighborhoods via biased random walks on the graph. However, $textit{node2vec}$ does not consider edge weights when computing walk biases. This intrinsic limitation prevents $textit{node2vec}$ from leveraging all the information in weighted graphs and, in turn, limits its application to many real-world networks that are weighted and dense. Here, we naturally extend $textit{node2vec}$ to $textit{node2vec+}$ in a way that accounts for edge weights when calculating walk biases, but which reduces to $textit{node2vec}$ in the cases of unweighted graphs or unbiased walks. We empirically show that $textit{node2vec+}$ is more robust to additive noise than $textit{node2vec}$ in weighted graphs using two synthetic datasets. We also demonstrate that $textit{node2vec+}$ significantly outperforms $textit{node2vec}$ on a commonly benchmarked multi-label dataset (Wikipedia). Furthermore, we test $textit{node2vec+}$ against GCN and GraphSAGE using various challenging gene classification tasks on two protein-protein interaction networks. Despite some clear advantages of GCN and GraphSAGE, they show comparable performance with $textit{node2vec+}$. Finally, $textit{node2vec+}$ can be used as a general approach for generating biased random walks, benefiting all existing methods built on top of $textit{node2vec}$. $textit{Node2vec+}$ is implemented as part of $texttt{PecanPy}$, which is available at https://github.com/krishnanlab/PecanPy .



قيم البحث

اقرأ أيضاً

273 - Lingqi Meng , Naoki Masuda 2020
Random walks have been proven to be useful for constructing various algorithms to gain information on networks. Algorithm node2vec employs biased random walks to realize embeddings of nodes into low-dimensional spaces, which can then be used for task s such as multi-label classification and link prediction. The usefulness of node2vec in these applications is considered to be contingent upon properties of random walks that the node2vec algorithm uses. In the present study, we theoretically and numerically analyze random walks used by the node2vec. The node2vec random walk is a second-order Markov chain. We exploit the mapping of its transition rule to a transition probability matrix among directed edges to analyze the stationary probability, relaxation times, and coalescence time. In particular, we provide a multitude of evidence that node2vec random walk accelerates diffusion when its parameters are tuned such that walkers avoid both back-tracking and visiting a neighbor of the previously visited node, but not excessively.
We develop rigorous, analytic techniques to study the behaviour of biased random walks on combs. This enables us to calculate exactly the spectral dimension of random comb ensembles for any bias scenario in the teeth or spine. Two specific examples o f random comb ensembles are discussed; the random comb with nonzero probability of an infinitely long tooth at each vertex on the spine and the random comb with a power law distribution of tooth lengths. We also analyze transport properties along the spine for these probability measures.
In recent years, protocols that are based on the properties of random walks on graphs have found many applications in communication and information networks, such as wireless networks, peer-to-peer networks and the Web. For wireless networks (and oth er networks), graphs are actually not the correct model of the communication; instead hyper-graphs better capture the communication over a wireless shared channel. Motivated by this example, we study in this paper random walks on hyper-graphs. First, we formalize the random walk process on hyper-graphs and generalize key notions from random walks on graphs. We then give the novel definition of radio cover time, namely, the expected time of a random walk to be heard (as opposed to visit) by all nodes. We then provide some basic bounds on the radio cover, in particular, we show that while on graphs the radio cover time is O(mn), in hyper-graphs it is O(mnr) where n, m and r are the number of nodes, the number of edges and the rank of the hyper-graph, respectively. In addition, we define radio hitting times and give a polynomial algorithm to compute them. We conclude the paper with results on specific hyper-graphs that model wireless networks in one and two dimensions.
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs. By utilizing fast matrix block-approximation techniques, we propose an approximative framework to such non-trivial ERGs that r esult in dyadic independence (i.e., edge independent) distributions, while being able to meaningfully model both local information of the graph (e.g., degrees) as well as global information (e.g., clustering coefficient, assortativity, etc.) if desired. This allows one to efficiently generate random networks with similar properties as an observed network, and the models can be used for several downstream tasks such as link prediction. Our methods are scalable to sparse graphs consisting of millions of nodes. Empirical evaluation demonstrates competitiveness in terms of both speed and accuracy with state-of-the-art methods -- which are typically based on embedding the graph into some low-dimensional space -- for link prediction, showcasing the potential of a more direct and interpretable probabalistic model for this task.
In recent years, graph neural networks (GNNs) have gained increasing popularity and have shown very promising results for data that are represented by graphs. The majority of GNN architectures are designed based on developing new convolutional and/or pooling layers that better extract the hidden and deeper representations of the graphs to be used for different prediction tasks. The inputs to these layers are mainly the three default descriptors of a graph, node features $(X)$, adjacency matrix $(A)$, and edge features $(W)$ (if available). To provide a more enriched input to the network, we propose a random walk data processing of the graphs based on three selected lengths. Namely, (regular) walks of length 1 and 2, and a fractional walk of length $gamma in (0,1)$, in order to capture the different local and global dynamics on the graphs. We also calculate the stationary distribution of each random walk, which is then used as a scaling factor for the initial node features ($X$). This way, for each graph, the network receives multiple adjacency matrices along with their individual weighting for the node features. We test our method on various molecular datasets by passing the processed node features to the network in order to perform several classification and regression tasks. Interestingly, our method, not using edge features which are heavily exploited in molecular graph learning, let a shallow network outperform well known deep GNNs.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا