A Fast Sampling Method of Exploring Graphlet Degrees of Large Directed and Undirected Graphs

Added by Junzhou Zhao

Publication date 2016

Fields Informatics Engineering

Language English

Authors Pinghui Wang - Xiangliang Zhang - Zhenguo Li

Social and Information Networks

Abstract

Exploring small connected and induced subgraph patterns (CIS patterns, or graphlets) has recently attracted considerable attention. Despite recent efforts on computing the number of instances of a specific graphlet in a large graph (i.e., the total number of CISes isomorphic to the graphlet), little attention has been paid to characterizing a node's graphlet degree, i.e., the number of CISes isomorphic to the graphlet that include the node, which is an important metric for analyzing complex networks such as social and biological networks. As with global graphlet counting, it is challenging to compute node graphlet degrees for a large graph due to the combinatorial nature of the problem. Unfortunately, previous methods of computing global graphlet counts are not suited to solving this problem. In this paper we propose sampling methods to estimate node graphlet degrees for undirected and directed graphs, and analyze the error of our estimates. To the best of our knowledge, we are the first to study this problem and give a fast, scalable solution. We conduct experiments on a variety of real-world datasets that demonstrate that our methods accurately and efficiently estimate node graphlet degrees for graphs with millions of edges.
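
To make the definition of a node's graphlet degree concrete, the following minimal sketch counts it exactly for one graphlet, the triangle, in a small undirected graph. It is a brute-force illustration only, not the sampling method proposed in the paper; the function name `triangle_degrees` and the adjacency-set representation are assumptions chosen for this example.

```python
from itertools import combinations

def triangle_degrees(adj):
    """Exact triangle graphlet degree of every node.

    adj maps each node to the set of its neighbours (undirected graph).
    The triangle degree of v is the number of triangles that contain v.
    """
    deg = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        # Every pair of neighbours of v that is itself adjacent
        # closes exactly one triangle through v.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj[a]:
                deg[v] += 1
    return deg

# Hypothetical toy graph with edges 0-1, 0-2, 1-2, 1-3, 2-3.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
print(triangle_degrees(adj))  # {0: 1, 1: 2, 2: 2, 3: 1}
```

The cost of this exact count grows with the sum of squared degrees, which is one way to see why estimation by sampling becomes attractive for graphs with millions of edges.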


Proximity and Remoteness in Directed and Undirected Graphs

Jiangdong Ai, Stefanie Gerke, Gregory Gutin - 2020

Let $D$ be a strongly connected digraph. The average distance $\bar{\sigma}(v)$ of a vertex $v$ of $D$ is the arithmetic mean of the distances from $v$ to all other vertices of $D$. The remoteness $\rho(D)$ and proximity $\pi(D)$ of $D$ are the maximum and the minimum of the average distances of the vertices of $D$, respectively. We obtain sharp upper and lower bounds on $\pi(D)$ and $\rho(D)$ as a function of the order $n$ of $D$ and describe the extreme digraphs for all the bounds. We also obtain such bounds for strong tournaments. We show that for a strong tournament $T$, we have $\pi(T)=\rho(T)$ if and only if $T$ is regular. Due to this result, one may conjecture that every strong digraph $D$ with $\pi(D)=\rho(D)$ is regular. We present an infinite family of non-regular strong digraphs $D$ such that $\pi(D)=\rho(D)$. We describe such a family for undirected graphs as well.
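
The definitions of average distance, proximity and remoteness can be illustrated with a short sketch that runs a breadth-first search from every vertex. The function names and the successor-list representation are assumptions made for this example; nothing here is taken from the paper beyond the definitions themselves.

```python
from collections import deque

def average_distances(succ):
    """Average distance from each vertex to all others in a strong digraph.

    succ maps each vertex to an iterable of its out-neighbours.
    """
    n = len(succ)
    avg = {}
    for s in succ:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # plain BFS following edge directions
            u = queue.popleft()
            for w in succ[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n:
            raise ValueError("digraph is not strongly connected")
        avg[s] = sum(dist.values()) / (n - 1)
    return avg

def proximity_and_remoteness(succ):
    """Return (proximity, remoteness): min and max average distance."""
    avg = average_distances(succ)
    return min(avg.values()), max(avg.values())

# Directed 3-cycle: a regular strong tournament, so both values coincide.
print(proximity_and_remoteness({0: [1], 1: [2], 2: [0]}))     # (1.5, 1.5)
# Adding the arc 0 -> 2 shortens distances from vertex 0 only.
print(proximity_and_remoteness({0: [1, 2], 1: [2], 2: [0]}))  # (1.0, 1.5)
```

The first example is consistent with the tournament result stated above: the directed 3-cycle is regular and its proximity equals its remoteness.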

Combinatorics Discrete Mathematics

Characterizing Directed and Undirected Networks via Multidimensional Walks with Jumps

Fabricio Murai, Bruno Ribeiro, Don Towsley - 2017

Estimating distributions of node characteristics (labels) such as number of connections or citizenship of users in a social network via edge and node sampling is a vital part of the study of complex networks. Due to its low cost, sampling via a random walk (RW) has been proposed as an attractive solution to this task. Most RW methods assume either that the network is undirected or that walkers can traverse edges regardless of their direction. Some RW methods have been designed for directed networks where edges coming into a node are not directly observable. In this work, we propose Directed Unbiased Frontier Sampling (DUFS), a sampling method based on a large number of coordinated walkers, each starting from a node chosen uniformly at random. It is applicable to directed networks with invisible incoming edges because it constructs, in real time, an undirected graph consistent with the walkers' trajectories, and because it uses random jumps, which prevent walkers from being trapped. DUFS generalizes previous RW methods and is suited to both undirected networks and directed networks, regardless of in-edge visibility. We also propose an improved estimator of node label distributions that combines information from the initial walker locations with subsequent RW observations. We evaluate DUFS, compare it to other RW methods, investigate the impact of its parameters on estimation accuracy and provide practical guidelines for choosing them. In estimating out-degree distributions, DUFS yields significantly better estimates of the head of the distribution than other methods, while matching or exceeding estimation accuracy of the tail. Last, we show that DUFS outperforms uniform node sampling when estimating distributions of node labels of the top 10% largest degree nodes, even when sampling a node uniformly has the same cost as RW steps.

Social and Information Networks Physics and Society

Directed Random Geometric Graphs

Jesse Michel, Sushruth Reddy, Rikhav Shah - 2018

Many real-world networks are intrinsically directed. Such networks include activation of genes, hyperlinks on the internet, and the network of followers on Twitter, among many others. The challenge, however, is to create a network model that has many of the properties of real-world networks, such as power-law degree distributions and the small-world property. To meet these challenges, we introduce the Directed Random Geometric Graph (DRGG) model, which is an extension of the random geometric graph model. We prove that it is scale-free with respect to the indegree distribution, has binomial outdegree distribution, has a high clustering coefficient, has few edges and is likely small-world. These are some of the main features of the aforementioned real-world networks. We empirically observe that word association networks have many of the theoretical properties of the DRGG model.

Social and Information Networks Disordered Systems and Neural Networks Physics and Society

Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs

C. Seshadhri, Ali Pinar, Tamara G. Kolda - 2013

Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimation of local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives.
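
The basic idea behind wedge sampling can be sketched as follows: pick wedges (paths of length two) uniformly at random, check whether each one is closed by a third edge, and use the closed fraction as an estimate of the global clustering coefficient. The function name, parameters and adjacency-set representation below are assumptions made for illustration; this is a simplified sketch and not the authors' implementation.

```python
import random

def wedge_sample_clustering(adj, k, seed=0):
    """Estimate global clustering coefficient and triangle count.

    adj maps each node to the set of its neighbours (undirected graph).
    k is the number of wedges to sample. Returns (fraction_closed,
    estimated_triangles).
    """
    rng = random.Random(seed)
    nodes = [v for v in adj if len(adj[v]) >= 2]
    # A node of degree d is the centre of d*(d-1)/2 wedges.
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in nodes]
    total_wedges = sum(weights)
    if total_wedges == 0:
        return 0.0, 0.0
    closed = 0
    # Choosing the centre proportionally to its wedge count and then a
    # uniform pair of its neighbours yields a uniform random wedge.
    for c in rng.choices(nodes, weights=weights, k=k):
        a, b = rng.sample(sorted(adj[c]), 2)
        if b in adj[a]:
            closed += 1
    frac = closed / k
    # Each triangle contains exactly three closed wedges.
    return frac, frac * total_wedges / 3

# Complete graph on 4 nodes: every wedge is closed and there are 4 triangles.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(wedge_sample_clustering(k4, 100))  # (1.0, 4.0)
```

On graphs where only some wedges are closed, the result is a random estimate whose accuracy improves with the number of sampled wedges `k`.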

Social and Information Networks Data Structures and Algorithms

A metric on directed graphs and Markov chains based on hitting probabilities

Zachary M. Boyd, Nicolas Fraiman, Jeremy L. Marzuola - 2020

The shortest-path, commute time, and diffusion distances on undirected graphs have been widely employed in applications such as dimensionality reduction, link prediction, and trip planning. Increasingly, there is interest in using the asymmetric structure of data derived from Markov chains and directed graphs, but few metrics are specifically adapted to this task. We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain and, in particular, on any Markov chain derived from a directed graph. Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity. Notably, our metric is insensitive to shortest and average walk distances, thus giving new information compared to existing metrics. We use possible degeneracies in the metric to develop an interesting structural theory of directed graphs and explore a related quotienting procedure. Our metric can be computed in $O(n^3)$ time, where $n$ is the number of states, and in examples we scale up to $n = 10{,}000$ nodes and $\approx 38$M edges on a desktop computer. In several examples, we explore the nature of the metric, compare it to alternative methods, and demonstrate its utility for weak recovery of community structure in dense graphs, visualization, structure recovery, dynamics exploration, and multiscale cluster detection.

Social and Information Networks Machine Learning Numerical Analysis