Do you want to publish a course? Click here

I/O-Efficient Generation of Massive Graphs Following the LFR Benchmark

76   0   0.0 ( 0 )
 Added by Manuel Penschuck
 Publication date 2016
and research's language is English




Ask ChatGPT about the research

LFR is a popular benchmark graph generator used to evaluate community detection algorithms. We present EM-LFR, the first external memory algorithm able to generate massive complex networks following the LFR benchmark. Its most expensive component is the generation of random graphs with prescribed degree sequences which can be divided into two steps: the graphs are first materialized deterministically using the Havel-Hakimi algorithm, and then randomized. Our main contributions are EM-HH and EM-ES, two I/O-efficient external memory algorithms for these two steps. We also propose EM-CM/ES, an alternative sampling scheme using the Configuration Model and rewiring steps to obtain a random simple graph. In an experimental evaluation we demonstrate their performance; our implementation is able to handle graphs with more than 37 billion edges on a single machine, is competitive with a massive parallel distributed algorithm, and is faster than a state-of-the-art internal memory implementation even on instances fitting in main memory. EM-LFRs implementation is capable of generating large graph instances orders of magnitude faster than the original implementation. We give evidence that both implementations yield graphs with matching properties by applying clustering algorithms to generated instances. Similarly, we analyse the evolution of graph properties as EM-ES is executed on networks obtained with EM-CM/ES and find that the alternative approach can accelerate the sampling process.

rate research

Read More

Graph randomisation is a crucial task in the analysis and synthesis of networks. It is typically implemented as an edge switching process (ESMC) repeatedly swapping the nodes of random edge pairs while maintaining the degrees involved. Curveball is a novel approach that instead considers the whole neighbourhoods of randomly drawn node pairs. Its Markov chain converges to a uniform distribution, and experiments suggest that it requires less steps than the established ESMC. Since trades however are more expensive, we study Curveballs practical runtime by introducing the first efficient Curveball algorithms: the I/O-efficient EM-CB for simple undirected graphs and its internal memory pendant IM-CB. Further, we investigate global trades processing every node in a graph during a single super step, and show that undirected global trades converge to a uniform distribution and perform superior in practice. We then discuss EM-GCB and EM-PGCB for global trades and give experimental evidence that EM-PGCB achieves the quality of the state-of-the-art ESMC algorithm EM-ES nearly one order of magnitude faster.
163 - Saachi Jain , Matei Zaharia 2019
We consider the problem of finding lower bounds on the I/O complexity of arbitrary computations in a two level memory hierarchy. Executions of complex computations can be formalized as an evaluation order over the underlying computation graph. However, prior methods for finding I/O lower bounds leverage the graph structures for specific problems (e.g matrix multiplication) which cannot be applied to arbitrary graphs. In this paper, we first present a novel method to bound the I/O of any computation graph using the first few eigenvalues of the graphs Laplacian. We further extend this bound to the parallel setting. This spectral bound is not only efficiently computable by power iteration, but can also be computed in closed form for graphs with known spectra. We apply our spectral method to compute closed-form analytical bounds on two computation graphs (the Bellman-Held-Karp algorithm for the traveling salesman problem and the Fast Fourier Transform), as well as provide a probabilistic bound for random Erdos Renyi graphs. We empirically validate our bound on four computation graphs, and find that our method provides tighter bounds than current empirical methods and behaves similarly to previously published I/O bounds.
More and more massive parallel codes running on several hundreds of thousands of cores enter the computational science and engineering domain, allowing high-fidelity computations on up to trillions of unknowns for very detailed analyses of the underlying problems. During such runs, typically gigabytes of data are being produced, hindering both efficient storage and (interactive) data exploration. Here, advanced approaches based on inherently distributed data formats such as HDF5 become necessary in order to avoid long latencies when storing the data and to support fast (random) access when retrieving the data for visual processing. Avoiding file locking and using collective buffering, write bandwidths to a single file close to the theoretical peak on a modern supercomputing cluster were achieved. The structure of the output file supports a very fast interactive visualisation and introduces additional steering functionality.
434 - Lelia Blin 2007
Self-stabilization is a versatile technique to withstand any transient fault in a distributed system. Mobile robots (or agents) are one of the emerging trends in distributed computing as they mimic autonomous biologic entities. The contribution of this paper is threefold. First, we present a new model for studying mobile entities in networks subject to transient faults. Our model differs from the classical robot model because robots have constraints about the paths they are allowed to follow, and from the classical agent model because the number of agents remains fixed throughout the execution of the protocol. Second, in this model, we study the possibility of designing self-stabilizing algorithms when those algorithms are run by mobile robots (or agents) evolving on a graph. We concentrate on the core building blocks of robot and agents problems: naming and leader election. Not surprisingly, when no constraints are given on the network graph topology and local execution model, both problems are impossible to solve. Finally, using minimal hypothesis with respect to impossibility results, we provide deterministic and probabilistic solutions to both problems, and show equivalence of these problems by an algorithmic reduction mechanism.
We investigate graph problems in the following setting: we are given a graph $G$ and we are required to solve a problem on $G^2$. While we focus mostly on exploring this theme in the distributed CONGEST model, we show new results and surprising connections to the centralized model of computation. In the CONGEST model, it is natural to expect that problems on $G^2$ would be quite difficult to solve efficiently on $G$, due to congestion. However, we show that the picture is both more complicated and more interesting. Specifically, we encounter two phenomena acting in opposing directions: (i) slowdown due to congestion and (ii) speedup due to structural properties of $G^2$. We demonstrate these two phenomena via two fundamental graph problems, namely, Minimum Vertex Cover (MVC) and Minimum Dominating Set (MDS). Among our many contributions, the highlights are the following. - In the CONGEST model, we show an $O(n/epsilon)$-round $(1+epsilon)$-approximation algorithm for MVC on $G^2$, while no $o(n^2)$-round algorithm is known for any better-than-2 approximation for MVC on $G$. - We show a centralized polynomial time $5/3$-approximation algorithm for MVC on $G^2$, whereas a better-than-2 approximation is UGC-hard for $G$. - In contrast, for MDS, in the CONGEST model, we show an $tilde{Omega}(n^2)$ lower bound for a constant approximation factor for MDS on $G^2$, whereas an $Omega(n^2)$ lower bound for MDS on $G$ is known only for exact computation. In addition to these highlighted results, we prove a number of other results in the distributed CONGEST model including an $tilde{Omega}(n^2)$ lower bound for computing an exact solution to MVC on $G^2$, a conditional hardness result for obtaining a $(1+epsilon)$-approximation to MVC on $G^2$, and an $O(log Delta)$-approximation to the MDS problem on $G^2$ in $mbox{poly}log n$ rounds.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا