An Efficient and Balanced Graph Partition Algorithm for the Subgraph-Centric Programming Model on Large-scale Power-law Graphs

Publication date: 2020

Field: Informatics Engineering

Language: English

Authors: Shuai Zhang, Zite Jiang, Xingzhong Hou

Distributed Parallel and Cluster Computing

The subgraph-centric programming model is a promising approach and has been applied in many state-of-the-art distributed graph computing frameworks. However, traditional graph partition algorithms have significant difficulties in processing large-scale power-law graphs. The major problem is the communication bottleneck found in many subgraph-centric frameworks. Detailed analysis indicates that the communication bottleneck is caused by either a huge communication volume or an extreme message imbalance among partitioned subgraphs. Traditional partition algorithms do not consider both factors at the same time, especially on power-law graphs. In this paper, we propose a novel efficient and balanced vertex-cut graph partition algorithm (EBV) that assigns appropriate weights to the overall communication cost and the communication balance. We observe that the number of replicated vertices and the balance of edge and vertex assignment strongly influence the communication patterns of distributed subgraph-centric frameworks, which in turn affect overall performance. Based on this insight, we design an evaluation function that quantifies the proportion of replicated vertices and the balance of edge and vertex assignments as important parameters. In addition, we process edges in ascending order of the sum of their end-vertex degrees. Experiments show that EBV reduces the replication factor and communication by at least 21.8% and 23.7%, respectively, compared with other self-based partition algorithms. When deployed in a subgraph-centric framework, it reduces the running time on power-law graphs by an average of 16.8% compared with the state-of-the-art partition algorithm. Our results indicate that EBV has great potential for improving the performance of subgraph-centric frameworks for parallel processing of large-scale power-law graphs.
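
The abstract does not give the exact evaluation function, so the following is only a minimal Python sketch of a greedy vertex-cut partitioner in the spirit described above. It processes edges in ascending order of the sum of their end-vertex degrees and assigns each edge to the part with the lowest score. The function name `ebv_partition_sketch`, the weights `alpha` and `beta`, and the normalization of the balance terms are all assumptions made for illustration, not the authors' definitions.

```python
from collections import defaultdict


def ebv_partition_sketch(edges, num_parts, alpha=1.0, beta=1.0):
    """Greedy vertex-cut sketch (hypothetical scoring, not the paper's formula).

    Returns a list of ((u, v), part) pairs and the replication factor,
    i.e. the average number of parts that hold a copy of each vertex.
    """
    if not edges:
        return [], 0.0

    # Vertex degrees, used only to order the edges.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Ascending sum of end-vertex degrees, as stated in the abstract.
    ordered = sorted(edges, key=lambda e: degree[e[0]] + degree[e[1]])

    avg_edges = len(edges) / num_parts
    avg_vertices = len(degree) / num_parts
    edge_count = [0] * num_parts
    vertex_sets = [set() for _ in range(num_parts)]
    assignment = []

    for u, v in ordered:
        best_part, best_score = 0, float("inf")
        for p in range(num_parts):
            # Endpoints not yet present in part p; each would add a copy there.
            new_copies = (u not in vertex_sets[p]) + (v not in vertex_sets[p])
            # Assumed balance terms: current load relative to the ideal load.
            edge_load = edge_count[p] / avg_edges
            vertex_load = len(vertex_sets[p]) / avg_vertices
            score = new_copies + alpha * edge_load + beta * vertex_load
            if score < best_score:
                best_part, best_score = p, score
        assignment.append(((u, v), best_part))
        edge_count[best_part] += 1
        vertex_sets[best_part].update((u, v))

    replication_factor = sum(len(s) for s in vertex_sets) / len(degree)
    return assignment, replication_factor
```

Under these assumptions, larger `alpha` and `beta` favor balanced edge and vertex loads, while smaller values favor fewer vertex copies. A call such as `ebv_partition_sketch([(0, 1), (0, 2), (1, 2)], 2)` returns the per-edge assignment together with the resulting replication factor.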

DRGraph: An Efficient Graph Layout Algorithm for Large-scale Graphs by Dimensionality Reduction

Minfeng Zhu, Wei Chen, Yuanzhe Hu (2020)

Efficient layout of large-scale graphs remains a challenging problem: the force-directed and dimensionality reduction-based methods suffer from high overhead for graph distance and gradient computation. In this paper, we present a new graph layout algorithm, called DRGraph, that enhances the nonlinear dimensionality reduction process with three schemes: approximating graph distances by means of a sparse distance matrix, estimating the gradient by using the negative sampling technique, and accelerating the optimization process through a multi-level layout scheme. DRGraph achieves a linear complexity for the computation and memory consumption, and scales up to large-scale graphs with millions of nodes. Experimental results and comparisons with state-of-the-art graph layout methods demonstrate that DRGraph can generate visually comparable layouts with a faster running time and a lower memory requirement.

Social and Information Networks

An Efficient Quadratic Programming Relaxation Based Algorithm for Large-Scale MIMO Detection

Ping-Fan Zhao, Qing-Na Li, Wei-Kun Chen (2020)

Multiple-input multiple-output (MIMO) detection is a fundamental problem in wireless communications and it is strongly NP-hard in general. Massive MIMO has been recognized as a key technology in the fifth generation (5G) and beyond communication networks, which on the one hand can significantly improve the communication performance, and on the other hand poses new challenges in solving the corresponding optimization problems due to the large problem size. While various efficient algorithms such as semidefinite relaxation (SDR) based approaches have been proposed for solving the small-scale MIMO detection problem, they are not suitable for the large-scale MIMO detection problem due to their high computational complexity. In this paper, we propose an efficient sparse quadratic programming (SQP) relaxation based algorithm for solving the large-scale MIMO detection problem. In particular, we first reformulate the MIMO detection problem as an SQP problem. By dropping the sparse constraint, the resulting relaxation problem shares the same global minimizer with the SQP problem. In sharp contrast to the SDRs for the MIMO detection problem, our relaxation does not contain any (positive semidefinite) matrix variable, and the numbers of variables and constraints in our relaxation are significantly fewer than those in the SDRs, which makes it particularly suitable for the large-scale problem. Then we propose a projected Newton based quadratic penalty method to solve the relaxation problem, which is guaranteed to converge to the vector of transmitted signals under reasonable conditions. In extensive numerical experiments on large-scale problems, the proposed algorithm achieves better detection performance than a recently proposed generalized power method.

Optimization and Control, Information Theory, Signal Processing

Practical Evaluation of the Lasp Programming Model at Large Scale - An Experience Report

Christopher S. Meiklejohn, Vitor Enes, Junghun Yoo (2017)

Programming models for building large-scale distributed applications assist the developer in reasoning about consistency and distribution. However, many of the programming models for weak consistency, which promise the largest scalability gains, have little in the way of evaluation to demonstrate the promised scalability. We present an experience report on the implementation and large-scale evaluation of one of these models, Lasp, originally presented at PPDP '15, which provides a declarative, functional programming style for distributed applications. We demonstrate the scalability of Lasp's prototype runtime implementation up to 1024 nodes in the Amazon cloud computing environment. It achieves high scalability by uniquely combining hybrid gossip with a programming model based on convergent computation. We report on the engineering challenges of this implementation and its evaluation, specifically related to operating research prototypes in a production cloud environment.

Distributed Parallel and Cluster Computing

Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

Ye Yuan, Guoren Wang, Lei Chen (2012)

Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs because of its wide application in many fields, including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. However, in reality, graphs are often noisy and uncertain due to various factors, such as errors in data extraction, inconsistencies in data integration, and privacy-preserving purposes. Therefore, in this paper, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that edges in an uncertain graph are independent of each other, we study uncertain graphs in which edge occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete; thus, we employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can sort out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments.

Databases

A Graph Computation based Sequential Power Flow Calculation for Large-Scale AC/DC Systems

Wei Feng, Jingjin Wu, Chen Yuan (2019)

This paper proposes a graph computation based sequential power flow calculation method for Line Commutated Converter (LCC) based large-scale AC/DC systems to achieve high computing performance. Based on graph theory, the complex AC/DC system is first converted to a graph model and stored in a graph database. Then, the hybrid system is divided into several isolated areas with a graph partition algorithm by decoupling the AC and DC networks. Thus, the power flow analysis can be executed in parallel for each independent area with the newly selected slack buses. Furthermore, for each area, the node-based parallel computing (NPC) and hierarchical parallel computing (HPC) used in graph computation are employed to speed up fast decoupled power flow (FDPF). Comprehensive case studies on the IEEE 300-bus system, the polished South Carolina 12,000-bus system, and a China 11,119-bus system are performed to demonstrate the accuracy and efficiency of the proposed method.

Distributed Parallel and Cluster Computing