Efficient Scaling of Dynamic Graph Neural Networks

110 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Venkatesan Chakaravarthy

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Venkatesan T. Chakaravarthy - Shivmaran S. Pandian - Saurabh Raje

النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present distributed algorithms for training dynamic Graph Neural Networks (GNN) on large scale graphs spanning multi-node, multi-GPU systems. To the best of our knowledge, this is the first scaling study on dynamic GNN. We devise mechanisms for reducing the GPU memory usage and identify two execution time bottlenecks: CPU-GPU data transfer; and communication volume. Exploiting properties of dynamic graphs, we design a graph difference-based strategy to significantly reduce the transfer time. We develop a simple, but effective data distribution technique under which the communication volume remains fixed and linear in the input size, for any number of GPUs. Our experiments using billion-size graphs on a system of 128 GPUs shows that: (i) the distribution scheme achieves up to 30x speedup on 128 GPUs; (ii) the graph-difference technique reduces the transfer time by a factor of up to 4.1x and the overall execution time by up to 40%

قيم البحث

162 - Masatoshi Hanai , Nikos Tziritas , Toyotaro Suzumura 2021

The dynamic scaling of distributed computations plays an important role in the utilization of elastic computational resources, such as the cloud. It enables the provisioning and de-provisioning of resources to match dynamic resource availability and demands. In the case of distributed graph processing, changing the number of the graph partitions while maintaining high partitioning quality imposes serious computational overheads as typically a time-consuming graph partitioning algorithm needs to execute each time repartitioning is required. In this paper, we propose a dynamic scaling method that can efficiently change the number of graph partitions while keeping its quality high. Our idea is based on two techniques: preprocessing and very fast edge partitioning, called graph edge ordering and chunk-based edge partitioning, respectively. The former converts the graph data into an ordered edge list in such a way that edges with high locality are closer to each other. The latter immediately divides the ordered edge list into an arbitrary number of high-quality partitions. The evaluation with the real-world billion-scale graphs demonstrates that our proposed approach significantly reduces the repartitioning time, while the partitioning quality it achieves is on par with that of the best existing static method.

النظم الموزعة والتوازية والحوسبة العنقودية قواعد البيانات الرياضيات المتقطعة

Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

83 - Dominik Scheinert , Houkun Zhu , Lauritz Thamsen 2021

Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual runtime performa nce of dataflow jobs depends on several factors and varies over time. Yet, in many situations, dynamic scaling can be used to meet formulated runtime targets despite significant performance variance. This paper presents Enel, a novel dynamic scaling approach that uses message propagation on an attributed graph to model dataflow jobs and, thus, allows for deriving effective rescaling decisions. For this, Enel incorporates descriptive properties that capture the respective execution context, considers statistics from individual dataflow tasks, and propagates predictions through the job graph to eventually find an optimized new scale-out. Our evaluation of Enel with four iterative Spark jobs shows that our approach is able to identify effective rescaling actions, reacting for instance to node failures, and can be reused across different execution contexts.

النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

Efficient Probabilistic Logic Reasoning with Graph Neural Networks

471 - Yuyu Zhang , Xinshi Chen , Yuan Yang 2020

Markov Logic Networks (MLNs), which elegantly combine logic rules and probabilistic graphical models, can be used to address many knowledge graph problems. However, inference in MLN is computationally intensive, making the industrial-scale applicatio n of MLN very difficult. In recent years, graph neural networks (GNNs) have emerged as efficient and effective tools for large-scale graph problems. Nevertheless, GNNs do not explicitly incorporate prior logic rules into the models, and may require many labeled examples for a target task. In this paper, we explore the combination of MLNs and GNNs, and use graph neural networks for variational inference in MLN. We propose a GNN variant, named ExpressGNN, which strikes a nice balance between the representation power and the simplicity of the model. Our extensive experiments on several benchmark datasets demonstrate that ExpressGNN leads to effective and efficient probabilistic logic reasoning.

الذكاء الاصطناعي التعلم الآلي

Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks

134 - Zhishuai Guo , Mingrui Liu , Zhuoning Yuan 2020

In this paper, we study distributed algorithms for large-scale AUC maximization with a deep neural network as a predictive model. Although distributed learning techniques have been investigated extensively in deep learning, they are not directly appl icable to stochastic AUC maximization with deep neural networks due to its striking differences from standard loss minimization problems (e.g., cross-entropy). Towards addressing this challenge, we propose and analyze a communication-efficient distributed optimization algorithm based on a {it non-convex concave} reformulation of the AUC maximization, in which the communication of both the primal variable and the dual variable between each worker and the parameter server only occurs after multiple steps of gradient-based updates in each worker. Compared with the naive parallel version of an existing algorithm that computes stochastic gradients at individual machines and averages them for updating the model parameters, our algorithm requires a much less number of communication rounds and still achieves a linear speedup in theory. To the best of our knowledge, this is the textbf{first} work that solves the {it non-convex concave min-max} problem for AUC maximization with deep neural networks in a communication-efficient distributed manner while still maintaining the linear speedup property in theory. Our experiments on several benchmark datasets show the effectiveness of our algorithm and also confirm our theory.

النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي التحسين والتحكم

Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification

139 - Shyam A. Tailor , Rene de Jong , Tiago Azevedo 2021

In recent years graph neural network (GNN)-based approaches have become a popular strategy for processing point cloud data, regularly achieving state-of-the-art performance on a variety of tasks. To date, the research community has primarily focused on improving model expressiveness, with secondary thought given to how to design models that can run efficiently on resource constrained mobile devices including smartphones or mixed reality headsets. In this work we make a step towards improving the efficiency of these models by making the observation that these GNN models are heavily limited by the representational power of their first, feature extracting, layer. We find that it is possible to radically simplify these models so long as the feature extraction layer is retained with minimal degradation to model performance; further, we discover that it is possible to improve performance overall on ModelNet40 and S3DIS by improving the design of the feature extractor. Our approach reduces memory consumption by 20$times$ and latency by up to 9.9$times$ for graph layers in models such as DGCNN; overall, we achieve speed-ups of up to 4.5$times$ and peak memory reductions of 72.5%.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي