Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks

332 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Ziniu Hu

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Difan Zou - Ziniu Hu - Yewen Wang

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Graph convolutional networks (GCNs) have recently received wide attentions, due to their successful applications in different graph tasks and different domains. Training GCNs for a large graph, however, is still a challenge. Original full-batch GCN training requires calculating the representation of all the nodes in the graph per GCN layer, which brings in high computation and memory costs. To alleviate this issue, several sampling-based methods have been proposed to train GCNs on a subset of nodes. Among them, the node-wise neighbor-sampling method recursively samples a fixed number of neighbor nodes, and thus its computation cost suffers from exponential growing neighbor size; while the layer-wise importance-sampling method discards the neighbor-dependent constraints, and thus the nodes sampled across layer suffer from sparse connection problem. To deal with the above two problems, we propose a new effective sampling algorithm called LAyer-Dependent ImportancE Sampling (LADIES). Based on the sampled nodes in the upper layer, LADIES selects their neighborhood nodes, constructs a bipartite subgraph and computes the importance probability accordingly. Then, it samples a fixed number of nodes by the calculated probability, and recursively conducts such procedure per layer to construct the whole computation graph. We prove theoretically and experimentally, that our proposed sampling algorithm outperforms the previous sampling methods in terms of both time and memory costs. Furthermore, LADIES is shown to have better generalization accuracy than original full-batch GCN, due to its stochastic nature.

قيم البحث

198 - Peng Jiang , Masuma Akter Rumi 2021

Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data recursively from neighboring nodes. To reduce the computation overhead, previous works have proposed various neighbor sampling methods that estimate the aggregati on result based on a small number of sampled neighbors. Although these methods have successfully accelerated the training, they mainly focus on the single-machine setting. As real-world graphs are large, training GCNs in distributed systems is desirable. However, we found that the existing neighbor sampling methods do not work well in a distributed setting. Specifically, a naive implementation may incur a huge amount of communication of feature vectors among different machines. To address this problem, we propose a communication-efficient neighbor sampling method in this work. Our main idea is to assign higher sampling probabilities to the local nodes so that remote nodes are accessed less frequently. We present an algorithm that determines the local sampling probabilities and makes sure our skewed neighbor sampling does not affect much the convergence of the training. Our experiments with node classification benchmarks show that our method significantly reduces the communication overhead for distributed GCN training with little accuracy loss.

التعلم الآلي

GIST: Distributed Training for Large-Scale Graph Convolutional Networks

100 - Cameron R. Wolfe , Jingkang Yang , Arindam Chowdhury 2021

The graph convolutional network (GCN) is a go-to solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored training on l arge-scale graphs (e.g., GraphSAGE, ClusterGCN, etc.), we pioneer efficient training of large-scale GCN models (i.e., ultra-wide, overparameterized models) with the proposal of a novel, distributed training framework. Our proposed training methodology, called GIST, disjointly partitions the parameters of a GCN model into several, smaller sub-GCNs that are trained independently and in parallel. In addition to being compatible with any GCN architecture, GIST improves model performance, scales to training on arbitrarily large graphs, significantly decreases wall-clock training time, and enables the training of markedly overparameterized GCN models. Remarkably, with GIST, we train an astonishgly-wide 32,768-dimensional GraphSAGE model, which exceeds the capacity of a single GPU by a factor of 8X, to SOTA performance on the Amazon2M dataset.

التعلم الآلي الذكاء الاصطناعي النظم الموزعة والتوازية والحوسبة العنقودية

EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs

96 - Aldo Pareja , Giacomo Domeniconi , Jie Chen 2019

Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspire various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With th e success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. Existing approaches typically resort to node embeddings and use a recurrent neural network (RNN, broadly speaking) to regulate the embeddings and learn the temporal dynamics. These methods require the knowledge of a node in the full time span (including both training and testing) and are less applicable to the frequent change of the node set. In some extreme scenarios, the node sets at different time steps may completely differ. To resolve this challenge, we propose EvolveGCN, which adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings. The proposed approach captures the dynamism of the graph sequence through using an RNN to evolve the GCN parameters. Two architectures are considered for the parameter evolution. We evaluate the proposed approach on tasks including link prediction, edge classification, and node classification. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. The code is available at url{https://github.com/IBM/EvolveGCN}.

التعلم الآلي الشبكات الاجتماعية والمعلومات التعلم الالي

Sampling methods for efficient training of graph convolutional networks: A survey

116 - Xin Liu , Mingyu Yan , Lei Deng 2021

Graph Convolutional Networks (GCNs) have received significant attention from various research fields due to the excellent performance in learning graph representations. Although GCN performs well compared with other methods, it still faces challenges . Training a GCN model for large-scale graphs in a conventional way requires high computation and storage costs. Therefore, motivated by an urgent need in terms of efficiency and scalability in training GCN, sampling methods have been proposed and achieved a significant effect. In this paper, we categorize sampling methods based on the sampling mechanisms and provide a comprehensive survey of sampling methods for efficient training of GCN. To highlight the characteristics and differences of sampling methods, we present a detailed comparison within each category and further give an overall comparative analysis for the sampling methods in all categories. Finally, we discuss some challenges and future research directions of the sampling methods.

التعلم الآلي

GPT-GNN: Generative Pre-Training of Graph Neural Networks

255 - Ziniu Hu , Yuxiao Dong , Kuansan Wang 2020

Graph neural networks (GNNs) have been demonstrated to be powerful in modeling graph-structured data. However, training GNNs usually requires abundant task-specific labeled data, which is often arduously expensive to obtain. One effective way to redu ce the labeling effort is to pre-train an expressive GNN model on unlabeled data with self-supervision and then transfer the learned model to downstream tasks with only a few labels. In this paper, we present the GPT-GNN framework to initialize GNNs by generative pre-training. GPT-GNN introduces a self-supervised attributed graph generation task to pre-train a GNN so that it can capture the structural and semantic properties of the graph. We factorize the likelihood of the graph generation into two components: 1) Attribute Generation and 2) Edge Generation. By modeling both components, GPT-GNN captures the inherent dependency between node attributes and graph structure during the generative process. Comprehensive experiments on the billion-scale Open Academic Graph and Amazon recommendation data demonstrate that GPT-GNN significantly outperforms state-of-the-art GNN models without pre-training by up to 9.1% across various downstream tasks.

التعلم الآلي الشبكات الاجتماعية والمعلومات التعلم الالي