Motif-Driven Contrastive Learning of Graph Representations

54 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shichang Zhang

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shichang Zhang - Ziniu Hu - Arjun Subramonian

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Pre-training Graph Neural Networks (GNN) via self-supervised contrastive learning has recently drawn lots of attention. However, most existing works focus on node-level contrastive learning, which cannot capture global graph structure. The key challenge to conducting subgraph-level contrastive learning is to sample informative subgraphs that are semantically meaningful. To solve it, we propose to learn graph motifs, which are frequently-occurring subgraph patterns (e.g. functional groups of molecules), for better subgraph sampling. Our framework MotIf-driven Contrastive leaRning Of Graph representations (MICRO-Graph) can: 1) use GNNs to extract motifs from large graph datasets; 2) leverage learned motifs to sample informative subgraphs for contrastive learning of GNN. We formulate motif learning as a differentiable clustering problem, and adopt EM-clustering to group similar and significant subgraphs into several motifs. Guided by these learned motifs, a sampler is trained to generate more informative subgraphs, and these subgraphs are used to train GNNs through graph-to-subgraph contrastive learning. By pre-training on the ogbg-molhiv dataset with MICRO-Graph, the pre-trained GNN achieves 2.04% ROC-AUC average performance enhancement on various downstream benchmark datasets, which is significantly higher than other state-of-the-art self-supervised learning baselines.

قيم البحث

81 - Yuyang Wang , Jianren Wang , Zhonglin Cao 2021

Molecular machine learning bears promise for efficient molecule property prediction and drug discovery. However, due to the limited labeled data and the giant chemical space, machine learning models trained via supervised learning perform poorly in g eneralization. This greatly limits the applications of machine learning methods for molecular design and discovery. In this work, we present MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (GNNs), a self-supervised learning framework for large unlabeled molecule datasets. Specifically, we first build a molecular graph, where each node represents an atom and each edge represents a chemical bond. A GNN is then used to encode the molecule graph. We propose three novel molecule graph augmentations: atom masking, bond deletion, and subgraph removal. A contrastive estimator is utilized to maximize the agreement of different graph augmentations from the same molecule. Experiments show that molecule representations learned by MolCLR can be transferred to multiple downstream molecular property prediction tasks. Our method thus achieves state-of-the-art performance on many challenging datasets. We also prove the efficiency of our proposed molecule graph augmentations on supervised molecular classification tasks.

التعلم الآلي الفيزياء الكيميائية

Prototypical Graph Contrastive Learning

91 - Shuai Lin , Pan Zhou , Zi-Yuan Hu 2021

Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. But in practice, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph cont rastive learning constructs instance discrimination task which pulls together positive pairs (augmentation pairs of the same graph) and pushes away negative pairs (augmentation pairs of different graphs) for unsupervised representation learning. However, since for a query, its negatives are uniformly sampled from all graphs, existing methods suffer from the critical sampling bias issue, i.e., the negatives likely having the same semantic structure with the query, leading to performance degradation. To mitigate this sampling bias issue, in this paper, we propose a Prototypical Graph Contrastive Learning (PGCL) approach. Specifically, PGCL models the underlying semantic structure of the graph data via clustering semantically similar graphs into the same group, and simultaneously encourages the clustering consistency for different augmentations of the same graph. Then given a query, it performs negative sampling via drawing the graphs from those clusters that differ from the cluster of query, which ensures the semantic difference between query and its negative samples. Moreover, for a query, PGCL further reweights its negative samples based on the distance between their prototypes (cluster centroids) and the query prototype such that those negatives having moderate prototype distance enjoy relatively large weights. This reweighting strategy is proved to be more effective than uniform sampling. Experimental results on various graph benchmarks testify the advantages of our PGCL over state-of-the-art methods.

التعلم الآلي الذكاء الاصطناعي

Distance-wise Graph Contrastive Learning

142 - Deli Chen , Yanyai Lin , Lei Li 2020

Contrastive learning (CL) has proven highly effective in graph-based semi-supervised learning (SSL), since it can efficiently supplement the limited task information from the annotated nodes in graph. However, existing graph CL (GCL) studies ignore t he uneven distribution of task information across graph caused by the graph topology and the selection of annotated nodes. They apply CL to the whole graph evenly, which results in an incongruous combination of CL and graph learning. To address this issue, we propose to apply CL in the graph learning adaptively by taking the received task information of each node into consideration. Firstly, we introduce Group PageRank to measure the node information gain from graph and find that CL mainly works for nodes that are topologically far away from the labeled nodes. We then propose our Distance-wise Graph Contrastive Learning (DwGCL) method from two views:(1) From the global view of the task information distribution across the graph, we enhance the CL effect on nodes that are topologically far away from labeled nodes; (2) From the personal view of each nodes received information, we measure the relative distance between nodes and then we adapt the sampling strategy of GCL accordingly. Extensive experiments on five benchmark graph datasets show that DwGCL can bring a clear improvement over previous GCL methods. Our analysis on eight graph neural network with various types of architecture and three different annotation settings further demonstrates the generalizability of DwGCL.

التعلم الآلي

Multi-Level Graph Contrastive Learning

277 - Pengpeng Shao , Tong Liu , Dawei Zhang 2021

Graph representation learning has attracted a surge of interest recently, whose target at learning discriminant embedding for each node in the graph. Most of these representation methods focus on supervised learning and heavily depend on label inform ation. However, annotating graphs are expensive to obtain in the real world, especially in specialized domains (i.e. biology), as it needs the annotator to have the domain knowledge to label the graph. To approach this problem, self-supervised learning provides a feasible solution for graph representation learning. In this paper, we propose a Multi-Level Graph Contrastive Learning (MLGCL) framework for learning robust representation of graph data by contrasting space views of graphs. Specifically, we introduce a novel contrastive view - topological and feature space views. The original graph is first-order approximation structure and contains uncertainty or error, while the $k$NN graph generated by encoding features preserves high-order proximity. Thus $k$NN graph generated by encoding features not only provide a complementary view, but is more suitable to GNN encoder to extract discriminant representation. Furthermore, we develop a multi-level contrastive mode to preserve the local similarity and semantic similarity of graph-structured data simultaneously. Extensive experiments indicate MLGCL achieves promising results compared with the existing state-of-the-art graph representation learning methods on seven datasets.

التعلم الآلي الذكاء الاصطناعي

Spatio-Temporal Graph Contrastive Learning

329 - Xu Liu , Yuxuan Liang , Yu Zheng 2021

Deep learning models are modern tools for spatio-temporal graph (STG) forecasting. Despite their effectiveness, they require large-scale datasets to achieve better performance and are vulnerable to noise perturbation. To alleviate these limitations, an intuitive idea is to use the popular data augmentation and contrastive learning techniques. However, existing graph contrastive learning methods cannot be directly applied to STG forecasting due to three reasons. First, we empirically discover that the forecasting task is unable to benefit from the pretrained representations derived from contrastive learning. Second, data augmentations that are used for defeating noise are less explored for STG data. Third, the semantic similarity of samples has been overlooked. In this paper, we propose a Spatio-Temporal Graph Contrastive Learning framework (STGCL) to tackle these issues. Specifically, we improve the performance by integrating the forecasting loss with an auxiliary contrastive loss rather than using a pretrained paradigm. We elaborate on four types of data augmentations, which disturb data in terms of graph structure, time domain, and frequency domain. We also extend the classic contrastive loss through a rule-based strategy that filters out the most semantically similar negatives. Our framework is evaluated across three real-world datasets and four state-of-the-art models. The consistent improvements demonstrate that STGCL can be used as an off-the-shelf plug-in for existing deep models.

التعلم الآلي الذكاء الاصطناعي