Leveraging domain knowledge, including fingerprints and functional groups, in molecular representation learning is crucial for chemical property prediction and drug discovery. When modeling the relation between graph structure and molecular properties implicitly, existing works can hardly capture structural or property changes and complex structures, given the much smaller atom vocabulary and the high frequency of individual atoms. In this paper, we propose the Contrastive Knowledge-aware GNN (CKGNN) for self-supervised molecular representation learning, which fuses domain knowledge into molecular graph representations. We explicitly encode domain knowledge via a knowledge-aware molecular encoder under the contrastive learning framework, ensuring that the generated molecular embeddings are equipped with chemical domain knowledge and can distinguish molecules with similar chemical formulas but dissimilar functions. Extensive experiments on 8 public datasets demonstrate the effectiveness of our model, with a 6% absolute improvement on average over strong competitors. Ablation studies and further investigation also verify the best of both worlds: the incorporation of chemical domain knowledge into self-supervised learning.
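The general recipe described here, contrasting a GNN readout against a knowledge view derived from fingerprints, can be illustrated with a minimal sketch. This is not the authors' released CKGNN code: the `KnowledgeProjector` module, the `info_nce` loss, the 0.1 temperature, and the use of random tensors in place of real GNN readouts and ECFP bit vectors are all illustrative assumptions.

```python
# Minimal sketch: contrastive alignment of a graph embedding with a
# fingerprint-based "knowledge" view (assumed setup, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeProjector(nn.Module):
    """Projects a fingerprint (or functional-group count) vector into the
    same embedding space as the GNN readout."""

    def __init__(self, fp_dim: int, emb_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(fp_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim)
        )

    def forward(self, fp: torch.Tensor) -> torch.Tensor:
        return self.net(fp)


def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Standard InfoNCE: matching (graph, knowledge) pairs are positives,
    all other in-batch pairs serve as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature             # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    B, fp_dim, emb_dim = 32, 2048, 256
    h_graph = torch.randn(B, emb_dim)              # stand-in for a GNN readout
    fp = torch.randint(0, 2, (B, fp_dim)).float()  # stand-in for ECFP bit vectors
    proj = KnowledgeProjector(fp_dim, emb_dim)
    loss = info_nce(h_graph, proj(fp))
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")
```

In such a setup, the knowledge view acts as the positive pair for each molecule, so two molecules with similar formulas but different fingerprints are pushed apart in the embedding space.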
Recently, heterogeneous Graph Neural Networks (GNNs) have become a de facto model for analyzing HGs, while most of them rely on a relatively large amount of labeled data. In this work, we investigate Contrastive Learning (CL), a key component in self-supervised learning…
Molecular graph generation is a fundamental but challenging task in various applications such as drug discovery and material science, which requires generating valid molecules with desired properties. Auto-regressive models, which usually construct graphs…
Molecular machine learning bears promise for efficient molecule property prediction and drug discovery. However, due to the limited labeled data and the giant chemical space, machine learning models trained via supervised learning perform poorly in generalization…
Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. In practice, however, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning…
Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and a limited number of subjects (<100). To handle these challenges, we propose a self-supervised approach based on contrastive learning…
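For intuition only, a contrastive setup on biosignals typically builds two stochastic views of the same window and pulls their encodings together. The sketch below is an assumed, generic augmentation pipeline; the jitter and random-crop transforms, their parameters, and the window sizes are illustrative placeholders, not the paper's recipe.

```python
# Minimal sketch: two augmented views of a 1-D biosignal window for
# contrastive pre-training (assumed augmentations, not the paper's).
import torch


def jitter(x: torch.Tensor, sigma: float = 0.05) -> torch.Tensor:
    """Add small Gaussian noise to every sample of the signal."""
    return x + sigma * torch.randn_like(x)


def random_crop(x: torch.Tensor, crop_len: int) -> torch.Tensor:
    """Take a random contiguous crop along the time axis (last dim)."""
    start = torch.randint(0, x.size(-1) - crop_len + 1, (1,)).item()
    return x[..., start:start + crop_len]


def two_views(x: torch.Tensor, crop_len: int = 2000):
    """Return two stochastic views of the same window; a contrastive loss
    (e.g. NT-Xent) then treats them as a positive pair."""
    return jitter(random_crop(x, crop_len)), jitter(random_crop(x, crop_len))


if __name__ == "__main__":
    ecg = torch.randn(8, 1, 2500)      # batch of 10 s ECG windows at 250 Hz
    v1, v2 = two_views(ecg)
    print(v1.shape, v2.shape)          # torch.Size([8, 1, 2000]) twice
```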