No Arabic abstract
The real-world networks often compose of different types of nodes and edges with rich semantics, widely known as heterogeneous information network (HIN). Heterogeneous network embedding aims to embed nodes into low-dimensional vectors which capture rich intrinsic information of heterogeneous networks. However, existing models either depend on manually designing meta-paths, ignore mutual effects between different semantics, or omit some aspects of information from global networks. To address these limitations, we propose a novel Graph-Aggregated Heterogeneous Network Embedding (GAHNE), which is designed to extract the semantics of HINs as comprehensively as possible to improve the results of downstream tasks based on graph convolutional neural networks. In GAHNE model, we develop several mechanisms that can aggregate semantic representations from different single-type sub-networks as well as fuse the global information into final embeddings. Extensive experiments on three real-world HIN datasets show that our proposed model consistently outperforms the existing state-of-the-art methods.
A large number of real-world graphs or networks are inherently heterogeneous, involving a diversity of node types and relation types. Heterogeneous graph embedding is to embed rich structural and semantic information of a heterogeneous graph into low-dimensional node representations. Existing models usually define multiple metapaths in a heterogeneous graph to capture the composite relations and guide neighbor selection. However, these models either omit node content features, discard intermediate nodes along the metapath, or only consider one metapath. To address these three limitations, we propose a new model named Metapath Aggregated Graph Neural Network (MAGNN) to boost the final performance. Specifically, MAGNN employs three major components, i.e., the node content transformation to encapsulate input node attributes, the intra-metapath aggregation to incorporate intermediate semantic nodes, and the inter-metapath aggregation to combine messages from multiple metapaths. Extensive experiments on three real-world heterogeneous graph datasets for node classification, node clustering, and link prediction show that MAGNN achieves more accurate prediction results than state-of-the-art baselines.
Recently, graph neural networks have been widely used for network embedding because of their prominent performance in pairwise relationship learning. In the real world, a more natural and common situation is the coexistence of pairwise relationships and complex non-pairwise relationships, which is, however, rarely studied. In light of this, we propose a graph neural network-based representation learning framework for heterogeneous hypergraphs, an extension of conventional graphs, which can well characterize multiple non-pairwise relations. Our framework first projects the heterogeneous hypergraph into a series of snapshots and then we take the Wavelet basis to perform localized hypergraph convolution. Since the Wavelet basis is usually much sparser than the Fourier basis, we develop an efficient polynomial approximation to the basis to replace the time-consuming Laplacian decomposition. Extensive evaluations have been conducted and the experimental results show the superiority of our method. In addition to the standard tasks of network embedding evaluation such as node classification, we also apply our method to the task of spammers detection and the superior performance of our framework shows that relationships beyond pairwise are also advantageous in the spammer detection.
Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ one single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring the real-world networks with thorough mathematical analysis, we present two structure-related measures which can consistently distinguish heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of relations, in our RHINE, we propose different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. At last, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms the state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.
Community detection, aiming to group the graph nodes into clusters with dense inner-connection, is a fundamental graph mining task. Recently, it has been studied on the heterogeneous graph, which contains multiple types of nodes and edges, posing great challenges for modeling the high-order relationship between nodes. With the surge of graph embedding mechanism, it has also been adopted to community detection. A remarkable group of works use the meta-path to capture the high-order relationship between nodes and embed them into nodes embedding to facilitate community detection. However, defining meaningful meta-paths requires much domain knowledge, which largely limits their applications, especially on schema-rich heterogeneous graphs like knowledge graphs. To alleviate this issue, in this paper, we propose to exploit the context path to capture the high-order relationship between nodes, and build a Context Path-based Graph Neural Network (CP-GNN) model. It recursively embeds the high-order relationship between nodes into the node embedding with attention mechanisms to discriminate the importance of different relationships. By maximizing the expectation of the co-occurrence of nodes connected by context paths, the model can learn the nodes embeddings that both well preserve the high-order relationship between nodes and are helpful for community detection. Extensive experimental results on four real-world datasets show that CP-GNN outperforms the state-of-the-art community detection methods.
Sampling a network is an important prerequisite for unsupervised network embedding. Further, random walk has widely been used for sampling in previous studies. Since random walk based sampling tends to traverse adjacent neighbors, it may not be suitable for heterogeneous network because in heterogeneous networks two adjacent nodes often belong to different types. Therefore, this paper proposes a K-hop random walk based sampling approach which includes a node in the sample list only if it is separated by K hops from the source node. We exploit the samples generated using K-hop random walker for network embedding using skip-gram model (word2vec). Thereafter, the performance of network embedding is evaluated on co-authorship prediction task in heterogeneous DBLP network. We compare the efficacy of network embedding exploiting proposed sampling approach with recently proposed best performing network embedding models namely, Metapath2vec and Node2vec. It is evident that the proposed sampling approach yields better quality of embeddings and out-performs baselines in majority of the cases.