Heterogeneous information networks (HINs) have attracted much attention due to their wide applicability in a variety of data mining tasks, especially tasks involving multi-typed objects. A potentially large number of meta-paths can be extracted from a heterogeneous network, providing abundant semantic knowledge. Although a variety of meta-paths can be defined, many of them are redundant. Reducing the number of meta-paths can improve effectiveness, since redundant meta-paths introduce interfering linkage into the task. Moreover, the reduced set of meta-paths can reflect the characteristics of the heterogeneous network. Previous efforts reduce the number of meta-paths under the guidance of supervision information. However, supervision information is expensive and may not always be available. In this paper, we propose a novel algorithm, SPMR (Semantic Preserving Meta-path Reduction), to reduce a set of pre-defined meta-paths in an unsupervised setting. The proposed method evaluates a set of meta-paths so that the reduced set maximally preserves the semantics of the original meta-paths. Experimental results show that SPMR can select a succinct subset of meta-paths that achieves comparable or even better performance than the full set.
The meta-graph is currently the most powerful tool for similarity search on heterogeneous information networks, where a meta-graph is a composition of meta-paths that captures complex structural information. However, current relevance computation based on meta-graphs considers only the complex structural information and ignores the information in the embedded meta-paths. To address this problem, we propose two MEta-GrAph-based network embedding models, called MEGA and MEGA++. The MEGA model uses normalized relevance or similarity measures between nodes that are derived simultaneously from a meta-graph and its embedded meta-paths, and then leverages a tensor decomposition method to perform node embedding. MEGA++ further uses a coupled tensor-matrix decomposition method to obtain a joint embedding for nodes, which simultaneously considers the hidden relations of all meta information of a meta-graph. Extensive experiments on two real datasets demonstrate that MEGA and MEGA++ are more effective than state-of-the-art approaches.
Real-world networks and knowledge graphs are usually heterogeneous networks. Representation learning on heterogeneous networks is not only a popular but also a pragmatic research field. The main challenge comes from the heterogeneity, that is, the diverse types of nodes and edges. In addition, for a given node in a HIN, the significance of a neighboring node depends not only on the structural distance but also on semantics. How to effectively capture both structural and semantic relations is another challenge. The current state-of-the-art methods are based on meta-paths and therefore have a serious disadvantage: their performance depends on an arbitrary choice of meta-path(s). Moreover, the selection of meta-path(s) is experience-based and time-consuming. In this work, we propose a novel meta-path-free representation learning method for heterogeneous networks, namely Heterogeneous graph Convolutional Networks (HCN). The proposed method fuses the heterogeneity and develops a $k$-strata algorithm ($k$ is an integer) to capture the $k$-hop structural and semantic information in heterogeneous networks. To the best of our knowledge, this is the first attempt to break out of the confinement of meta-paths for representation learning on heterogeneous networks. We carry out extensive experiments on three real-world heterogeneous networks. The experimental results demonstrate that the proposed method significantly outperforms the current state-of-the-art methods in a variety of analytic tasks.
Most real-world data can be modeled as heterogeneous information networks (HINs) consisting of vertices of multiple types and their relationships. Searching for similar vertices of the same type in large HINs, such as bibliographic networks and business-review networks, is a fundamental problem with broad applications. Although similarity search in HINs has been studied previously, most existing approaches neither explore the rich semantic information embedded in the network structures nor take user preferences as guidance. In this paper, we re-examine similarity search in HINs and propose a novel embedding-based framework. It models vertices as low-dimensional vectors to explore network structure-embedded similarity. To accommodate user preferences in defining similarity semantics, our proposed framework, ESim, accepts user-defined meta-paths as guidance to learn vertex vectors in a user-preferred embedding space. Moreover, an efficient and parallel sampling-based optimization algorithm has been developed to learn embeddings in large-scale HINs. Extensive experiments on real-world large-scale HINs demonstrate a significant improvement in the effectiveness of ESim over several state-of-the-art algorithms, as well as its scalability.
Networks found in the real world are numerous and varied. A common type of network is the heterogeneous network, where the nodes (and edges) can be of different types. Accordingly, there have been efforts at learning representations of these heterogeneous networks in low-dimensional space. However, most of the existing heterogeneous network embedding methods suffer from the following two drawbacks: (1) The target space is usually Euclidean, whereas many recent works have shown that complex networks may have hyperbolic latent anatomy, which is non-Euclidean. (2) These methods usually rely on meta-paths, which require domain-specific prior knowledge for meta-path selection. Additionally, different downstream tasks on the same network might require different meta-paths in order to generate task-specific embeddings. In this paper, we propose a novel self-guided random walk method that does not require meta-paths for embedding heterogeneous networks into hyperbolic space. We conduct thorough experiments for the tasks of network reconstruction and link prediction on two public datasets, showing that our model outperforms a variety of well-known baselines across all tasks.
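As a rough illustration of why a hyperbolic target space differs from a Euclidean one, the sketch below computes the distance between two points in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which hyperbolic model the method uses, so the choice of model and the function name `poincare_distance` are assumptions made only for this example.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the unit Poincaré ball.

    Assumption: the Poincaré ball model is used here purely for
    illustration; the abstract does not name a specific model.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


origin = [0.0, 0.0]
near_center = [0.5, 0.0]
near_boundary = [0.99, 0.0]

d_center = poincare_distance(origin, near_center)
d_boundary = poincare_distance(origin, near_boundary)
```

Distances grow quickly as points approach the boundary of the ball, which is the property that makes hyperbolic space a natural fit for tree-like or hierarchical structure.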
A heterogeneous information network (HIN) has objects of different types as its vertices and the relations between objects, which are also of various types, as its edges. We study the problem of classifying objects in HINs. Most existing methods perform poorly when given scarce labeled objects as training sets, and methods that improve classification accuracy under such scenarios are often computationally expensive. To address these problems, we propose ConCH, a graph neural network model. ConCH formulates the classification problem as a multi-task learning problem that combines semi-supervised learning with self-supervised learning to learn from both labeled and unlabeled data. ConCH employs meta-paths, which are sequences of object types that capture semantic relationships between objects. ConCH co-derives object embeddings and context embeddings via graph convolution. It also uses the attention mechanism to fuse such embeddings. We conduct extensive experiments to evaluate the performance of ConCH against 15 other classification methods. Our results show that ConCH is an effective and efficient method for HIN classification.
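To make the notion of a meta-path as a sequence of object types concrete, the following minimal sketch counts path instances along an Author-Paper-Author meta-path by multiplying typed adjacency matrices. It is not taken from any of the methods described here: the toy network, the helper names (`matmul`, `transpose`, `metapath_counts`), and the choice of meta-path are all assumptions for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def metapath_counts(adjacencies):
    """Multiply typed adjacency matrices along a meta-path.

    Entry (i, j) of the result is the number of path instances that
    follow the meta-path from object i to object j.
    """
    result = adjacencies[0]
    for adj in adjacencies[1:]:
        result = matmul(result, adj)
    return result


# Hypothetical toy bibliographic network: 3 authors (rows), 2 papers (columns).
author_paper = [
    [1, 0],  # author 0 wrote paper 0
    [1, 1],  # author 1 wrote papers 0 and 1
    [0, 1],  # author 2 wrote paper 1
]

# Meta-path Author-Paper-Author: authors connected through a shared paper.
apa = metapath_counts([author_paper, transpose(author_paper)])
# apa == [[1, 1, 0], [1, 2, 1], [0, 1, 1]]
```

Here `apa[0][1] == 1` says that authors 0 and 1 share one paper, while `apa[0][2] == 0` says that authors 0 and 2 have no paper in common. Different meta-paths over the same network yield different counts, which is why the choice of meta-path shapes the semantics a model sees.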