ترغب بنشر مسار تعليمي؟ اضغط هنا

Network Sampling Using K-hop Random Walks for Heterogeneous Network Embedding

160   0   0.0 ( 0 )
 نشر من قبل Akash Anil
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Sampling a network is an important prerequisite for unsupervised network embedding. Further, random walk has widely been used for sampling in previous studies. Since random walk based sampling tends to traverse adjacent neighbors, it may not be suitable for heterogeneous network because in heterogeneous networks two adjacent nodes often belong to different types. Therefore, this paper proposes a K-hop random walk based sampling approach which includes a node in the sample list only if it is separated by K hops from the source node. We exploit the samples generated using K-hop random walker for network embedding using skip-gram model (word2vec). Thereafter, the performance of network embedding is evaluated on co-authorship prediction task in heterogeneous DBLP network. We compare the efficacy of network embedding exploiting proposed sampling approach with recently proposed best performing network embedding models namely, Metapath2vec and Node2vec. It is evident that the proposed sampling approach yields better quality of embeddings and out-performs baselines in majority of the cases.

قيم البحث

اقرأ أيضاً

In recent years, network embedding methods have garnered increasing attention because of their effectiveness in various information retrieval tasks. The goal is to learn low-dimensional representations of vertexes in an information network and simult aneously capture and preserve the network structure. Critical to the performance of a network embedding method is how the edges/vertexes of the network is sampled for the learning process. Many existing methods adopt a uniform sampling method to reduce learning complexity, but when the network is non-uniform (i.e. a weighted network) such uniform sampling incurs information loss. The goal of this paper is to present a generalized vertex sampling framework that works seamlessly with most existing network embedding methods to support weighted instead of uniform vertex/edge sampling. For efficiency, we propose a delicate sequential vertex-to-context graph data structure, such that sampling a training pair for learning takes only constant time. For scalability and memory efficiency, we design the graph data structure in a way that keeps space consumption low without requiring additional space. In addition to implementing existing network embedding methods, the proposed framework can be used to implement extensions that feature high-order proximity modeling and weighted relation modeling. Experiments conducted on three datasets, including a commercial large-scale one, verify the effectiveness and efficiency of the proposed weighted network embedding methods on a variety of tasks, including word similarity search, multi-label classification, and item recommendation.
126 - Xiaohe Li , Lijie Wen , Chen Qian 2020
The real-world networks often compose of different types of nodes and edges with rich semantics, widely known as heterogeneous information network (HIN). Heterogeneous network embedding aims to embed nodes into low-dimensional vectors which capture r ich intrinsic information of heterogeneous networks. However, existing models either depend on manually designing meta-paths, ignore mutual effects between different semantics, or omit some aspects of information from global networks. To address these limitations, we propose a novel Graph-Aggregated Heterogeneous Network Embedding (GAHNE), which is designed to extract the semantics of HINs as comprehensively as possible to improve the results of downstream tasks based on graph convolutional neural networks. In GAHNE model, we develop several mechanisms that can aggregate semantic representations from different single-type sub-networks as well as fuse the global information into final embeddings. Extensive experiments on three real-world HIN datasets show that our proposed model consistently outperforms the existing state-of-the-art methods.
In recent time, applications of network embedding in mining real-world information network have been widely reported in the literature. Majority of the information networks are heterogeneous in nature. Meta-path is one of the popularly used approache s for generating embedding in heterogeneous networks. As meta-path guides the models towards a specific sub-structure, it tends to lose some hetero- geneous characteristics inherently present in the underlying network. In this paper, we systematically study the effects of different meta-paths using different state-of-art network embedding methods (Metapath2vec, Node2vec, and VERSE) over DBLP bibliographic network and evaluate the performance of embeddings using two applications (co-authorship prediction and authors research area classification tasks). From various experimental observations, it is evident that embedding using different meta-paths perform differently over different tasks. It shows that meta- paths are task-dependent and can not be generalized for different tasks. We further observe that embedding obtained after considering all the node and relation types in bibliographic network outperforms its meta- path based counterparts.
Network meta-analysis (NMA) is a central tool for evidence synthesis in clinical research. The results of an NMA depend critically on the quality of evidence being pooled. In assessing the validity of an NMA, it is therefore important to know the pro portion contributions of each direct treatment comparison to each network treatment effect. The construction of proportion contributions is based on the observation that each row of the hat matrix represents a so-called evidence flow network for each treatment comparison. However, the existing algorithm used to calculate these values is associated with ambiguity according to the selection of paths. In this work we present a novel analogy between NMA and random walks. We use this analogy to derive closed-form expressions for the proportion contributions. A random walk on a graph is a stochastic process that describes a succession of random hops between vertices which are connected by an edge. The weight of an edge relates to the probability that the walker moves along that edge. We use the graph representation of NMA to construct the transition matrix for a random walk on the network of evidence. We show that the net number of times a walker crosses each edge of the network is related to the evidence flow network. By then defining a random walk on the directed evidence flow network, we derive analytically the matrix of proportion contributions. The random-walk approach, in addition to being computationally more efficient, has none of the associated ambiguity of the existing algorithm.
101 - Yuanfu Lu , Chuan Shi , Linmei Hu 2019
Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ one single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring the real-world networks with thorough mathematical analysis, we present two structure-related measures which can consistently distinguish heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of relations, in our RHINE, we propose different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. At last, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms the state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا