Distributed Representation of Subgraphs


Abstract in English

Network embeddings have become very popular in learning effective feature representations of networks. Motivated by the recent successes of embeddings in natural language processing, researchers have tried to find network embeddings in order to exploit machine learning algorithms for mining tasks like node classification and edge prediction. However, most of the work focuses on finding distributed representations of nodes, which are inherently ill-suited to tasks such as community detection which are intuitively dependent on subgraphs. Here, we propose sub2vec, an unsupervised scalable algorithm to learn feature representations of arbitrary subgraphs. We provide means to characterize similarties between subgraphs and provide theoretical analysis of sub2vec and demonstrate that it preserves the so-called local proximity. We also highlight the usability of sub2vec by leveraging it for network mining tasks, like community detection. We show that sub2vec gets significant gains over state-of-the-art methods and node-embedding methods. In particular, sub2vec offers an approach to generate a richer vocabulary of features of subgraphs to support representation and reasoning.

Download