No Arabic abstract
Graphs are naturally used to describe the structures of various real-world systems in biology, society, computer science etc., where subgraphs or motifs as basic blocks play an important role in function expression and information processing. However, existing research focuses on the basic statistics of certain motifs, largely ignoring the connection patterns among them. Recently, a subgraph network (SGN) model is proposed to study the potential structure among motifs, and it was found that the integration of SGN can enhance a series of graph classification methods. However, SGN model lacks diversity and is of quite high time complexity, making it difficult to widely apply in practice. In this paper, we introduce sampling strategies into SGN, and design a novel sampling subgraph network model, which is scale-controllable and of higher diversity. We also present a hierarchical feature fusion framework to integrate the structural features of diverse sampling SGNs, so as to improve the performance of graph classification. Extensive experiments demonstrate that, by comparing with the SGN model, our new model indeed has much lower time complexity (reduced by two orders of magnitude) and can better enhance a series of graph classification methods (doubling the performance enhancement).
Real-world networks exhibit prominent hierarchical and modular structures, with various subgraphs as building blocks. Most existing studies simply consider distinct subgraphs as motifs and use only their numbers to characterize the underlying network. Although such statistics can be used to describe a network model, or even to design some network algorithms, the role of subgraphs in such applications can be further explored so as to improve the results. In this paper, the concept of subgraph network (SGN) is introduced and then applied to network models, with algorithms designed for constructing the 1st-order and 2nd-order SGNs, which can be easily extended to build higher-order ones. Furthermore, these SGNs are used to expand the structural feature space of the underlying network, beneficial for network classification. Numerical experiments demonstrate that the network classification model based on the structural features of the original network together with the 1st-order and 2nd-order SGNs always performs the best as compared to the models based only on one or two of such networks. In other words, the structural features of SGNs can complement that of the original network for better network classification, regardless of the feature extraction method used, such as the handcrafted, network embedding and kernel-based methods.
A main challenge in mining network-based data is finding effective ways to represent or encode graph structures so that it can be efficiently exploited by machine learning algorithms. Several methods have focused in network representation at node/edge or substructure level. However, many real life challenges such as time-varying, multilayer, chemical compounds and brain networks involve analysis of a family of graphs instead of single one opening additional challenges in graph comparison and representation. Traditional approaches for learning representations relies on hand-crafting specialized heuristics to extract meaningful information about the graphs, e.g statistical properties, structural features, etc. as well as engineered graph distances to quantify dissimilarity between networks. In this work we provide an unsupervised approach to learn embedding representation for a collection of graphs so that it can be used in numerous graph mining tasks. By using an unsupervised neural network approach on input graphs, we aim to capture the underlying distribution of the data in order to discriminate between different class of networks. Our method is assessed empirically on synthetic and real life datasets and evaluated in three different tasks: graph clustering, visualization and classification. Results reveal that our method outperforms well known graph distances and graph-kernels in clustering and classification tasks, being highly efficient in runtime.
This paper proposes a novel model for predicting subgraphs in dynamic graphs, an extension of traditional link prediction. This proposed end-to-end model learns a mapping from the subgraph structures in the current snapshot to the subgraph structures in the next snapshot directly, i.e., edge existence among multiple nodes in the subgraph. A new mechanism named cross-attention with a twin-tower module is designed to integrate node attribute information and topology information collaboratively for learning subgraph evolution. We compare our model with several state-of-the-art methods for subgraph prediction and subgraph pattern prediction in multiple real-world homogeneous and heterogeneous dynamic graphs, respectively. Experimental results demonstrate that our model outperforms other models in these two tasks, with a gain increase from 5.02% to 10.88%.
Sampling is a widely used graph reduction technique to accelerate graph computations and simplify graph visualizations. By comprehensively analyzing the literature on graph sampling, we assume that existing algorithms cannot effectively preserve minority structures that are rare and small in a graph but are very important in graph analysis. In this work, we initially conduct a pilot user study to investigate representative minority structures that are most appealing to human viewers. We then perform an experimental study to evaluate the performance of existing graph sampling algorithms regarding minority structure preservation. Results confirm our assumption and suggest key points for designing a new graph sampling approach named mino-centric graph sampling (MCGS). In this approach, a triangle-based algorithm and a cut-point-based algorithm are proposed to efficiently identify minority structures. A set of importance assessment criteria are designed to guide the preservation of important minority structures. Three optimization objectives are introduced into a greedy strategy to balance the preservation between minority and majority structures and suppress the generation of new minority structures. A series of experiments and case studies are conducted to evaluate the effectiveness of the proposed MCGS.
To the best of our knowledge, this paper presents the first large-scale study that tests whether network categories (e.g., social networks vs. web graphs) are distinguishable from one another (using both categories of real-world networks and synthetic graphs). A classification accuracy of $94.2%$ was achieved using a random forest classifier with both real and synthetic networks. This work makes two important findings. First, real-world networks from various domains have distinct structural properties that allow us to predict with high accuracy the category of an arbitrary network. Second, classifying synthetic networks is trivial as our models can easily distinguish between synthetic graphs and the real-world networks they are supposed to model.