The detection of triadic subgraph motifs is a common methodology in complex-networks research. The usual procedure for detecting motifs evaluates whether a certain subgraph pattern is overrepresented in a network as a whole. However, motifs do not necessarily appear frequently in every region of a graph. For this reason, we recently introduced the framework of Node-Specific Pattern Mining (NoSPaM). This work is a manual for an implementation of NoSPaM, which can be downloaded from www.mwinkler.eu.
Mining graphs for their local substructure is a well-established methodology for analyzing networks. It has been hypothesized that motifs, subgraph patterns that appear significantly more often than expected at random, play a key role in the ability of a system to perform its task. Yet the framework commonly used for motif detection averages over the local environments of all nodes. Therefore, it remains unclear whether motifs are overrepresented in the whole system or only in certain regions. In this contribution, we overcome this limitation by mining node-specific triad patterns: for every vertex, the abundance of each triad pattern is counted only over the triads that vertex participates in. We investigate systems from various fields and find that motifs are distributed highly heterogeneously. In particular, we focus on the feed-forward loop motif, which has been claimed to play a key role in biological networks.
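To illustrate what "node-specific" means, here is a minimal Python sketch, assuming a directed graph given as a set of (source, target) pairs. For each vertex it counts only the feed-forward loops (induced triads with edges x -> y, y -> z, x -> z and no other edges) that the vertex belongs to. The function names are hypothetical, triples are enumerated by brute force, and the comparison against randomized networks that motif significance requires is omitted; this is not the NoSPaM implementation.

```python
from itertools import combinations, permutations
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_ffl(edges: Set[Edge], triple: Tuple[Hashable, Hashable, Hashable]) -> bool:
    """True if the subgraph induced by `triple` is exactly a feed-forward loop."""
    # Collect the directed edges present among the three vertices (induced subgraph).
    induced = {(u, v) for u, v in permutations(triple, 2) if (u, v) in edges}
    # FFL: x -> y, y -> z and x -> z for some ordering (x, y, z), with no other edges.
    return any(induced == {(x, y), (y, z), (x, z)} for x, y, z in permutations(triple))


def node_ffl_counts(nodes: Iterable[Hashable], edges: Set[Edge]) -> Dict[Hashable, int]:
    """For every vertex, count the FFL triads it participates in (brute force, O(n^3))."""
    nodes = list(nodes)
    counts = {n: 0 for n in nodes}
    for triple in combinations(nodes, 3):
        if is_ffl(edges, triple):
            for n in triple:
                counts[n] += 1
    return counts


# Toy graph: two FFLs sharing vertex 3, plus a 3-cycle (3 -> 4 -> 5 -> 3) that is not an FFL.
edges = {(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 3), (3, 6), (6, 7), (3, 7)}
counts = node_ffl_counts(range(1, 8), edges)
print(counts)  # {1: 1, 2: 1, 3: 2, 4: 0, 5: 0, 6: 1, 7: 1}
```

Even in this toy graph the counts are uneven: vertex 3 sits in two feed-forward loops while vertices 4 and 5 sit in none, which a single whole-network count would average away.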
Networks are used as highly expressive tools in many disciplines. In recent years, the analysis and mining of temporal networks have attracted substantial attention. Frequent pattern mining is considered an essential task in the network science literature. In addition to its numerous applications, frequent pattern mining in networks directly impacts other analytical approaches, such as clustering, quasi-clique and clique mining, and link prediction. Nearly all algorithms proposed for frequent pattern mining in temporal networks represent the network as a sequence of static networks and then mine inter- or intra-network patterns. This representation imposes a trade-off between computational cost and expressiveness on the mining problem. In this paper, we propose a novel representation that preserves the temporal aspects of the network losslessly. We then introduce the concept of constrained interval graphs (CIGs). Next, we develop a series of algorithms for mining the complete set of frequent temporal patterns in a temporal network data set. We also consider four different definitions of isomorphism to allow for noise tolerance in temporal data collection. Experiments on three real-world data sets demonstrate the practicality of the proposed algorithms and their capability to discover unknown patterns in various settings.
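The abstract does not define constrained interval graphs, so the following Python sketch only illustrates the underlying contrast, under stated assumptions. Each interaction is kept as an interval-stamped edge (a hypothetical `TemporalEdge` with exact start and end times) and a plain interval graph links interactions whose time intervals overlap. This is the textbook notion of an interval graph, with none of the paper's constraints modeled. For comparison, `snapshot_cooccurrence` (also hypothetical) bins time into fixed windows, as a snapshot sequence would.

```python
from itertools import combinations
from typing import List, NamedTuple, Set, Tuple


class TemporalEdge(NamedTuple):
    """Hypothetical interval-stamped edge: u and v interact over the closed interval [start, end]."""
    u: str
    v: str
    start: int
    end: int


def overlaps(a: TemporalEdge, b: TemporalEdge) -> bool:
    """Closed intervals [a.start, a.end] and [b.start, b.end] intersect."""
    return a.start <= b.end and b.start <= a.end


def interval_graph(events: List[TemporalEdge]) -> Set[Tuple[int, int]]:
    """Interval graph: one vertex per event, adjacent when their time intervals overlap."""
    return {(i, j) for i, j in combinations(range(len(events)), 2)
            if overlaps(events[i], events[j])}


def snapshot_cooccurrence(events: List[TemporalEdge], width: int) -> Set[Tuple[int, int]]:
    """Snapshot view: events co-occur if active in a common window [k*width, (k+1)*width)."""
    def windows(e: TemporalEdge) -> Set[int]:
        return set(range(e.start // width, e.end // width + 1))
    return {(i, j) for i, j in combinations(range(len(events)), 2)
            if windows(events[i]) & windows(events[j])}


events = [TemporalEdge("a", "b", 0, 2), TemporalEdge("b", "c", 3, 5), TemporalEdge("a", "c", 4, 9)]
print(interval_graph(events))             # {(1, 2)}
print(snapshot_cooccurrence(events, 10))  # all three pairs: ordering inside the window is lost
```

With coarse windows, the interactions a-b and b-c look simultaneous even though one ends before the other starts. Finer windows recover the ordering at the cost of more snapshots, which is one concrete face of the computation-expressiveness trade-off the abstract describes.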
In this paper, we describe a novel framework and algorithms for discovering image patch patterns from a large corpus of weakly supervised image-caption pairs generated from news events. Current pattern mining techniques attempt to find patterns that are representative and discriminative; we stipulate that our discovered patterns must also be recognizable by humans and preferably have meaningful names. We propose a new multimodal pattern mining approach that leverages the descriptive captions often accompanying news images to learn semantically meaningful image patch patterns. The multimodal patterns are then named using words mined from the captions associated with each pattern. A novel evaluation framework shows that our patterns are 26.2% more semantically meaningful than those discovered by the state-of-the-art vision-only pipeline, and that we can tag the discovered image patches with 54.5% accuracy without direct supervision. Our methods also discover named patterns beyond those covered by existing image datasets such as ImageNet. To the best of our knowledge, this is the first algorithm developed to automatically mine image patch patterns that have strong semantic meaning specific to high-level news events, and then evaluate these patterns based on that criterion.
Segregation is the separation of social groups in the physical or the online world. Segregation discovery consists of finding contexts of segregation. In the modern digital society, discovering segregation is challenging due to the large amount and variety of social data. We present a tool that supports segregation discovery from relational and graph data. The SCube system builds on attributed graph clustering and frequent itemset mining. It offers the analyst a multi-dimensional segregation data cube for exploratory data analysis. The demonstration first guides the audience through the relevant social science concepts. It then focuses on scenarios around case studies of gender occupational segregation. Two large real-world datasets about the boards of directors of Italian and Estonian companies will be explored in search of segregation contexts. We also discuss the architecture of SCube and the computational efficiency challenges it raises, together with their solutions.
In this paper, we study predictive pattern mining problems, where the goal is to construct a predictive model based on a subset of the predictive patterns in the database. Our main contribution is a novel method called safe pattern pruning (SPP) for a class of predictive pattern mining problems. The SPP method allows us to efficiently find a superset of all the predictive patterns in the database that are needed for the optimal predictive model. The advantage of the SPP method over existing boosting-type methods is that it finds this superset with a single search over the database, whereas boosting-type methods require multiple searches. The SPP method is inspired by recent developments in safe feature screening. To extend the idea of safe feature screening to predictive pattern mining, we derive a novel pruning rule, the SPP rule, that can be used when searching the tree defined among the patterns in the database. The SPP rule guarantees that if the node corresponding to a pattern is pruned, then none of the patterns corresponding to its descendant nodes is needed for the optimal predictive model. We apply the SPP method to graph mining and item-set mining problems and demonstrate its computational advantage.
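The key property of the SPP rule is that one check at a node discards its whole subtree. The Python sketch below shows that mechanism on item-set mining, under simplifying assumptions. A pattern t is represented by its binary occurrence vector x_t over the transactions, and a pattern is treated as potentially needed only if |x_t^T theta| reaches a threshold (1.0 here) for an assumed, fixed vector theta. Safe screening rules generally have to bound an optimal solution that is not known in advance, so this is not the published SPP rule. It only shows why an anti-monotone bound allows safe subtree pruning: a superset occurs in a subset of the transactions, so its score is bounded by the current node's positive and negative sums. All function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set


def occurrence(pattern: FrozenSet[str], transactions: Sequence[Set[str]]) -> List[int]:
    """Binary feature x_t: 1 for each transaction that contains every item of the pattern."""
    return [1 if pattern <= t else 0 for t in transactions]


def subtree_bound(x: List[int], theta: Sequence[float]) -> float:
    """Upper bound on |x'^T theta| for the pattern and every descendant (superset) of it.

    A superset occurs in a subset of the transactions, so x' <= x elementwise. Hence
    x'^T theta is at most the sum of the positive theta_i on the support of x, and at
    least minus the sum of the magnitudes of the negative theta_i on that support.
    """
    pos = sum(ti for xi, ti in zip(x, theta) if xi and ti > 0)
    neg = sum(-ti for xi, ti in zip(x, theta) if xi and ti < 0)
    return max(pos, neg)


def pruned_search(items, transactions, theta, threshold=1.0):
    """DFS over the set-enumeration tree of itemsets, skipping subtrees that cannot reach threshold."""
    items = sorted(items)
    kept: List[FrozenSet[str]] = []  # superset of the patterns with |x^T theta| >= threshold
    visited = 0

    def dfs(pattern: FrozenSet[str], start: int) -> None:
        nonlocal visited
        for i in range(start, len(items)):
            child = pattern | {items[i]}
            visited += 1
            if subtree_bound(occurrence(child, transactions), theta) < threshold:
                continue  # safe: no pattern in this subtree can have |x^T theta| >= threshold
            kept.append(child)
            dfs(child, i + 1)

    dfs(frozenset(), 0)
    return kept, visited


transactions = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"c"}]
theta = [0.3, 1.0, -0.2, 0.9]  # assumed fixed vector, one entry per transaction
kept, visited = pruned_search({"a", "b", "c"}, transactions, theta)
print(sorted(map(sorted, kept)), visited)  # [['b'], ['b', 'c'], ['c']] 4
```

Here the check at {a} removes {a, b}, {a, c} and {a, b, c} without visiting them, so only 4 of the 7 non-empty itemsets are examined in a single search. The kept set contains {b, c}, whose own score falls below the threshold, because the rule is conservative: it returns a superset of the needed patterns, never a subset.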