ﻻ يوجد ملخص باللغة العربية
Discriminative patterns are association patterns that occur with disproportionate frequency in some classes versus others, and have been studied under names such as emerging patterns and contrast sets. Such patterns have demonstrated considerable value for classification and subgroup discovery, but a detailed understanding of the types of interactions among items in a discriminative pattern is lacking. To address this issue, we propose to categorize discriminative patterns according to four types of item interaction: (i) driver-passenger, (ii) coherent, (iii) independent additive and (iv) synergistic beyond independent additive. Either of the last three is of practical importance, with the latter two representing a gain in the discriminative power of a pattern over its subsets. Synergistic patterns are most restrictive, but perhaps the most interesting since they capture a cooperative effect. For domains such as genetic research, differentiating among these types of patterns is critical since each yields very different biological interpretations. For general domains, the characterization provides a novel view of the nature of the discriminative patterns in a dataset, which yields insights beyond those provided by current approaches that focus mostly on pattern-based classification and subgroup discovery. This paper presents a comprehensive discussion that defines these four pattern types and investigates their properties and their relationship to one another. In addition, these ideas are explored for a variety of datasets (ten UCI datasets, one gene expression dataset and two genetic-variation datasets). The results demonstrate the existence, characteristics and statistical significance of the different types of patterns. They also illustrate how pattern characterization can provide novel insights into discriminative pattern mining and the discriminative structure of different datasets.
We study the hardness of Approximate Query Processing (AQP) of various types of queries involving joins over multiple tables of possibly different sizes. In the case where the query result is a single value (e.g., COUNT, SUM, and COUNT(DISTINCT)), we
The Fisher-Rao metric from Information Geometry is related to phase transition phenomena in classical statistical mechanics. Several studies propose to extend the use of Information Geometry to study more general phase transitions in complex systems.
High-utility sequential pattern mining (HUSPM) has recently emerged as a focus of intense research interest. The main task of HUSPM is to find all subsequences, within a quantitative sequential database, that have high utility with respect to a user-
As large graph processing emerges, we observe a costly fork-processing pattern (FPP) that is common in many graph algorithms. The unique feature of the FPP is that it launches many independent queries from different source vertices on the same graph.
It is widely known that there is a lot of useful information hidden in big data, leading to a new saying that data is money. Thus, it is prevalent for individuals to mine crucial information for utilization in many real-world applications. In the pas