Frequent Item-set Mining (FIM), sometimes called Market Basket Analysis (MBA) or Association Rule Learning (ARL), is a family of Machine Learning (ML) methods for creating rules from datasets of transactions of items. Most methods identify items likely to appear together in a transaction based on the support for that hypothesis (i.e. a minimum relative number of co-occurrences of the items). Although support is a good indicator of the relevance of the assumption that these items are likely to appear together, most algorithms do not address the phenomenon of very frequent items, referred to as ubiquitous items. Ubiquitous items have the same entropy as infrequent items and do not contribute significantly to the knowledge. On the other hand, they have a strong effect on the performance of the algorithms, sometimes preventing the convergence of FIM algorithms and thus the provision of meaningful results. This paper discusses the phenomenon of ubiquitous items and demonstrates how ignoring them has a dramatic effect on computational performance but a low and controlled effect on the significance of the results.
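To make the notions of support and ubiquitous items concrete, the following is a minimal sketch, not the paper's method. It computes the relative support of each item, shows that the binary entropy of a very frequent item matches that of an equally rare one, and drops items above a support ceiling before mining. The function names, the `max_support` threshold, and the toy transactions are all assumptions introduced for illustration.

```python
from collections import Counter
from math import log2


def item_supports(transactions):
    """Relative support: fraction of transactions that contain each item."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}


def binary_entropy(p):
    """Entropy (in bits) of the presence/absence of an item with support p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def drop_ubiquitous(transactions, max_support=0.9):
    """Remove items whose support exceeds max_support (hypothetical cutoff)."""
    supports = item_supports(transactions)
    ubiquitous = {i for i, s in supports.items() if s > max_support}
    filtered = [[i for i in t if i not in ubiquitous] for t in transactions]
    return filtered, ubiquitous


# Toy data (made up): "bag" appears in every basket.
transactions = [
    ["bag", "milk", "bread"],
    ["bag", "milk"],
    ["bag", "bread", "eggs"],
    ["bag", "milk", "bread"],
]

supports = item_supports(transactions)
filtered, ubiquitous = drop_ubiquitous(transactions, max_support=0.9)
```

Here `binary_entropy(0.9)` and `binary_entropy(0.1)` agree up to floating-point error, which is the symmetry behind the claim that ubiquitous items carry as little information as infrequent ones. The filtered transactions can then be passed to any FIM algorithm in place of the originals.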
Irreducible frequent patterns (IFPs) are introduced for transactional databases. An IFP is a frequent pattern (FP), (x1,x2,...xn), whose probability, P(x1,x2,...xn), cannot be represented as a product of the probabilities of two (or more) o
In this paper, we strengthen the competitive analysis results obtained for a fundamental online streaming problem, the Frequent Items Problem. Additionally, we contribute with a more detailed analysis of this problem, using alternative performance me
The problem of discovering frequent itemsets including rare ones has received a great deal of attention. The mining process needs to be flexible enough to extract frequent and rare regularities at once. On the other hand, it has recently been shown t
Recently, several large-scale RDF knowledge bases have been built and applied in many knowledge-based applications. To further increase the number of facts in RDF knowledge bases, logic rules can be used to predict new facts based on the existing one
The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large-scale datasets. While several researchers have designed distributed memory FP-Growth algorithms, it is pivot