Frequent Itemset Mining with Multiple Minimum Supports: a Constraint-based Approach

348 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Nadjib Lazaar Dr

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Mohamed-Bachir Belaid - Nadjib Lazaar

الذكاء الاصطناعي قواعد البيانات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The problem of discovering frequent itemsets including rare ones has received a great deal of attention. The mining process needs to be flexible enough to extract frequent and rare regularities at once. On the other hand, it has recently been shown that constraint programming is a flexible way to tackle data mining tasks. In this paper, we propose a constraint programming approach for mining itemsets with multiple minimum supports. Our approach provides the user with the possibility to express any kind of constraints on the minimum item supports. An experimental analysis shows the practical effectiveness of our approach compared to the state of the art.

قيم البحث

114 - Mehdi Maamar , Nadjib Lazaar , Samir Loudni 2016

Discovering the set of closed frequent patterns is one of the fundamental problems in Data Mining. Recent Constraint Programming (CP) approaches for declarative itemset mining have proven their usefulness and flexibility. But the wide use of reified constraints in current CP approaches raises many difficulties to cope with high dimensional datasets. This paper proposes CLOSED PATTERN global constraint which does not require any reified constraints nor any extra variables to encode efficiently the Closed Frequent Pattern Mining (CFPM) constraint. CLOSED-PATTERN captures the particular semantics of the CFPM problem in order to ensure a polynomial pruning algorithm ensuring domain consistency. The computational properties of our constraint are analyzed and their practical effectiveness is experimentally evaluated.

الذكاء الاصطناعي قواعد البيانات

Users Constraints in Itemset Mining

133 - Christian Bessiere , Nadjib Lazaar , Yahia Lebbah 2017

Discovering significant itemsets is one of the fundamental problems in data mining. It has recently been shown that constraint programming is a flexible way to tackle data mining tasks. With a constraint programming approach, we can easily express an d efficiently answer queries with users constraints on items. However, in many practical cases it is possible that queries also express users constraints on the dataset itself. For instance, asking for a particular itemset in a particular part of the dataset. This paper presents a general constraint programming model able to handle any kind of query on the items or the dataset for itemset mining.

الذكاء الاصطناعي قواعد البيانات

RDF2Rules: Learning Rules from RDF Knowledge Bases by Mining Frequent Predicate Cycles

118 - Zhichun Wang , Juanzi Li 2015

Recently, several large-scale RDF knowledge bases have been built and applied in many knowledge-based applications. To further increase the number of facts in RDF knowledge bases, logic rules can be used to predict new facts based on the existing one s. Therefore, how to automatically learn reliable rules from large-scale knowledge bases becomes increasingly important. In this paper, we propose a novel rule learning approach named RDF2Rules for RDF knowledge bases. RDF2Rules first mines frequent predicate cycles (FPCs), a kind of interesting frequent patterns in knowledge bases, and then generates rules from the mined FPCs. Because each FPC can produce multiple rules, and effective pruning strategy is used in the process of mining FPCs, RDF2Rules works very efficiently. Another advantage of RDF2Rules is that it uses the entity type information when generates and evaluates rules, which makes the learned rules more accurate. Experiments show that our approach outperforms the compared approach in terms of both efficiency and accuracy.

الذكاء الاصطناعي قواعد البيانات

Grafting for Combinatorial Boolean Model using Frequent Itemset Mining

160 - Taito Lee , Shin Matsushima , Kenji Yamanishi 2017

This paper introduces the combinatorial Boolean model (CBM), which is defined as the class of linear combinations of conjunctions of Boolean attributes. This paper addresses the issue of learning CBM from labeled data. CBM is of high knowledge interp retability but na{i}ve learning of it requires exponentially large computation time with respect to data dimension and sample size. To overcome this computational difficulty, we propose an algorithm GRAB (GRAfting for Boolean datasets), which efficiently learns CBM within the $L_1$-regularized loss minimization framework. The key idea of GRAB is to reduce the loss minimization problem to the weighted frequent itemset mining, in which frequent patterns are efficiently computable. We employ benchmark datasets to empirically demonstrate that GRAB is effective in terms of computational efficiency, prediction accuracy and knowledge discovery.

التعلم الالي التعلم الآلي

Frequent Item-set Mining without Ubiquitous Items

59 - Ran M. Bittmann , Philippe Nemery , Xingtian Shi 2018

Frequent Item-set Mining (FIM), sometimes called Market Basket Analysis (MBA) or Association Rule Learning (ARL), are Machine Learning (ML) methods for creating rules from datasets of transactions of items. Most methods identify items likely to appea r together in a transaction based on the support (i.e. a minimum number of relative co-occurrence of the items) for that hypothesis. Although this is a good indicator to measure the relevance of the assumption that these items are likely to appear together, the phenomenon of very frequent items, referred to as ubiquitous items, is not addressed in most algorithms. Ubiquitous items have the same entropy as infrequent items, and not contributing significantly to the knowledge. On the other hand, they have strong effect on the performance of the algorithms and sometimes preventing the convergence of the FIM algorithms and thus the provision of meaningful results. This paper discusses the phenomenon of ubiquitous items and demonstrates how ignoring these has a dramatic effect on the computation performances but with a low and controlled effect on the significance of the results.

بنى وهياكل البيانات والخوارزميات قواعد البيانات