Discovering the set of closed frequent patterns is one of the fundamental problems in Data Mining. Recent Constraint Programming (CP) approaches for declarative itemset mining have proven their usefulness and flexibility. But the wide use of reified constraints in current CP approaches raises many difficulties to cope with high dimensional datasets. This paper proposes CLOSED PATTERN global constraint which does not require any reified constraints nor any extra variables to encode efficiently the Closed Frequent Pattern Mining (CFPM) constraint. CLOSED-PATTERN captures the particular semantics of the CFPM problem in order to ensure a polynomial pruning algorithm ensuring domain consistency. The computational properties of our constraint are analyzed and their practical effectiveness is experimentally evaluated.
The problem of discovering frequent itemsets including rare ones has received a great deal of attention. The mining process needs to be flexible enough to extract frequent and rare regularities at once. On the other hand, it has recently been shown that constraint programming is a flexible way to tackle data mining tasks. In this paper, we propose a constraint programming approach for mining itemsets with multiple minimum supports. Our approach provides the user with the possibility to express any kind of constraints on the minimum item supports. An experimental analysis shows the practical effectiveness of our approach compared to the state of the art.
Discovering significant itemsets is one of the fundamental problems in data mining. It has recently been shown that constraint programming is a flexible way to tackle data mining tasks. With a constraint programming approach, we can easily express and efficiently answer queries with users constraints on items. However, in many practical cases it is possible that queries also express users constraints on the dataset itself. For instance, asking for a particular itemset in a particular part of the dataset. This paper presents a general constraint programming model able to handle any kind of query on the items or the dataset for itemset mining.
Data mining is a widely used technology for various real-life applications of data analytics and is important to discover valuable association rules in transaction databases. Interesting itemset mining plays an important role in many real-life applications, such as market, e-commerce, finance, and medical treatment. To date, various data mining algorithms based on frequent patterns have been widely studied, but there are a few algorithms that focus on mining infrequent or rare patterns. In some cases, infrequent or rare itemsets and rare association rules also play an important role in real-life applications. In this paper, we introduce a novel fuzzy-based rare itemset mining algorithm called FRI-Miner, which discovers valuable and interesting fuzzy rare itemsets in a quantitative database by applying fuzzy theory with linguistic meaning. Additionally, FRI-Miner utilizes the fuzzy-list structure to store important information and applies several pruning strategies to reduce the search space. The experimental results show that the proposed FRI-Miner algorithm can discover fewer and more interesting itemsets by considering the quantitative value in reality. Moreover, it significantly outperforms state-of-the-art algorithms in terms of effectiveness (w.r.t. different types of derived patterns) and efficiency (w.r.t. running time and memory usage).
Recently, several large-scale RDF knowledge bases have been built and applied in many knowledge-based applications. To further increase the number of facts in RDF knowledge bases, logic rules can be used to predict new facts based on the existing ones. Therefore, how to automatically learn reliable rules from large-scale knowledge bases becomes increasingly important. In this paper, we propose a novel rule learning approach named RDF2Rules for RDF knowledge bases. RDF2Rules first mines frequent predicate cycles (FPCs), a kind of interesting frequent patterns in knowledge bases, and then generates rules from the mined FPCs. Because each FPC can produce multiple rules, and effective pruning strategy is used in the process of mining FPCs, RDF2Rules works very efficiently. Another advantage of RDF2Rules is that it uses the entity type information when generates and evaluates rules, which makes the learned rules more accurate. Experiments show that our approach outperforms the compared approach in terms of both efficiency and accuracy.
This paper introduces the combinatorial Boolean model (CBM), which is defined as the class of linear combinations of conjunctions of Boolean attributes. This paper addresses the issue of learning CBM from labeled data. CBM is of high knowledge interpretability but na{i}ve learning of it requires exponentially large computation time with respect to data dimension and sample size. To overcome this computational difficulty, we propose an algorithm GRAB (GRAfting for Boolean datasets), which efficiently learns CBM within the $L_1$-regularized loss minimization framework. The key idea of GRAB is to reduce the loss minimization problem to the weighted frequent itemset mining, in which frequent patterns are efficiently computable. We employ benchmark datasets to empirically demonstrate that GRAB is effective in terms of computational efficiency, prediction accuracy and knowledge discovery.