On When and How to use SAT to Mine Frequent Itemsets


Published by: Rui Henriques

Publication date: 2012

Research field: Informatics Engineering

Paper language: English

Authors: Rui Henriques, Inês Lynce, Vasco Manquinho

Artificial Intelligence, Databases, Machine Learning


Abstract

Over the last decade, a new stream of research has emerged with the goal of mining itemsets of interest using Constraint Programming (CP). This approach offers a natural and highly flexible way to combine complex constraints. Although state-of-the-art CP solutions formulate the task using Boolean variables, the few attempts to adopt propositional Satisfiability (SAT) have performed unsatisfactorily. This work deepens the study of when and how to use SAT for the frequent itemset mining (FIM) problem by defining different encodings with multiple task-driven enumeration options and search strategies. Although SAT-based solutions appear non-competitive with their CP counterparts in the majority of scenarios, the results show a variety of interesting cases where SAT encodings are the best option.
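To make the Boolean view of FIM concrete, here is a minimal sketch under stated assumptions. It is not one of the paper's encodings, which the abstract does not spell out. It uses one Boolean per item, derives one coverage Boolean per transaction, and applies a frequency constraint over the coverage Booleans. Brute-force enumeration stands in for a SAT or CP solver, and the name `frequent_itemsets` and the toy data are hypothetical.

```python
from itertools import product


def frequent_itemsets(transactions, items, min_support):
    """Return every non-empty itemset covered by at least min_support transactions.

    Assumption: exhaustive enumeration of item assignments replaces a real
    SAT/CP solver, so this only scales to a handful of items.
    """
    results = []
    for assignment in product([False, True], repeat=len(items)):
        # One Boolean per item: True means the item belongs to the itemset.
        chosen = {item for item, on in zip(items, assignment) if on}
        if not chosen:
            continue
        # Coverage constraint: a transaction is covered iff it contains
        # every chosen item.
        covered = [chosen <= t for t in transactions]
        # Frequency constraint: enough transactions must be covered.
        if sum(covered) >= min_support:
            results.append(frozenset(chosen))
    return results


# Hypothetical toy database with four transactions over three items.
toy_transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
toy_result = frequent_itemsets(toy_transactions, ["a", "b", "c"], min_support=2)
print(sorted(sorted(s) for s in toy_result))
```

With a minimum support of 2, the toy database yields the itemsets {a}, {b}, {c}, {a, b} and {a, c}. A solver-based approach would express the same coverage and frequency constraints declaratively and leave the enumeration to the solver.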


Anneke Haga, Carsten Lutz, Leif Sabellek (2021)

We introduce and study several notions of approximation for ontology-mediated queries based on the description logics ALC and ALCI. Our approximations are of two kinds: we may (1) replace the ontology with one formulated in a tractable ontology language such as ELI or certain TGDs and (2) replace the database with one from a tractable class such as the class of databases whose treewidth is bounded by a constant. We determine the computational complexity and the relative completeness of the resulting approximations. (Almost) all of them reduce the data complexity from coNP-complete to PTime, in some cases even to fixed-parameter tractable and to linear time. While approximations of kind (1) also reduce the combined complexity, this tends to not be the case for approximations of kind (2). In some cases, the combined complexity even increases.

Artificial Intelligence, Databases

Estimation from Quantized Gaussian Measurements: When and How to Use Dither

Joshua Rapp, Robin M. A. Dawson, Vivek K Goyal (2018)

Subtractive dither is a powerful method for removing the signal dependence of quantization noise for coarsely-quantized signals. However, estimation from dithered measurements often naively applies the sample mean or midrange, even when the total noise is not well described with a Gaussian or uniform distribution. We show that the generalized Gaussian distribution approximately describes subtractively-dithered, quantized samples of a Gaussian signal. Furthermore, a generalized Gaussian fit leads to simple estimators based on order statistics that match the performance of more complicated maximum likelihood estimators requiring iterative solvers. The order statistics-based estimators outperform both the sample mean and midrange for nontrivial sums of Gaussian and uniform noise. Additional analysis of the generalized Gaussian approximation yields rules of thumb for determining when and how to apply dither to quantized measurements. Specifically, we find subtractive dither to be beneficial when the ratio between the Gaussian standard deviation and quantization interval length is roughly less than 1/3. If that ratio is also greater than 0.822/$K^{0.930}$ for the number of measurements $K>20$, we present estimators more efficient than the midrange.
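The two rules of thumb quoted in this abstract can be written as a small check. This is only a sketch of the stated thresholds, not the authors' estimators. The function name `dither_rule_of_thumb` and its parameter names are hypothetical, and because the abstract calls the 1/3 bound "roughly" valid, the hard cutoff below is an approximation.

```python
def dither_rule_of_thumb(sigma, delta, num_measurements):
    """Apply the abstract's two thresholds to sigma / delta.

    sigma: Gaussian standard deviation (assumed > 0)
    delta: quantization interval length (assumed > 0)
    num_measurements: number of measurements K

    Returns (use_dither, beats_midrange). beats_midrange is None when
    K <= 20, because the abstract states the second rule only for K > 20.
    """
    ratio = sigma / delta
    # Rule 1: subtractive dither helps when the ratio is roughly below 1/3.
    use_dither = ratio < 1.0 / 3.0
    beats_midrange = None
    if num_measurements > 20:
        # Rule 2: the ratio must also exceed 0.822 / K^0.930.
        lower_bound = 0.822 / num_measurements ** 0.930
        beats_midrange = use_dither and ratio > lower_bound
    return use_dither, beats_midrange


# Hypothetical setting: sigma is one tenth of the quantization step, K = 100.
print(dither_rule_of_thumb(sigma=0.1, delta=1.0, num_measurements=100))
```

For K = 100 the lower bound is about 0.011, so a ratio of 0.1 satisfies both conditions, while a ratio of 0.5 fails the first one.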

Statistics Applications, Signal Processing

Justicia: A Stochastic SAT Approach to Formally Verify Fairness

Bishwamittra Ghosh, Debabrota Basu, Kuldeep S. Meel (2020)

As a technology, ML is oblivious to societal good or bad, and thus, the field of fair machine learning has stepped up to propose multiple mathematical definitions, algorithms, and systems to ensure different notions of fairness in ML applications. Given the multitude of propositions, it has become imperative to formally verify the fairness metrics satisfied by different algorithms on different datasets. In this paper, we propose a stochastic satisfiability (SSAT) framework, Justicia, that formally verifies different fairness measures of supervised learning algorithms with respect to the underlying data distribution. We instantiate Justicia on multiple classification and bias mitigation algorithms, and datasets to verify different fairness metrics, such as disparate impact, statistical parity, and equalized odds. Justicia is scalable, accurate, and operates on non-Boolean and compound sensitive attributes unlike existing distribution-based verifiers, such as FairSquare and VeriFair. Being distribution-based by design, Justicia is more robust than the verifiers, such as AIF360, that operate on specific test samples. We also theoretically bound the finite-sample error of the verified fairness measure.

Artificial Intelligence, Computers and Society, Machine Learning

Frequent itemsets mining for database auto-administration

Kamel Aouiche, Le Gruenwald (2008)

With the wide development of databases in general and data warehouses in particular, it is important to reduce the tasks that a database administrator must perform manually. The aim of auto-administrative systems is to administrate and adapt themselves automatically without loss (or even with a gain) in performance. The idea of using data mining techniques to extract useful knowledge for administration from the data themselves has existed for some years. However, little research has been carried out. This idea nevertheless remains a very promising approach, notably in the field of data warehousing, where queries are very heterogeneous and cannot be interpreted easily. The aim of this study is to search for a way of extracting useful knowledge from the stored data themselves to automatically apply performance optimization techniques, and more particularly indexing techniques. We have designed a tool that extracts frequent itemsets from a given workload to compute an index configuration that helps optimize data access time. The experiments we performed showed that the index configurations generated by our tool allowed performance gains of 15% to 25% on a test database and a test data warehouse.

Databases

Frequent Itemset Mining with Multiple Minimum Supports: a Constraint-based Approach

Mohamed-Bachir Belaid, Nadjib Lazaar (2021)

The problem of discovering frequent itemsets including rare ones has received a great deal of attention. The mining process needs to be flexible enough to extract frequent and rare regularities at once. On the other hand, it has recently been shown that constraint programming is a flexible way to tackle data mining tasks. In this paper, we propose a constraint programming approach for mining itemsets with multiple minimum supports. Our approach provides the user with the possibility to express any kind of constraints on the minimum item supports. An experimental analysis shows the practical effectiveness of our approach compared to the state of the art.

Artificial Intelligence, Databases