Irreducible Frequent Patterns in Transactional Databases

Publication date: 2005
Language: English

Irreducible frequent patterns (IFPs) are introduced for transactional databases. An IFP is a frequent pattern (FP) (x1, x2, ..., xn) whose probability P(x1, x2, ..., xn) cannot be represented as a product of the probabilities of two (or more) other FPs of smaller length. We have developed an algorithm for searching for IFPs in transactional databases. We argue that IFPs are useful tools for characterizing transactional databases and may have important applications to bio-systems, including immune systems, and to improving vaccination strategies. The effectiveness of the IFP approach is illustrated by applying it to a classification problem.
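As a rough illustration of the definition above, the following is a minimal Python sketch of how one might test a frequent pattern for irreducibility by estimating probabilities as relative supports and comparing them against products over complementary sub-patterns. This is not the paper's search algorithm: the function names, the tolerance, the restriction to binary splits, and the fact that sub-patterns are not themselves checked for being frequent are all simplifying assumptions made for illustration.

```python
from itertools import combinations

def support(pattern, transactions):
    """Relative support: fraction of transactions containing every item of the pattern."""
    pattern = frozenset(pattern)
    return sum(pattern <= t for t in transactions) / len(transactions)

def is_irreducible(pattern, transactions, tolerance=1e-3):
    """Return True if P(pattern) does NOT (approximately) factor into the product of the
    probabilities of two complementary, shorter sub-patterns.
    Illustrative check only: it considers binary splits and does not verify that the
    sub-patterns are themselves frequent."""
    items = list(pattern)
    p_full = support(items, transactions)
    for r in range(1, len(items)):          # try every split into two non-empty parts
        for left in combinations(items, r):
            right = [x for x in items if x not in left]
            if abs(p_full - support(left, transactions) * support(right, transactions)) <= tolerance:
                return False                # reducible: the probability factorizes
    return True

# Tiny usage example with hypothetical transactions over items a, b, c.
transactions = [frozenset(t) for t in
                [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]]
print(is_irreducible({"a", "b"}, transactions))   # True for this toy data
```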

Frequent Item-set Mining (FIM), sometimes called Market Basket Analysis (MBA) or Association Rule Learning (ARL), refers to Machine Learning (ML) methods for creating rules from datasets of transactions of items. Most methods identify items likely to appear together in a transaction based on the support (i.e., a minimum relative co-occurrence count of the items) for that hypothesis. Although this is a good indicator of the relevance of the assumption that these items are likely to appear together, the phenomenon of very frequent items, referred to as ubiquitous items, is not addressed in most algorithms. Ubiquitous items have the same entropy as infrequent items and do not contribute significantly to the extracted knowledge. On the other hand, they have a strong effect on the performance of the algorithms, sometimes preventing the convergence of FIM algorithms and thus the provision of meaningful results. This paper discusses the phenomenon of ubiquitous items and demonstrates that ignoring them has a dramatic effect on computational performance but only a low and controlled effect on the significance of the results.
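A minimal sketch of the pre-processing idea described above, assuming a simple relative-support cut-off for "ubiquity" (the threshold value and function name are illustrative, not the paper's implementation):

```python
from collections import Counter

def prune_ubiquitous(transactions, ubiquity_threshold=0.95):
    """Drop items that appear in more than `ubiquity_threshold` of all transactions.
    Illustrative pre-processing step; the threshold value is an assumption."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    ubiquitous = {item for item, c in counts.items() if c / n > ubiquity_threshold}
    pruned = [set(t) - ubiquitous for t in transactions]
    return pruned, ubiquitous

# Hypothetical baskets: "bag" occurs in every transaction and carries little information.
baskets = [{"milk", "bread", "bag"}, {"milk", "bag"}, {"bread", "eggs", "bag"}, {"eggs", "bag"}]
pruned, removed = prune_ubiquitous(baskets, ubiquity_threshold=0.9)
print(removed)   # {'bag'}
```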
An exponential growth in data volume, combined with increasing demand for real-time analysis (i.e., using the most recent data), has resulted in the emergence of database systems that concurrently support transactions and data analytics. These hybrid transactional and analytical processing (HTAP) database systems can support real-time data analysis without the high costs of synchronizing across separate single-purpose databases. Unfortunately, for many applications that perform a high rate of data updates, state-of-the-art HTAP systems incur significant drops in transactional (up to 74.6%) and/or analytical (up to 49.8%) throughput compared to performing only transactions or only analytics in isolation, due to (1) data movement between the CPU and memory, (2) data update propagation, and (3) consistency costs. We propose Polynesia, a hardware-software co-designed system for in-memory HTAP databases. Polynesia (1) divides the HTAP system into transactional and analytical processing islands, (2) implements custom algorithms and hardware to reduce the costs of update propagation and consistency, and (3) exploits processing-in-memory for the analytical islands to alleviate data movement. Our evaluation shows that Polynesia outperforms three state-of-the-art HTAP systems, with average transactional/analytical throughput improvements of 1.70X/3.74X, and reduces energy consumption by 48% over the prior lowest-energy system.
Databases can leak confidential information when users combine query results with probabilistic data dependencies and prior knowledge. Current research offers mechanisms that either handle a limited class of dependencies or lack tractable enforcement algorithms. We propose a foundation for Database Inference Control based on ProbLog, a probabilistic logic programming language. We leverage this foundation to develop Angerona, a provably secure enforcement mechanism that prevents information leakage in the presence of probabilistic dependencies. We then provide a tractable inference algorithm for a practically relevant fragment of ProbLog. We empirically evaluate Angerona's performance, showing that it scales to relevant security-critical problems.
Many IoT systems are data intensive and are used to monitor critical systems for fault detection and diagnosis. A large volume of data steadily comes out of a large number of sensors in the monitoring system, so we need to consider how to store and manage these data. Existing time series databases (TSDBs) can be used for monitoring data storage, but they do not have good models for describing the data streams stored in the database. In this paper, we develop a semantic model for the specification of monitoring data streams (time series data) in terms of which sensor generated the data stream, which metric of which entity the sensor is monitoring, how the entity relates to other entities in the system, which measurement unit is used for the data stream, and so on. We have also developed a tool suite, SE-TSDB, that can run on top of existing TSDBs to help establish semantic specifications for data streams and enable semantic-based data retrieval. With our semantic model for monitoring data and our SE-TSDB tool suite, users can retrieve data streams that do not explicitly exist but can be automatically derived from the semantics, and can retrieve data streams without knowing where they are stored. Semantic-based retrieval is especially important in a large-scale integrated IoT-Edge-Cloud system because of the sheer quantity of data, the huge number of computing and IoT devices that may store the data, and the dynamics of data migration and evolution. With better data semantics, data streams can be more effectively tracked and flexibly retrieved to support timely data analysis and control decision making anywhere and anytime.
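To make the idea of a semantic stream specification concrete, here is a minimal sketch of what such a descriptor could look like in code. The field names are assumptions chosen to mirror the aspects listed above (sensor, entity, metric, unit, relations); they are not the SE-TSDB schema.

```python
from dataclasses import dataclass, field

@dataclass
class StreamSemantics:
    """Illustrative semantic descriptor for one monitoring data stream (time series)."""
    stream_id: str             # identifier of the stored time series
    sensor_id: str             # which sensor generated the stream
    entity: str                # which entity is being monitored (e.g., "pump-7")
    metric: str                # which metric of that entity is measured (e.g., "vibration")
    unit: str                  # measurement unit of the stream (e.g., "mm/s")
    related_entities: dict = field(default_factory=dict)   # relations to other entities

# Describe a stream so it can later be retrieved by entity/metric instead of by storage location.
desc = StreamSemantics(
    stream_id="ts-0421",
    sensor_id="vib-sensor-12",
    entity="pump-7",
    metric="vibration",
    unit="mm/s",
    related_entities={"part_of": "cooling-loop-A"},
)
print(desc.entity, desc.metric, desc.unit)
```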
A new stream of research was born in the last decade with the goal of mining itemsets of interest using Constraint Programming (CP). This has promoted a natural way to combine complex constraints in a highly flexible manner. Although CP state-of-the-art solutions formulate the task using Boolean variables, the few attempts to adopt propositional Satisfiability (SAT) have provided unsatisfactory performance. This work deepens the study of when and how to use SAT for the frequent itemset mining (FIM) problem by defining different encodings with multiple task-driven enumeration options and search strategies. Although for the majority of the scenarios SAT-based solutions appear to be non-competitive with their CP peers, results show a variety of interesting cases where SAT encodings are the best option.
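As a very small illustration of how FIM can be phrased as a satisfiability problem, the sketch below builds a naive CNF encoding: one Boolean variable per item (is it in the pattern?), one per transaction (does the transaction cover the pattern?), coverage clauses linking them, and a naive at-least-k clause set for the minimum support. This is an assumed toy encoding for illustration, not one of the paper's encodings or search strategies.

```python
from itertools import combinations

def encode_fim_as_cnf(transactions, items, min_support):
    """Naive CNF encoding of frequent itemset mining (illustration only).
    Variables 1..|items|    : item i is selected into the pattern.
    Variables |items|+1 ... : transaction t covers the selected pattern."""
    item_var = {it: i + 1 for i, it in enumerate(items)}
    cover_var = {t: len(items) + t + 1 for t in range(len(transactions))}
    clauses = []
    for t, trans in enumerate(transactions):
        absent = [item_var[it] for it in items if it not in trans]
        for v in absent:
            clauses.append([-cover_var[t], -v])     # cover_t -> missing items are unselected
        clauses.append([cover_var[t]] + absent)     # all missing items unselected -> cover_t
    # Naive at-least-k support constraint: any n-k+1 coverage variables contain a true one.
    cover_vars = list(cover_var.values())
    n, k = len(cover_vars), min_support
    for subset in combinations(cover_vars, n - k + 1):
        clauses.append(list(subset))
    return clauses

# Tiny usage example: three transactions over items a and b, minimum support 2.
clauses = encode_fim_as_cnf([{"a", "b"}, {"a"}, {"b"}], ["a", "b"], min_support=2)
print(len(clauses), "clauses")   # any off-the-shelf SAT solver could now enumerate models
```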