Explainable Fuzzy Utility Mining on Sequences

Added by Wensheng Gan

Publication date 2021

Fields Informatics Engineering

Language English

Authors Wensheng Gan - Zilin Du - Weiping Ding

Databases

Fuzzy systems model data well in several data science scenarios and yield intelligence models that humans can explain and interpret. Transaction data have been studied extensively, yet sequence data are more common in real-life applications. To obtain a human-explainable data intelligence model for decision making, we investigate explainable fuzzy-theoretic utility mining on multi-sequences, and we give a more rigorous formulation of the problem of fuzzy utility mining on sequences. Drawing on fuzzy set theory, we propose a novel method termed pattern growth fuzzy utility mining (PGFUM) for mining fuzzy high-utility sequences with linguistic meaning. For sequence data, PGFUM reflects the fuzzy quantity and utility regions of sequences. To improve the efficiency and feasibility of PGFUM, we develop two compressed data structures with explainable fuzziness. Furthermore, three proposed pruning strategies adopt one existing and two new upper bounds on the explainable fuzzy utility of candidates to substantially reduce the search space and thus expedite the mining process. Finally, extensive experiments compare PGFUM with PFUS, the only currently available method for the same task. The results show that PGFUM produces human-explainable mining results that remain intelligible, and that it is efficient in both runtime and memory cost.
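The idea of a fuzzy quantity with linguistic meaning can be illustrated with a small sketch. This is not the PGFUM algorithm; it only shows how a purchase quantity might be mapped to linguistic terms and weighted into a per-term utility. The triangular membership regions, the term names Low/Middle/High, and the rule "fuzzy utility = membership degree × quantity × unit utility" are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Hypothetical triangular regions (left foot, peak, right foot) for quantities
# in the range 1..11. These values are illustrative assumptions, not taken
# from the paper.
REGIONS = {
    "Low": (0.0, 1.0, 6.0),
    "Middle": (1.0, 6.0, 11.0),
    "High": (6.0, 11.0, 16.0),
}


def triangular(x, a, b, c):
    """Degree of membership of x in a triangle with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(quantity):
    """Map a crisp quantity to {linguistic term: degree}, dropping zero degrees."""
    degrees = {}
    for term, (a, b, c) in REGIONS.items():
        mu = triangular(quantity, a, b, c)
        if mu > 0.0:
            degrees[term] = mu
    return degrees


def fuzzy_utility(quantity, unit_utility):
    """Per-term utility, assuming utility = degree * quantity * unit utility."""
    return {
        term: mu * quantity * unit_utility
        for term, mu in fuzzify(quantity).items()
    }


if __name__ == "__main__":
    # A quantity of 3 is partly "Low" and partly "Middle".
    print(fuzzify(3))
    print(fuzzy_utility(3, unit_utility=5.0))
```

Under these assumed regions a single item occurrence contributes to more than one linguistic term, which is what lets a mined pattern be read as, for example, "a Low quantity of item a followed by a Middle quantity of item b".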

On-shelf Utility Mining of Sequence Data

Chunkai Zhang, Zilin Du, Yuting Yang, 2020

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods are biased toward items with longer on-shelf time, because such items have a greater chance of generating a high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this paper, we focus on the task of OSUM of sequence data, where the sequential database is divided into several partitions according to time periods, and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation using two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Extensive experimental results on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.

Databases

FRI-Miner: Fuzzy Rare Itemset Mining

Yanling Cui, Wensheng Gan, Hong Lin, 2021

Data mining is a widely used technology in real-life data analytics applications, and it is important for discovering valuable association rules in transaction databases. Interesting itemset mining plays an important role in many real-life applications, such as market, e-commerce, finance, and medical treatment. To date, various data mining algorithms based on frequent patterns have been widely studied, but only a few algorithms focus on mining infrequent or rare patterns. In some cases, infrequent or rare itemsets and rare association rules also play an important role in real-life applications. In this paper, we introduce a novel fuzzy-based rare itemset mining algorithm called FRI-Miner, which discovers valuable and interesting fuzzy rare itemsets in a quantitative database by applying fuzzy theory with linguistic meaning. Additionally, FRI-Miner uses the fuzzy-list structure to store important information and applies several pruning strategies to reduce the search space. The experimental results show that the proposed FRI-Miner algorithm discovers fewer and more interesting itemsets by taking the actual quantitative values into account. Moreover, it significantly outperforms state-of-the-art algorithms in terms of effectiveness (w.r.t. different types of derived patterns) and efficiency (w.r.t. running time and memory usage).

Databases

TKUS: Mining Top-K High-Utility Sequential Patterns

Chunkai Zhang, Zilin Du, Wensheng Gan, 2020

High-utility sequential pattern mining (HUSPM) has recently emerged as a focus of intense research interest. The main task of HUSPM is to find all subsequences, within a quantitative sequential database, that have high utility with respect to a user-defined minimum utility threshold. However, it is difficult to specify the minimum utility threshold, especially when the features of the database, which are invisible in most cases, are not understood. To handle this problem, top-k HUSPM was proposed. Up to now, only very preliminary work has been conducted to capture top-k HUSPs, and existing strategies require improvement in terms of running time, memory consumption, unpromising candidate filtering, and scalability. Moreover, no systematic problem statement has been defined. In this paper, we formulate the problem of top-k HUSPM and propose a novel algorithm called TKUS. To improve efficiency, TKUS adopts a projection and local search mechanism and employs several schemes, including the Sequence Utility Raising, Terminate Descendants Early, and Eliminate Unpromising Items strategies, which allow it to greatly reduce the search space. Finally, experimental results demonstrate that TKUS achieves good top-k HUSPM performance compared with the state-of-the-art algorithm TKHUS-Span.
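The way a top-k miner can dispense with a user-supplied threshold can be sketched with a min-heap that raises an internal minimum utility as better patterns arrive. This is a generic illustration of threshold raising, not the TKUS algorithm or any of its named strategies; the class `TopKBuffer` and its methods are hypothetical names, and the patterns and utilities in the usage lines are made up.

```python
import heapq


class TopKBuffer:
    """Keep the k highest-utility patterns seen so far.

    The internal threshold min_util starts at 0 and rises to the k-th best
    utility once k patterns have been collected.
    """

    def __init__(self, k):
        self.k = k
        self.heap = []      # min-heap of (utility, pattern)
        self.min_util = 0

    def offer(self, pattern, utility):
        """Consider a candidate pattern with its exact utility."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (utility, pattern))
        elif utility > self.heap[0][0]:
            # Replace the current weakest pattern with the better candidate.
            heapq.heapreplace(self.heap, (utility, pattern))
        if len(self.heap) == self.k:
            self.min_util = self.heap[0][0]

    def can_prune(self, upper_bound):
        """A branch whose utility upper bound is below min_util cannot enter the top k."""
        return upper_bound < self.min_util

    def results(self):
        """Patterns sorted from highest to lowest utility."""
        return sorted(self.heap, reverse=True)


if __name__ == "__main__":
    buf = TopKBuffer(k=2)
    for pattern, utility in [("a", 10), ("ab", 25), ("b", 7), ("abc", 40)]:
        buf.offer(pattern, utility)
    print(buf.results(), buf.min_util)
```

The faster `min_util` rises, the more branches `can_prune` rejects, which is why top-k miners invest in strategies that raise the threshold early.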

Databases

Universally Utility-Maximizing Privacy Mechanisms

Arpita Ghosh, Tim Roughgarden, Mukund Sundararajan, 2009

A mechanism for releasing information about a statistical database with sensitive data must resolve a trade-off between utility and privacy. Privacy can be rigorously quantified using the framework of differential privacy, which requires that a mechanism's output distribution is nearly the same whether or not a given database row is included. The goal of this paper is strong and general utility guarantees, subject to differential privacy. We pursue mechanisms that guarantee near-optimal utility to every potential user, independent of its side information (modeled as a prior distribution over query results) and preferences (modeled via a loss function). Our main result is: for each fixed count query and differential privacy level, there is a geometric mechanism $M^*$ -- a discrete variant of the simple and well-studied Laplace mechanism -- that is simultaneously expected loss-minimizing for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: every potential user $u$, no matter what its side information and preferences, derives as much utility from $M^*$ as from interacting with a differentially private mechanism $M_u$ that is optimally tailored to $u$.
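A minimal sketch of a geometric mechanism for a count query follows. It assumes the common parameterization in which the noise Z is two-sided geometric with Pr[Z = z] = (1 - α)/(1 + α) · α^|z| and α = exp(-ε) for a privacy level ε > 0; the abstract does not state this parameterization, so treat it and the function names as assumptions. The noise is drawn as the difference of two independent geometric variables, which has exactly this distribution. The sketch does not reproduce the paper's optimality result or any user-side post-processing.

```python
import math
import random


def two_sided_geometric_pmf(z, epsilon):
    """Pr[Z = z] for two-sided geometric noise with alpha = exp(-epsilon)."""
    alpha = math.exp(-epsilon)
    return (1 - alpha) / (1 + alpha) * alpha ** abs(z)


def sample_geometric_failures(alpha, rng):
    """Sample G >= 0 with Pr[G >= k] = alpha**k, by inverse transform."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is defined
    return int(math.floor(math.log(u) / math.log(alpha)))


def geometric_mechanism(true_count, epsilon, rng=None):
    """Release an integer count perturbed by two-sided geometric noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    alpha = math.exp(-epsilon)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    noise = sample_geometric_failures(alpha, rng) - sample_geometric_failures(alpha, rng)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(0)
    print([geometric_mechanism(100, epsilon=1.0, rng=rng) for _ in range(5)])
```

Because neighbouring databases change a count by at most 1, and the probabilities of adjacent outputs differ by a factor of at most exp(ε), the released value satisfies ε-differential privacy under this parameterization.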

Databases Computer Science and Game Theory

An Extension of Semantic Proximity for Fuzzy Multivalued Dependencies in Fuzzy Relational Database

Arezoo Rajaei, Ahmad Baraani Dastjerdi, Nasser Ghasem Aghaee, 2011

Following the development of fuzzy logic theory by Lotfi Zadeh, its applications were investigated by researchers in different fields. Representing and working with uncertain data is a complex problem. To solve it, the structure of relationships, and the operators that depend on those relationships, must be revised. The fuzzy database has integrity constraints, including data dependencies. In this paper, fuzzy multivalued dependency based on semantic proximity and its problems are studied first. To solve these problems, the semantic proximity formula is modified, and a fuzzy multivalued dependency based on the concept of an extension of semantic proximity with alpha degree is defined in a fuzzy relational database that includes crisp, NULL, and fuzzy values. Inference rules for this dependency are also defined, and their completeness is proved. Finally, we show that fuzzy functional dependency based on this concept is a special case of fuzzy multivalued dependency in a fuzzy relational database.

Databases Information Retrieval