Following the development of fuzzy logic theory by Lotfi Zadeh, its applications have been investigated by researchers in many fields. Representing and working with uncertain data is a complex problem; to handle it, the structure of relations and the operators defined on them must be revised. A fuzzy database has integrity constraints, among them data dependencies. In this paper, fuzzy multivalued dependency based on semantic proximity and its shortcomings are studied first. To resolve these shortcomings, the semantic proximity formula is modified, and a fuzzy multivalued dependency based on an extension of semantic proximity with an alpha degree is defined for fuzzy relational databases containing crisp, null, and fuzzy values; inference rules for this dependency are also defined, and their completeness is proved. Finally, we show that the fuzzy functional dependency based on this concept is a special case of the fuzzy multivalued dependency in the fuzzy relational database.
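To make the alpha-degree idea concrete, the following minimal Python sketch checks the fuzzy functional dependency that the abstract treats as a special case: any two tuples whose X-values are semantically close at level alpha must also be close on Y. The interval representation of fuzzy values, the proximity measure, and names such as semantic_proximity and ffd_holds are illustrative assumptions, not the paper's definitions.

# Illustrative sketch only: fuzzy values are modeled as closed intervals on a
# normalized domain, and semantic proximity is approximated by 1 minus the
# normalized distance between interval centers.
from itertools import combinations

def semantic_proximity(a, b, domain=(0.0, 100.0)):
    """Rough proximity in [0, 1] between two fuzzy values given as intervals."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    center_dist = abs((a_lo + a_hi) / 2 - (b_lo + b_hi) / 2)
    return max(0.0, 1.0 - center_dist / (domain[1] - domain[0]))

def ffd_holds(relation, x, y, alpha):
    """Tuples that are alpha-close on attribute x must be alpha-close on y."""
    for t1, t2 in combinations(relation, 2):
        if semantic_proximity(t1[x], t2[x]) >= alpha and \
           semantic_proximity(t1[y], t2[y]) < alpha:
            return False
    return True

# Tiny example: salary is (fuzzily) determined by experience at alpha = 0.8.
r = [
    {"exp": (4.0, 6.0),   "salary": (50.0, 60.0)},
    {"exp": (5.0, 6.0),   "salary": (52.0, 58.0)},
    {"exp": (30.0, 35.0), "salary": (90.0, 95.0)},
]
print(ffd_holds(r, "exp", "salary", alpha=0.8))   # True for this toy relation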
Data mining is a widely used technology in various real-life data analytics applications and is important for discovering valuable association rules in transaction databases. Interesting itemset mining plays an important role in many real-life applications, such as marketing, e-commerce, finance, and medical treatment. To date, various data mining algorithms based on frequent patterns have been widely studied, but only a few algorithms focus on mining infrequent or rare patterns. In some cases, infrequent or rare itemsets and rare association rules also play an important role in real-life applications. In this paper, we introduce a novel fuzzy-based rare itemset mining algorithm called FRI-Miner, which discovers valuable and interesting fuzzy rare itemsets in a quantitative database by applying fuzzy theory with linguistic meaning. Additionally, FRI-Miner utilizes the fuzzy-list structure to store important information and applies several pruning strategies to reduce the search space. The experimental results show that the proposed FRI-Miner algorithm discovers fewer but more interesting itemsets by considering the quantitative values that occur in reality. Moreover, it significantly outperforms state-of-the-art algorithms in terms of effectiveness (w.r.t. different types of derived patterns) and efficiency (w.r.t. running time and memory usage).
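As a rough illustration of the setting (not of FRI-Miner's fuzzy-list algorithm itself), the Python sketch below fuzzifies quantities into linguistic terms, computes fuzzy support, and keeps itemsets whose support falls in a "rare but not noise" band; the membership functions, thresholds, and transactions are invented for the example.

# Toy fuzzy rare itemset enumeration over a quantitative database.
from itertools import combinations

def fuzzify(qty):
    """Triangular memberships for linguistic terms over quantities in [0, 10]."""
    low = max(0.0, min(1.0, (5 - qty) / 5))
    high = max(0.0, min(1.0, (qty - 5) / 5))
    mid = max(0.0, 1.0 - abs(qty - 5) / 5)
    return {"low": low, "mid": mid, "high": high}

def fuzzy_support(db, itemset):
    """Average over transactions of the minimum membership of the itemset's terms."""
    total = 0.0
    for trans in db:
        degrees = [fuzzify(trans[item])[term] if item in trans else 0.0
                   for item, term in itemset]
        total += min(degrees)
    return total / len(db)

db = [{"milk": 2, "bread": 7}, {"milk": 9}, {"milk": 1, "bread": 1}, {"bread": 8}]
items = [(i, t) for i in ("milk", "bread") for t in ("low", "mid", "high")]
min_sup, max_sup = 0.05, 0.30           # rare band: above noise, below frequent
for k in (1, 2):
    for itemset in combinations(items, k):
        s = fuzzy_support(db, itemset)
        if min_sup <= s < max_sup:
            print(itemset, round(s, 2))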
Fuzzy systems have good modeling capabilities in several data science scenarios and can provide human-explainable intelligence models with explainability and interpretability. In contrast to transaction data, which have been extensively studied, sequence data are more common in real-life applications. To obtain a human-explainable data intelligence model for decision making, in this study we investigate explainable fuzzy-theoretic utility mining on multi-sequences, and we give a more normative formulation of the problem of fuzzy utility mining on sequences. By exploring fuzzy set theory for utility mining, we propose a novel method termed pattern-growth fuzzy utility mining (PGFUM) for mining fuzzy high-utility sequences with linguistic meaning. For sequence data, PGFUM reflects the fuzzy quantity and utility regions of sequences. To improve the efficiency and feasibility of PGFUM, we develop two compressed data structures with explainable fuzziness. Furthermore, one existing and two new upper bounds on the explainable fuzzy utility of candidates are adopted in three proposed pruning strategies to substantially reduce the search space and thus expedite the mining process. Finally, the proposed PGFUM algorithm is compared with PFUS, the only currently available method for the same task, through extensive experimental evaluation. The results demonstrate that PGFUM achieves not only human-explainable mining results that retain the intelligible nature of the original data, but also high efficiency in terms of runtime and memory cost.
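Under simplified assumptions, the sketch below accumulates the fuzzy utility of a single linguistic item over a small quantitative sequence database; it is meant only to make the notion of degree-weighted utility concrete and is not the PGFUM algorithm, its upper bounds, or its compressed structures.

# Illustrative sketch: each sequence is a list of itemsets mapping item -> quantity;
# the fuzzy utility of "item is <term>" takes, per sequence, the best occurrence's
# membership degree times quantity times unit profit.
def membership(qty, term):
    """Triangular linguistic terms over quantities in [0, 10]."""
    if term == "low":
        return max(0.0, min(1.0, (5 - qty) / 5))
    if term == "high":
        return max(0.0, min(1.0, (qty - 5) / 5))
    return max(0.0, 1.0 - abs(qty - 5) / 5)    # "mid"

unit_profit = {"a": 3.0, "b": 1.0}
sequences = [
    [{"a": 2}, {"a": 6, "b": 9}],
    [{"b": 4}, {"a": 8}],
]

def fuzzy_utility(item, term):
    total = 0.0
    for seq in sequences:
        best = 0.0
        for itemset in seq:
            if item in itemset:
                qty = itemset[item]
                best = max(best, membership(qty, term) * qty * unit_profit[item])
        total += best
    return total

print(fuzzy_utility("a", "high"))   # degree-weighted utility of "a is high"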
In this paper we prove that the Neutrosophic Set (NS) is an extension of the Intuitionistic Fuzzy Set (IFS), regardless of whether the sum of the single-valued neutrosophic components is less than, greater than, or equal to 1. Even when the sum of the components is 1 (as in IFS), applying the neutrosophic aggregation operators gives a result different from that of the intuitionistic fuzzy operators, since the intuitionistic fuzzy operators ignore indeterminacy, whereas the neutrosophic aggregation operators take indeterminacy into consideration at the same level as truth-membership and falsehood-nonmembership. NS is also more flexible and effective because, besides independent components, it also handles partially independent and partially dependent components, which IFS cannot do. Since there are many types of indeterminacy in our world, we can construct different approaches to various neutrosophic concepts. Moreover, Regret Theory, Grey System Theory, and Three-Ways Decision are particular cases of Neutrosophication and of Neutrosophic Probability. We also extend, for the first time, Three-Ways Decision to n-Ways Decision, and the Spherical Fuzzy Set to the n-HyperSpherical Fuzzy Set and the n-HyperSpherical Neutrosophic Set.
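To make the point about aggregation concrete, one widely used single-valued neutrosophic addition operator can be contrasted with the corresponding intuitionistic fuzzy operator; the operators below are a standard illustrative choice and need not be the exact ones used in the paper.

\[
(T_1, I_1, F_1) \oplus (T_2, I_2, F_2) = (T_1 + T_2 - T_1 T_2,\; I_1 I_2,\; F_1 F_2),
\qquad
(\mu_1, \nu_1) \oplus (\mu_2, \nu_2) = (\mu_1 + \mu_2 - \mu_1 \mu_2,\; \nu_1 \nu_2).
\]

Even when T + I + F = 1 (the IFS-like case), the neutrosophic operator propagates an explicit indeterminacy component I, whereas the intuitionistic fuzzy operator has no such component to propagate, so the aggregated results differ.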
Fuzzy similarity join is an important database operator widely used in practice. So far the research community has focused exclusively on optimizing fuzzy-join scalability. However, practitioners today also struggle to optimize fuzzy-join quality, because they face a daunting space of parameters (e.g., distance functions, distance thresholds, tokenization options, etc.), and often have to resort to a manual trial-and-error approach to program these parameters in order to optimize fuzzy-join quality. This key challenge of automatically generating high-quality fuzzy-join programs has received surprisingly little attention thus far. In this work, we study the problem of auto-programming fuzzy joins. Leveraging a geometric interpretation of distance functions, we develop an unsupervised Auto-FuzzyJoin framework that can infer suitable fuzzy-join programs on given input tables, without requiring explicit human input such as labeled training data. Using Auto-FuzzyJoin, users only need to provide two input tables L and R, and a desired precision target τ (say 0.9). Auto-FuzzyJoin leverages the fact that one of the inputs is a reference table to automatically program fuzzy joins that meet the precision target τ in expectation, while maximizing fuzzy-join recall (defined as the number of correctly joined records). Experiments on both existing benchmarks and a new benchmark with 50 fuzzy-join tasks created from Wikipedia data suggest that the proposed Auto-FuzzyJoin significantly outperforms existing unsupervised approaches, and is surprisingly competitive even against supervised approaches (e.g., Magellan and DeepMatcher) when 50% of ground-truth labels are used as training data.
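In this setting, a fuzzy-join "program" is a concrete choice of tokenization option, distance function, and threshold. The Python sketch below spells out one such program by hand; the function names, the tiny tables, and the parameter values are illustrative and are not part of the Auto-FuzzyJoin framework or its automated parameter search.

# Illustrative hand-written fuzzy-join program: character 3-gram tokenization +
# Jaccard distance + a fixed threshold. Auto-FuzzyJoin's job is to pick such
# parameters automatically against a precision target; that search is not shown.
def tokenize(s, mode="word"):
    s = s.lower()
    if mode == "word":
        return set(s.split())
    return {s[i:i + 3] for i in range(max(1, len(s) - 2))}   # character 3-grams

def jaccard_distance(a, b):
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def fuzzy_join(L, R, mode, threshold):
    """Join each record of R to the records of L within the distance threshold."""
    return [(l, r) for r in R for l in L
            if jaccard_distance(tokenize(l, mode), tokenize(r, mode)) <= threshold]

L = ["microsoft corporation", "alphabet inc"]        # reference table
R = ["microsoft corp", "alphabet incorporated"]
print(fuzzy_join(L, R, mode="3gram", threshold=0.7))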
In semi-supervised fuzzy clustering, this paper extends the traditional pairwise constraint (i.e., must-link or cannot-link) to a fuzzy pairwise constraint. The fuzzy pairwise constraint allows a supervisor to provide the grade of similarity or dissimilarity between the implicit fuzzy vectors of a pair of samples. This constraint can represent a more complicated relationship between the pair of samples and avoids eliminating their fuzzy characteristics. We propose a fuzzy discriminant clustering model (FDC) to fuse fuzzy pairwise constraints. The nonconvex optimization problem in FDC is solved by a modified expectation-maximization algorithm, which involves solving several indefinite quadratic programming problems (IQPPs). Furthermore, a diagonal block coordinate descent (DBCD) algorithm is proposed for these IQPPs, whose stationary points are guaranteed and whose global solutions can be obtained under certain conditions. To suit different applications, FDC is extended to various metric spaces, e.g., the reproducing kernel Hilbert space. Experimental results on several benchmark datasets and a facial expression database demonstrate that FDC outperforms some state-of-the-art clustering models.
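As a schematic of the idea only (not the FDC objective, its modified EM solver, or the DBCD algorithm), the sketch below expresses a graded pairwise constraint as a penalty on the similarity of two samples' fuzzy membership vectors and adds it to a standard fuzzy c-means term; the trade-off weight and the data are invented for illustration.

# Schematic: graded pairwise constraints on fuzzy membership vectors.
import numpy as np

def fcm_term(X, U, C, m=2.0):
    """Standard fuzzy c-means cost: sum_ik u_ik^m * ||x_i - c_k||^2."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float((U ** m * d2).sum())

def constraint_penalty(U, constraints):
    """Penalize deviation of membership similarity from the graded value s_ij."""
    return sum((float(U[i] @ U[j]) - s_ij) ** 2 for i, j, s_ij in constraints)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
C = np.array([[0.0, 0.0], [5.0, 5.0]])
U = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])   # fuzzy memberships
constraints = [(0, 1, 0.9), (0, 2, 0.1)]             # graded (dis)similarity
lam = 1.0                                            # trade-off weight
print(fcm_term(X, U, C) + lam * constraint_penalty(U, constraints))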