ﻻ يوجد ملخص باللغة العربية
We present a technique for clustering categorical data by generating many dissimilarity matrices and averaging over them. We begin by demonstrating our technique on low dimensional categorical data and comparing it to several other techniques that have been proposed. Then we give conditions under which our method should yield good results in general. Our method extends to high dimensional categorical data of equal lengths by ensembling over many choices of explanatory variables. In this context we compare our method with two other methods. Finally, we extend our method to high dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give examples to show that our method continues to provide good results, in particular, better in the context of genome sequences than clusterings suggested by phylogenetic trees.
The clustering of data into physically meaningful subsets often requires assumptions regarding the number, size, or shape of the subgroups. Here, we present a new method, simultaneous coherent structure coloring (sCSC), which accomplishes the task of
Online retail, eCommerce, frequently falls victim to fraud conducted by malicious customers (fraudsters) who obtain goods or services through deception. Fraud coordinated by groups of professional fraudsters that place several fraudulent orders to ma
An extension of the latent class model is presented for clustering categorical data by relaxing the classical class conditional independence assumption of variables. This model consists in grouping the variables into inter-independent and intra-depen
We present the design and implementation of a custom discrete optimization technique for building rule lists over a categorical feature space. Our algorithm produces rule lists with optimal training performance, according to the regularized empirical
We develop new algorithmic methods with provable guarantees for feature selection in regard to categorical data clustering. While feature selection is one of the most common approaches to reduce dimensionality in practice, most of the known feature s