A novel initialisation based on hospital-resident assignment for the k-modes algorithm

74 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Henry Wilde

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Henry Wilde - Vincent Knight - Jonathan Gillard

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper presents a new way of selecting an initial solution for the k-modes algorithm that allows for a notion of mathematical fairness and a leverage of the data that the common initialisations from literature do not. The method, which utilises the Hospital-Resident Assignment Problem to find the set of initial cluster centroids, is compared with the current initialisations on both benchmark datasets and a body of newly generated artificial datasets. Based on this analysis, the proposed method is shown to outperform the other initialisations in the majority of cases, especially when the number of clusters is optimised. In addition, we find that our method outperforms the leading established method specifically for low-density data.

قيم البحث

492 - Qiang Li , Yan He , Jing-ping Jiang 2009

Enormous successes have been made by quantum algorithms during the last decade. In this paper, we combine the quantum game with the problem of data clustering, and then develop a quantum-game-based clustering algorithm, in which data points in a data set are considered as players who can make decisions and implement quantum strategies in quantum games. After each round of a quantum game, each players expected payoff is calculated. Later, he uses a link-removing-and-rewiring (LRR) function to change his neighbors and adjust the strength of links connecting to them in order to maximize his payoff. Further, algorithms are discussed and analyzed in two cases of strategies, two payoff matrixes and two LRR functions. Consequently, the simulation results have demonstrated that data points in datasets are clustered reasonably and efficiently, and the clustering algorithms have fast rates of convergence. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.

التعلم الآلي الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

A Novel Confidence-Based Algorithm for Structured Bandits

82 - Andrea Tirinzoni , Alessandro Lazaric , Marcello Restelli 2020

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem and rapidly discard all sub-optimal arms. In particular, unlike standard bandit algorithms with no structure, we show that the number of times a suboptimal arm is selected may actually be reduced thanks to the information collected by pulling other arms. Furthermore, we show that, in some structures, the regret of an anytime extension of our algorithm is uniformly bounded over time. For these constant-regret structures, we also derive a matching lower bound. Finally, we demonstrate numerically that our approach better exploits certain structures than existing methods.

التعلم الآلي التعلم الالي

A Novel Clustering Algorithm Based on Quantum Random Walk

375 - Qiang Li , Yan He , Jing-ping Jiang 2008

The enormous successes have been made by quantum algorithms during the last decade. In this paper, we combine the quantum random walk (QRW) with the problem of data clustering, and develop two clustering algorithms based on the one dimensional QRW. T hen, the probability distributions on the positions induced by QRW in these algorithms are investigated, which also indicates the possibility of obtaining better results. Consequently, the experimental results have demonstrated that data points in datasets are clustered reasonably and efficiently, and the clustering algorithms are of fast rates of convergence. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.

التعلم الآلي

A Novel Clustering Algorithm Based on a Modified Model of Random Walk

369 - Qiang Li , Yan He , Jing-ping Jiang 2008

We introduce a modified model of random walk, and then develop two novel clustering algorithms based on it. In the algorithms, each data point in a dataset is considered as a particle which can move at random in space according to the preset rules in the modified model. Further, this data point may be also viewed as a local control subsystem, in which the controller adjusts its transition probability vector in terms of the feedbacks of all data points, and then its transition direction is identified by an event-generating function. Finally, the positions of all data points are updated. As they move in space, data points collect gradually and some separating parts emerge among them automatically. As a consequence, data points that belong to the same class are located at a same position, whereas those that belong to different classes are away from one another. Moreover, the experimental results have demonstrated that data points in the test datasets are clustered reasonably and efficiently, and the comparison with other algorithms also provides an indication of the effectiveness of the proposed algorithms.

التعلم الآلي الذكاء الاصطناعي أنظمة متعددة العملاء

Recombinator-k-means: A population based algorithm that exploits k-means++ for recombination

89 - Carlo Baldassi 2019

We present a simple heuristic algorithm for efficiently optimizing the notoriously hard minimum sum-of-squares clustering problem, usually addressed by the classical k-means heuristic and its variants. The algorithm, called recombinator-k-means, is v ery similar to a genetic algorithmic scheme: it uses populations of configurations, that are optimized independently in parallel and then recombined in a next-iteration population batch by exploiting a variant of the k-means++ seeding algorithm. An additional reweighting mechanism ensures that the population eventually coalesces into a single solution. Extensive tests measuring optimization objective vs computational time on synthetic and real-word data show that it is the only choice, among state-of-the-art alternatives (simple restarts, random swap, genetic algorithm with pairwise-nearest-neighbor crossover), that consistently produces good results at all time scales, outperforming competitors on large and complicated datasets. The only parameter that requires tuning is the population size. The scheme is rather general (it could be applied even to k-medians or k-medoids, for example). Our implementation is publicly available at https://github.com/carlobaldassi/RecombinatorKMeans.jl.

التعلم الآلي التعلم الالي