Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering



Published by Ye Zhu PhD

Publication date: 2019

Research field: Informatics Engineering; Mathematical Statistics

Paper language: English

Authors: Xiaoyu Qin, Kai Ming Ting, Ye Zhu

Machine Learning


Abstract

A recent proposal of a data-dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity and propose a nearest-neighbour method instead. We formally prove the characteristic of Isolation Similarity when the proposed method is used. We also study the impact of Isolation Similarity on density-based clustering. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure in DBSCAN with the proposed nearest-neighbour-induced Isolation Similarity, leaving the rest of the procedure unchanged. A new type of cluster, called a mass-connected cluster, is formally defined. We show that when the distance measure is replaced with the proposed similarity, DBSCAN, which detects density-connected clusters, becomes an algorithm that detects mass-connected clusters. We also provide the condition under which mass-connected clusters can be detected while density-connected clusters cannot.
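The abstract describes the method only at a high level. The sketch below is one possible reading of it, under stated assumptions: each of `t` random subsamples of `psi` data points induces a Voronoi partitioning (every point belongs to the cell of its nearest sampled point), the similarity of two points is the fraction of partitionings in which they share a cell, and DBSCAN runs unchanged except that its distance is replaced by the dissimilarity `1 - similarity`. The function names, parameters, and values are hypothetical; this is not the authors' code.

```python
import random
from math import dist


def voronoi_codes(data, psi, t, seed=0):
    """For each point, record which Voronoi cell it falls in under t random
    partitionings; each partitioning is induced by psi points sampled from data."""
    rng = random.Random(seed)
    codes = [[] for _ in data]
    for _ in range(t):
        sample = rng.sample(data, psi)
        for k, x in enumerate(data):
            # Index of the sampled point nearest to x identifies x's cell.
            nearest = min(range(psi), key=lambda i: dist(x, sample[i]))
            codes[k].append(nearest)
    return codes


def isolation_similarity(codes, p, q):
    """Fraction of the t partitionings in which points p and q share a cell."""
    same = sum(a == b for a, b in zip(codes[p], codes[q]))
    return same / len(codes[p])


def dbscan(n, neighbours, min_pts):
    """Plain DBSCAN over point indices 0..n-1; neighbours(p) must include p.
    Returns one label per point, with -1 marking noise."""
    labels = [None] * n
    cluster = 0
    for p in range(n):
        if labels[p] is not None:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = -1  # provisional noise; may later become a border point
            continue
        labels[p] = cluster
        queue = list(seeds)
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # border point: joins but is not expanded
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_seeds = neighbours(q)
            if len(q_seeds) >= min_pts:  # q is a core point, keep expanding
                queue.extend(q_seeds)
        cluster += 1
    return labels


def isolation_dbscan(data, eps, min_pts, psi, t, seed=0):
    """DBSCAN with the distance replaced by the dissimilarity 1 - similarity."""
    codes = voronoi_codes(data, psi, t, seed)
    n = len(data)

    def neighbours(p):
        return [q for q in range(n)
                if 1 - isolation_similarity(codes, p, q) <= eps]

    return dbscan(n, neighbours, min_pts)


blob_a = [(float(i), 0.0) for i in range(5)]
blob_b = [(100.0 + i, 0.0) for i in range(5)]
print(isolation_dbscan(blob_a + blob_b, eps=0.8, min_pts=2, psi=4, t=400))
```

On these two well-separated groups, each group should end up in its own cluster, although results can vary with `seed`, `psi`, and `t` because the partitionings are random. The similarity is data dependent: sampled points land more often in dense regions, so cells there are smaller, and two points at a given distance are less likely to share a cell than two points at the same distance in a sparse region.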


Related papers

Zhangyang Gao, Haitao Lin, Stan Z. Li (2020)

Data clustering with uneven distribution in high-level noise is challenging. Currently, HDBSCAN is considered the state-of-the-art (SOTA) algorithm for this problem. In this paper, we propose a novel clustering algorithm based on what we call the graph of density topology (GDT). GDT jointly considers the local and global structures of data samples: it first forms local clusters through a density-growing process, with a strategy for proper noise handling and cluster boundary detection, and then estimates a GDT from the relationships between local clusters in terms of a connectivity measure, giving a global topological graph. The connectivity, which measures the similarity between neighbouring local clusters, is based on local clusters rather than individual points, which makes it robust even to very large noise. Evaluation results on both toy and real-world datasets show that GDT achieves SOTA performance by far on almost all the popular datasets and has a low time complexity of $O(n \log n)$. The code is available at https://github.com/gaozhangyang/DGC.git.

Machine Learning

Evaluation of Similarity-based Explanations

Kazuaki Hanawa, Sho Yokoi, Satoshi Hara (2020)

Explaining the predictions made by complex machine learning models helps users understand and accept the predicted outputs with confidence. One promising approach is similarity-based explanation, which provides similar instances as evidence to support model predictions. Several relevance metrics are used for this purpose. In this study, we investigated which relevance metrics can provide reasonable explanations to users. Specifically, we adopted three tests to evaluate whether the relevance metrics satisfy the minimal requirements for similarity-based explanation. Our experiments revealed that the cosine similarity of the gradients of the loss performs best, making it a recommended choice in practice. In addition, we showed that some metrics perform poorly in our tests and analyzed the reasons for their failure. We expect our insights to help practitioners select appropriate relevance metrics and to aid further research on designing better relevance metrics for explanations.

Machine Learning
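To illustrate the best-performing metric, the sketch below ranks training instances by the cosine similarity between the loss gradient of a test instance and the loss gradient of each training instance. It assumes a toy binary logistic-regression model with weights `w`, labels in {-1, +1}, and the model's own predicted label for the test instance; the model, helper names, and label convention are assumptions for illustration, not the setup used in the study.

```python
from math import exp, sqrt


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def loss_gradient(w, x, y):
    """Gradient w.r.t. w of the logistic loss log(1 + exp(-y * w.x)), y in {-1, +1}."""
    coef = -y * sigmoid(-y * dot(w, x))
    return [coef * xi for xi in x]


def cosine(u, v):
    norm = sqrt(dot(u, u)) * sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def rank_by_gradient_cosine(w, train, x_test):
    """Return training indices ordered from most to least relevant."""
    y_pred = 1 if dot(w, x_test) >= 0 else -1  # explain the model's own prediction
    g_test = loss_gradient(w, x_test, y_pred)
    scores = [cosine(g_test, loss_gradient(w, x, y)) for x, y in train]
    return sorted(range(len(train)), key=lambda i: scores[i], reverse=True)


w = [1.0, 0.0]
train = [([1.0, 0.1], 1), ([-1.0, 0.2], -1), ([0.9, -0.1], -1)]
print(rank_by_gradient_cosine(w, train, [2.0, 0.0]))  # [0, 1, 2]
```

For a linear model like this one, each gradient is a scaled copy of its input, so the metric reduces to plus or minus the cosine between inputs; for deeper models the gradients also reflect the learned parameters, which is where the metric differs from plain input similarity.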

Next nearest neighbour Ising models on random graphs

Jack Raymond, K. Y. Michael Wong (2012)

This paper develops results for the next nearest neighbour Ising model on random graphs. Besides being an essential ingredient in classic models for frustrated systems, second neighbour interactions arise naturally in several applications, such as the colour diversity problem and graphical games. We demonstrate ensembles of random graphs, including regular connectivity graphs, that have a periodic variation of free energy with either the ratio of nearest to next nearest couplings or the mean number of nearest neighbours. When the coupling ratio is an integer, paramagnetic phases can be found at zero temperature. This is shown to be related to the locked or unlocked nature of the interactions. For anti-ferromagnetic couplings, spin glass phases are demonstrated at low temperature. The interaction structure is formulated as a factor graph, and the solution on a tree is developed. The replica symmetric and energetic one-step replica symmetry breaking solutions are developed using the cavity method. Within these frameworks, we calculate the phase diagram and demonstrate the existence of dynamical transitions at zero temperature for cases of anti-ferromagnetic coupling on regular and inhomogeneous random graphs.

Statistical Mechanics; Disordered Systems and Neural Networks
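To make the model concrete, the sketch below evaluates one common form of the energy, $H(s) = -J_1 \sum_{\langle ij \rangle} s_i s_j - J_2 \sum_{\langle\langle ij \rangle\rangle} s_i s_j$, on a small graph given as an adjacency dictionary. It assumes that next nearest neighbours are pairs at graph distance exactly 2 and that negative couplings are anti-ferromagnetic; the paper's exact conventions may differ, and the helper names are hypothetical.

```python
from itertools import combinations


def nearest_pairs(adj):
    """Edges (i, j) with i < j."""
    return {(i, j) for i in adj for j in adj[i] if i < j}


def next_nearest_pairs(adj):
    """Pairs (i, j), i < j, that share a neighbour but are not adjacent."""
    pairs = set()
    for nbrs in adj.values():
        for i, j in combinations(sorted(nbrs), 2):
            if j not in adj[i]:
                pairs.add((i, j))
    return pairs


def energy(adj, spins, j1, j2):
    """H = -J1 * sum_nn s_i s_j - J2 * sum_nnn s_i s_j, with spins in {+1, -1}."""
    e_nn = sum(spins[i] * spins[j] for i, j in nearest_pairs(adj))
    e_nnn = sum(spins[i] * spins[j] for i, j in next_nearest_pairs(adj))
    return -j1 * e_nn - j2 * e_nnn


# Path graph 0-1-2-3: nearest pairs (0,1), (1,2), (2,3); next nearest (0,2), (1,3).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(energy(path, [1, 1, -1, -1], j1=1.0, j2=-1.0))  # -3.0
```

With a ferromagnetic $J_1$ and an anti-ferromagnetic $J_2$, no spin configuration can satisfy every bond at once on this path (the all-up state scores $-1$, the configuration above $-3$), which is the frustration the abstract refers to.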

Revisit Lmser and its further development based on convolutional layers

Wenjing Huang, Shikui Tu, Lei Xu (2019)

Proposed in 1991, Least Mean Square Error Reconstruction for self-organizing network (Lmser for short) was a further development of the traditional auto-encoder (AE): it folds the architecture with respect to the central coding layer, which leads to symmetric weights and neurons as well as joint supervised and unsupervised learning. However, its advantages were demonstrated only in a one-hidden-layer implementation, owing to the lack of computing resources and big data at that time. In this paper, we revisit Lmser from the perspective of deep learning, develop an Lmser network based on multiple convolutional layers, which is more suitable for image-related tasks, and confirm several Lmser functions with preliminary demonstrations on image recognition, reconstruction, associative recall, and so on. Experiments demonstrate that Lmser indeed works as indicated in the original paper and has promising performance in various applications.

Machine Learning
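The folding that gives Lmser its symmetric weights can be sketched, for a single hidden layer, as an auto-encoder whose decoder reuses the transpose of the encoder matrix. This is a minimal sketch assuming sigmoid units and no biases; it shows only the weight symmetry, not the neuron symmetry, the training rule, or the convolutional version described in the paper, and the function names are hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def folded_autoencoder(W, x):
    """One-hidden-layer auto-encoder with symmetric (tied) weights.

    W has shape (hidden, input). The encoder applies W and the decoder
    applies W transposed, so both directions share a single weight matrix."""
    h = [sigmoid(z) for z in matvec(W, x)]
    x_hat = [sigmoid(z) for z in matvec(transpose(W), h)]
    return h, x_hat


W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
h, x_hat = folded_autoencoder(W, [0.0, 0.0, 0.0])
print(h)  # [0.5, 0.5]
```

Because the decoder has no weights of its own, any update to `W` changes encoding and reconstruction together, which is the sense in which the folded architecture is symmetric.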

Faster and More Robust Mesh-based Algorithms for Obstacle k-Nearest Neighbour

Shizhe Zhao, Daniel D. Harabor, David Taniar (2018)

We are interested in the problem of finding $k$ nearest neighbours in the plane in the presence of polygonal obstacles ($\textit{OkNN}$). Widely used algorithms for OkNN are based on incremental visibility graphs, which means they require costly online visibility checking and have worst-case quadratic running time. Recently, $\mathbf{Polyanya}$, a fast point-to-point pathfinding algorithm, was proposed; it avoids the disadvantages of visibility graphs by searching over an alternative data structure known as a navigation mesh. Previously, we adapted $\mathbf{Polyanya}$ to multi-target scenarios by developing two specialised heuristic functions: the $\mathbf{Interval\ heuristic}$ $h_v$ and the $\mathbf{Target\ heuristic}$ $h_t$. Though these methods outperform visibility graph algorithms by orders of magnitude in all our experiments, they are not robust: $h_v$ expands many redundant nodes when the set of neighbours is small, while $h_t$ performs poorly when the set of neighbours is large. In this paper, we propose new algorithms and heuristics for OkNN that perform well regardless of neighbour density.

Artificial Intelligence