
Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering

Published by Ye Zhu PhD
Publication date: 2019
Language of the paper: English





A recently proposed data-dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity, and propose a nearest-neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on density-based clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly improved to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of cluster called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected while density-connected clusters cannot.
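The idea of swapping the distance measure in DBSCAN for a nearest-neighbour-induced similarity can be illustrated with a minimal sketch. The abstract does not spell out the construction, so the following is an assumption: each of `t` partitionings draws `psi` points at random, every point is assigned to its nearest sampled point (a Voronoi cell), and the similarity of two points is the fraction of partitionings in which they share a cell. The function names, the parameters `psi`, `t`, `eps` and `min_pts`, and the toy data are all hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nn_isolation_similarity(points, psi=8, t=200, seed=0):
    """Estimate a nearest-neighbour-induced similarity matrix.

    Assumed construction: in each of t rounds, sample psi points and
    assign every point to its nearest sampled point. The similarity of
    x and y is the fraction of rounds in which they share a cell.
    """
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(t):
        centres = rng.sample(range(n), psi)
        # Index of the nearest sampled centre for every point.
        cell = [
            min(centres, key=lambda c: sq_dist(points[i], points[c]))
            for i in range(n)
        ]
        for i in range(n):
            for j in range(i, n):
                if cell[i] == cell[j]:
                    counts[i][j] += 1
                    counts[j][i] = counts[i][j]
    return [[c / t for c in row] for row in counts]


def dbscan_precomputed(dissim, eps, min_pts):
    """Plain DBSCAN on a precomputed dissimilarity matrix.

    Returns one label per point; -1 marks noise.
    """
    n = len(dissim)
    neighbours = [[j for j in range(n) if dissim[i][j] <= eps] for i in range(n)]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
    return labels


if __name__ == "__main__":
    # Toy data: two well-separated groups on a line (illustrative only).
    pts = [(0.1 * i, 0.0) for i in range(10)] + [(100 + 0.1 * i, 0.0) for i in range(10)]
    sim = nn_isolation_similarity(pts)
    # DBSCAN itself is unchanged; only the dissimilarity it sees differs.
    dissim = [[1.0 - s for s in row] for row in sim]
    print(dbscan_precomputed(dissim, eps=0.9, min_pts=3))
```

The clustering routine never looks at coordinates, only at the matrix it is given, which mirrors the abstract's point that the rest of the DBSCAN procedure is left unchanged. The thresholds in the demo are arbitrary and would need tuning on real data.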




Read also

Data clustering with uneven distribution in high-level noise is challenging. Currently, HDBSCAN is considered the SOTA algorithm for this problem. In this paper, we propose a novel clustering algorithm based on what we call graph of density topology (GDT). GDT jointly considers the local and global structures of data samples: first forming local clusters based on a density growing process with a strategy for proper noise handling as well as cluster boundary detection, and then estimating a GDT from the relationship between local clusters in terms of a connectivity measure, giving a global topological graph. The connectivity, measuring similarity between neighboring local clusters, is based on local clusters rather than individual points, ensuring its robustness to even very large noise. Evaluation results on both toy and real-world datasets show that GDT achieves the SOTA performance by far on almost all the popular datasets, and has a low time complexity of O(nlogn). The code is available at https://github.com/gaozhangyang/DGC.git.
Explaining the predictions made by complex machine learning models helps users to understand and accept the predicted outputs with confidence. One promising way is to use similarity-based explanation that provides similar instances as evidence to support model predictions. Several relevance metrics are used for this purpose. In this study, we investigated relevance metrics that can provide reasonable explanations to users. Specifically, we adopted three tests to evaluate whether the relevance metrics satisfy the minimal requirements for similarity-based explanation. Our experiments revealed that the cosine similarity of the gradients of the loss performs best, which would be a recommended choice in practice. In addition, we showed that some metrics perform poorly in our tests and analyzed the reasons for their failure. We expect our insights to help practitioners in selecting appropriate relevance metrics and also aid further research on designing better relevance metrics for explanations.
This paper develops results for the next nearest neighbour Ising model on random graphs. Besides being an essential ingredient in classic models for frustrated systems, second neighbour interactions arise naturally in several applications such as the colour diversity problem and graphical games. We demonstrate ensembles of random graphs, including regular connectivity graphs, that have a periodic variation of free energy, with either the ratio of nearest to next nearest couplings, or the mean number of nearest neighbours. When the coupling ratio is integer, paramagnetic phases can be found at zero temperature. This is shown to be related to the locked or unlocked nature of the interactions. For anti-ferromagnetic couplings, spin glass phases are demonstrated at low temperature. The interaction structure is formulated as a factor graph, and the solution on a tree is developed. The replica symmetric and energetic one-step replica symmetry breaking solution is developed using the cavity method. We calculate within these frameworks the phase diagram and demonstrate the existence of dynamical transitions at zero temperature for cases of anti-ferromagnetic coupling on regular and inhomogeneous random graphs.
Proposed in 1991, Least Mean Square Error Reconstruction for self-organizing network, shortly Lmser, was a further development of the traditional auto-encoder (AE) by folding the architecture with respect to the central coding layer and thus leading to the features of symmetric weights and neurons, as well as jointly supervised and unsupervised learning. However, its advantages were only demonstrated in a one-hidden-layer implementation due to the lack of computing resources and big data at that time. In this paper, we revisit Lmser from the perspective of deep learning, develop Lmser network based on multiple convolutional layers, which is more suitable for image-related tasks, and confirm several Lmser functions with preliminary demonstrations on image recognition, reconstruction, association recall, and so on. Experiments demonstrate that Lmser indeed works as indicated in the original paper, and it has promising performance in various applications.
We are interested in the problem of finding $k$ nearest neighbours in the plane and in the presence of polygonal obstacles ($\textit{OkNN}$). Widely used algorithms for OkNN are based on incremental visibility graphs, which means they require costly and online visibility checking and have worst-case quadratic running time. Recently $\mathbf{Polyanya}$, a fast point-to-point pathfinding algorithm, was proposed which avoids the disadvantages of visibility graphs by searching over an alternative data structure known as a navigation mesh. Previously, we adapted $\mathbf{Polyanya}$ to multi-target scenarios by developing two specialised heuristic functions: the $\mathbf{Interval\ heuristic}$ $h_v$ and the $\mathbf{Target\ heuristic}$ $h_t$. Though these methods outperform visibility graph algorithms by orders of magnitude in all our experiments they are not robust: $h_v$ expands many redundant nodes when the set of neighbours is small while $h_t$ performs poorly when the set of neighbours is large. In this paper, we propose new algorithms and heuristics for OkNN which perform well regardless of neighbour density.
