
Spatially Constrained Spectral Clustering Algorithms for Region Delineation

Published by: Shuai Yuan
Publication date: 2019
Language: English





Regionalization is the task of dividing up a landscape into homogeneous patches with similar properties. Although this task has a wide range of applications, it poses two notable challenges. First, the resulting regions are expected to be both homogeneous and spatially contiguous. Second, it is well recognized that landscapes are hierarchical, such that fine-scale regions are nested wholly within broader-scale regions. To address the first challenge, we develop a spatially constrained spectral clustering framework for region delineation that incorporates the tradeoff between region homogeneity and spatial contiguity. The framework uses a flexible, truncated exponential kernel to represent the spatial contiguity constraints, which is integrated with the landscape feature similarity matrix for region delineation. To address the second challenge, we extend the framework to create fine-scale regions that are nested within broader-scale regions using a greedy, recursive bisection approach. We present a case study of a terrestrial ecology data set in the United States that compares the proposed framework with several baseline methods for regionalization. Experimental results suggest that the proposed framework outperforms the baseline methods, especially in terms of balancing region contiguity and homogeneity, as well as creating regions of more similar size, which is often a desired trait of regions.
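To make the core construction concrete, below is a minimal sketch of spatially constrained spectral clustering in the spirit of the abstract, not the authors' implementation. The names `alpha` (homogeneity/contiguity tradeoff), `r` (truncation radius), the RBF feature similarity, and the use of scikit-learn's `SpectralClustering` are all illustrative assumptions.

```python
# Minimal sketch of spatially constrained spectral clustering (illustrative, not the paper's code).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

def truncated_exponential_kernel(coords, r, length_scale=1.0):
    """Spatial contiguity weights: exp(-d / length_scale) for d <= r, else 0."""
    d = cdist(coords, coords)
    w = np.exp(-d / length_scale)
    w[d > r] = 0.0
    return w

def region_delineation(X, coords, n_regions=8, alpha=0.5, r=1.0):
    """X: n x d landscape features; coords: n x 2 site coordinates."""
    # Feature similarity: RBF kernel on standardized landscape attributes.
    Xs = (X - X.mean(0)) / (X.std(0) + 1e-12)
    feat_sim = np.exp(-cdist(Xs, Xs, "sqeuclidean") / X.shape[1])
    # Blend homogeneity and contiguity; alpha controls the tradeoff.
    spatial_sim = truncated_exponential_kernel(coords, r)
    affinity = (1 - alpha) * feat_sim + alpha * spatial_sim
    sc = SpectralClustering(n_clusters=n_regions, affinity="precomputed",
                            random_state=0)
    return sc.fit_predict(affinity)
```

The hierarchical extension described above would invoke a routine like this recursively with two clusters inside each region (the greedy, recursive bisection idea); that step is omitted from the sketch.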




Read also

Yan Ge, Haiping Lu, Pan Peng, 2018
Clustering is fundamental for gaining insights from complex networks, and spectral clustering (SC) is a popular approach. Conventional SC focuses on second-order structures (e.g., edges connecting two nodes) without direct consideration of higher-order structures (e.g., triangles and cliques). This has motivated SC extensions that directly consider higher-order structures. However, both approaches are limited to considering a single order. This paper proposes a new Mixed-Order Spectral Clustering (MOSC) approach to model both second-order and third-order structures simultaneously, with two MOSC methods developed based on Graph Laplacian (GL) and Random Walks (RW). MOSC-GL combines edge and triangle adjacency matrices, with a theoretical performance guarantee. MOSC-RW combines first-order and second-order random walks for a probabilistic interpretation. We automatically determine the mixing parameter based on cut criteria or triangle density, and construct new structure-aware error metrics for performance evaluation. Experiments on real-world networks show 1) the superior performance of the two MOSC methods over existing SC methods, 2) the effectiveness of the mixing-parameter determination strategy, and 3) insights offered by the structure-aware error metrics.
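As a rough illustration of the mixed-order idea (a simplification of MOSC-GL, not the paper's code), the sketch below mixes the edge adjacency matrix with a triangle-count matrix using a hand-set mixing parameter `lam`, whereas the paper determines the mixing parameter automatically from cut criteria or triangle density.

```python
# Toy mixed-order affinity: blend edges (second-order) with triangle counts (third-order).
import numpy as np
from sklearn.cluster import SpectralClustering

def mixed_order_affinity(A, lam=0.5):
    A = (A > 0).astype(float)
    np.fill_diagonal(A, 0.0)
    # A_tri[i, j] counts the triangles that pass through edge (i, j).
    A_tri = (A @ A) * A
    # Rescale each order so neither dominates, then mix with parameter lam.
    norm = lambda M: M / M.max() if M.max() > 0 else M
    return (1 - lam) * norm(A) + lam * norm(A_tri)

def mosc_like_clustering(A, k, lam=0.5):
    W = mixed_order_affinity(A, lam)
    return SpectralClustering(n_clusters=k, affinity="precomputed",
                              random_state=0).fit_predict(W)
```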
Spectral clustering is one of the most effective clustering approaches for capturing hidden cluster structures in the data. However, it does not scale well to large-scale problems due to its quadratic complexity in constructing similarity graphs and computing the subsequent eigendecomposition. Although a number of methods have been proposed to accelerate spectral clustering, most of them accept considerable information loss in the original data in order to reduce computational bottlenecks. In this paper, we present a novel scalable spectral clustering method using Random Binning features (RB) to simultaneously accelerate both similarity graph construction and the eigendecomposition. Specifically, we implicitly approximate the graph similarity (kernel) matrix by the inner product of a large sparse feature matrix generated by RB. We then introduce a state-of-the-art SVD solver to efficiently compute the eigenvectors of this large matrix for spectral clustering. Using these two building blocks, we reduce the computational cost from quadratic to linear in the number of data points while achieving similar accuracy. Our theoretical analysis shows that spectral clustering via RB converges faster to the exact spectral clustering than the standard Random Feature approximation. Extensive experiments on 8 benchmarks show that the proposed method either outperforms or matches the state-of-the-art methods in both accuracy and runtime. Moreover, our method exhibits linear scalability in both the number of data samples and the number of RB features.
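A hedged sketch of the overall pipeline follows: Random Binning builds a large sparse matrix `Z` whose inner products approximate the similarity matrix, and a sparse SVD of the degree-normalized `Z` yields the spectral embedding without ever forming the n-by-n graph. The gamma-distributed bin widths, the parameter names `n_grids` and `sigma`, and the use of `scipy.sparse.linalg.svds` are assumptions for illustration, not the paper's implementation.

```python
# Illustrative spectral clustering via Random Binning features (a simplification).
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def random_binning_features(X, n_grids=50, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    blocks = []
    for _ in range(n_grids):
        # Per-dimension random bin widths and offsets define one random grid.
        widths = rng.gamma(shape=2.0, scale=sigma, size=d)
        offsets = rng.uniform(0.0, widths)
        bins = np.floor((X - offsets) / widths).astype(np.int64)
        # Map each joint bin index to a sparse one-hot column.
        cols = np.unique(bins, axis=0, return_inverse=True)[1].ravel()
        blocks.append(sparse.csr_matrix(
            (np.ones(n), (np.arange(n), cols)), shape=(n, cols.max() + 1)))
    return sparse.hstack(blocks, format="csr") / np.sqrt(n_grids)

def rb_spectral_clustering(X, k, n_grids=50, sigma=1.0):
    Z = random_binning_features(X, n_grids, sigma)
    # Degrees of the implicit graph Z Z^T, computed without materializing it.
    deg = np.asarray(Z @ (Z.T @ np.ones(Z.shape[0]))).ravel()
    Zn = sparse.diags(1.0 / np.sqrt(deg)) @ Z
    # Top-k left singular vectors of Zn give the spectral embedding of Zn Zn^T.
    U, _, _ = svds(Zn, k=k)
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```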
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure of data by performing clustering on the learned optimal embedding across views. Though demonstrating promising performance in various applications, most existing methods linearly combine a group of pre-specified first-order Laplacian matrices to construct the optimal Laplacian matrix, which may result in limited representation capability and insufficient information exploitation. Also, storing and implementing complex operations on the $n \times n$ Laplacian matrices incurs intensive storage and computation complexity. To address these issues, this paper first proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix, and then extends it to a late-fusion version for accurate and efficient multi-view clustering. Specifically, the proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both first-order and high-order base Laplacian matrices simultaneously. In this way, the representative capacity of the learned optimal Laplacian matrix is enhanced, which helps to better exploit the hidden high-order connection information among data, leading to improved clustering performance. We design an efficient algorithm with proven convergence to solve the resulting optimization problem. Extensive experimental results on nine datasets demonstrate the superiority of our algorithm against state-of-the-art methods, which verifies the effectiveness and advantages of the proposed algorithm.
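The sketch below conveys only the "first-order plus high-order" ingredient, using fixed view weights and a matrix power as the high-order term; the paper instead learns the combination and searches a neighborhood of it, so treat this as a toy illustration with assumed names such as `order` and `weights`.

```python
# Toy multi-view clustering with mixed first-order and high-order connection information.
import numpy as np
from sklearn.cluster import KMeans

def normalized_adjacency(W):
    d = np.maximum(W.sum(1), 1e-12)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ W @ D_inv_sqrt

def multiview_high_order_clustering(views, k, order=2, weights=None):
    """views: list of n x n affinity matrices, one per view."""
    n = views[0].shape[0]
    weights = weights or [1.0 / len(views)] * len(views)
    S = np.zeros((n, n))
    for W, w in zip(views, weights):
        A = normalized_adjacency(W)
        # Mix the first-order operator with a high-order one (here: A^order).
        S += w * 0.5 * (A + np.linalg.matrix_power(A, order))
    # Spectral embedding of the combined operator, then k-means on normalized rows.
    _, vecs = np.linalg.eigh((S + S.T) / 2)
    U = vecs[:, -k:]
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```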
Recently, due to an increasing interest in transparency in artificial intelligence, several methods of explainable machine learning have been developed with the simultaneous goals of accuracy and interpretability by humans. In this paper, we study a recent framework of explainable clustering first suggested by Dasgupta et al.~\cite{dasgupta2020explainable}. Specifically, we focus on the $k$-means and $k$-medians problems and provide nearly tight upper and lower bounds. First, we provide an $O(\log k \log\log k)$-approximation algorithm for explainable $k$-medians, improving on the best known algorithm of $O(k)$~\cite{dasgupta2020explainable} and nearly matching the known $\Omega(\log k)$ lower bound~\cite{dasgupta2020explainable}. In addition, in low-dimensional spaces $d \ll \log k$, we show that our algorithm also provides an $O(d \log^2 d)$-approximate solution for explainable $k$-medians. This improves over the best known bound of $O(d \log k)$ for low dimensions~\cite{laber2021explainable}, and is a constant for constant-dimensional spaces. To complement this, we show a nearly matching $\Omega(d)$ lower bound. Next, we study the $k$-means problem in this context and provide an $O(k \log k)$-approximation algorithm for explainable $k$-means, improving over the $O(k^2)$ bound of Dasgupta et al. and the $O(d k \log k)$ bound of \cite{laber2021explainable}. To complement this, we provide an almost tight $\Omega(k)$ lower bound, improving over the $\Omega(\log k)$ lower bound of Dasgupta et al. Given an approximate solution to the classic $k$-means and $k$-medians, our algorithm for $k$-medians runs in time $O(kd \log^2 k)$ and our algorithm for $k$-means runs in time $O(k^2 d)$.
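For context on the threshold-tree framework this abstract builds on, here is a hedged sketch of a greedy cut rule in the spirit of Dasgupta et al.: each internal node is an axis-aligned threshold, each leaf holds one reference center, and cuts are chosen to minimize the number of points separated from their reference center. This is not the improved algorithms analyzed above; the function `build_threshold_tree` and its details are illustrative.

```python
# Greedy threshold tree that "explains" a reference k-means/k-medians solution.
import numpy as np
from sklearn.cluster import KMeans

def build_threshold_tree(X, centers, labels):
    """Recursively split points and centers with axis-aligned cuts until one center per leaf.
    Assumes the reference centers are pairwise distinct."""
    def split(idx, cidx):
        if len(cidx) == 1:
            return ("leaf", int(cidx[0]))
        best = None
        for f in range(X.shape[1]):
            cs = np.sort(centers[cidx, f])
            for t in (cs[:-1] + cs[1:]) / 2:  # midpoints between center coordinates
                left_c = cidx[centers[cidx, f] <= t]
                right_c = cidx[centers[cidx, f] > t]
                if len(left_c) == 0 or len(right_c) == 0:
                    continue
                # Mistakes: points this cut separates from their reference center.
                side_pt = X[idx, f] <= t
                side_ct = np.isin(labels[idx], left_c)
                mistakes = int(np.sum(side_pt != side_ct))
                if best is None or mistakes < best[0]:
                    best = (mistakes, f, t, left_c, right_c)
        _, f, t, left_c, right_c = best
        left_idx = idx[X[idx, f] <= t]
        right_idx = idx[X[idx, f] > t]
        return ("node", f, t, split(left_idx, left_c), split(right_idx, right_c))
    return split(np.arange(len(X)), np.arange(len(centers)))

# Usage: fit a reference clustering, then explain it with a threshold tree.
# km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
# tree = build_threshold_tree(X, km.cluster_centers_, km.labels_)
```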
We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under a probabilistic linear constraint. For a few real-world instances of this problem, constrained extensions of the well-known Thompson Sampling (TS) heuristic have recently been proposed. However, finite-time analysis of constrained TS is challenging; as a result, only $O(\sqrt{T})$ bounds on the cumulative reward loss (i.e., the regret) are available. In this paper, we describe LinConTS, a TS-based algorithm for bandits that place a linear constraint on the probability of earning a reward in every round. We show that for LinConTS, the regret as well as the cumulative constraint violations are upper bounded by $O(\log T)$ for the suboptimal arms. We develop a proof technique that relies on careful analysis of the dual problem and combine it with recent theoretical work on unconstrained TS. Through numerical experiments on two real-world datasets, we demonstrate that LinConTS outperforms an asymptotically optimal upper confidence bound (UCB) scheme in terms of simultaneously minimizing the regret and the violation.
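A loose sketch of constrained Thompson Sampling for this setting is shown below, with a greedy feasibility rule standing in for the LP step of LinConTS; `tau` denotes the required per-round probability of earning a reward, and `pull`, the Beta priors, and the Bernoulli trick for bounded rewards are illustrative assumptions rather than the paper's algorithm.

```python
# Simplified constrained Thompson Sampling: separate posteriors for mean reward
# and for the probability of earning any reward (the constraint signal).
import numpy as np

def constrained_ts(pull, n_arms, tau, horizon, seed=0):
    """pull(arm) -> reward in [0, 1]; tau = required probability of a nonzero reward."""
    rng = np.random.default_rng(seed)
    a_r = np.ones(n_arms); b_r = np.ones(n_arms)  # Beta posterior: mean reward
    a_c = np.ones(n_arms); b_c = np.ones(n_arms)  # Beta posterior: P(reward > 0)
    for _ in range(horizon):
        theta_r = rng.beta(a_r, b_r)                 # sampled mean rewards
        theta_c = rng.beta(a_c, b_c)                 # sampled reward-earning probabilities
        feasible = np.flatnonzero(theta_c >= tau)
        # Play the best sampled reward among feasible arms; otherwise chase feasibility.
        arm = int(feasible[np.argmax(theta_r[feasible])]) if feasible.size \
            else int(np.argmax(theta_c))
        r = float(pull(arm))
        y = rng.binomial(1, min(max(r, 0.0), 1.0))   # Bernoulli trick for [0, 1] rewards
        a_r[arm] += y; b_r[arm] += 1 - y
        z = int(r > 0)
        a_c[arm] += z; b_c[arm] += 1 - z
    return a_r / (a_r + b_r), a_c / (a_c + b_c)      # posterior mean estimates
```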
