On Pairwise Clustering with Side Information


Published by Mark Herbster

Publication date: 2017

Research field: Informatics Engineering

Paper language: English

Authors: Stephen Pasteris - Fabio Vitale - Claudio Gentile

Machine learning


Abstract

Pairwise clustering, in general, partitions a set of items via a known similarity function. In our treatment, clustering is modeled as a transductive prediction problem. Thus, rather than beginning with a known similarity function, the function is hidden and the learner receives only a random sample consisting of a subset of the pairwise similarities. An additional set of pairwise side-information may be given to the learner, which then determines the inductive bias of our algorithms. We measure performance not by the recovery of the hidden similarity function, but by how well we classify each item. We give tight bounds on the number of misclassifications. We provide two algorithms. The first algorithm, SACA, is a simple agglomerative clustering algorithm which runs in near linear time and serves as a baseline for our analyses. The second algorithm, RGCA, enables the incorporation of side-information, which may lead to improved bounds at the cost of a longer running time.
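To make the setting concrete, the sketch below shows one simple way to cluster agglomeratively from a sample of pairwise similarities in near linear time, using a union-find structure. It is an illustration only, not the SACA algorithm from the paper: the function name `cluster_from_similar_pairs` and the assumption that the sample is given as a list of pairs observed to be similar are hypothetical.

```python
from typing import Iterable


def cluster_from_similar_pairs(
    n: int, similar_pairs: Iterable[tuple[int, int]]
) -> list[int]:
    """Merge items linked by observed 'similar' pairs; return one label per item.

    Hypothetical helper for illustration; not the paper's SACA algorithm.
    """
    parent = list(range(n))  # each item starts in its own cluster

    def find(x: int) -> int:
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in similar_pairs:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two clusters

    # Relabel roots as consecutive integers in order of first appearance.
    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


print(cluster_from_similar_pairs(5, [(0, 1), (1, 2), (3, 4)]))
```

Items never linked by a sampled similar pair stay in singleton clusters here, which is one reason a real algorithm in this setting needs the more careful analysis the abstract refers to.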


Arya Mazumdar, Barna Saha (2017)

Suppose we are given a set of $n$ elements to be clustered into $k$ (unknown) clusters, and an oracle/expert labeler that can interactively answer pair-wise queries of the form "do two elements $u$ and $v$ belong to the same cluster?". The goal is to recover the optimum clustering by asking the minimum number of queries. In this paper, we initiate a rigorous theoretical study of this basic problem of query complexity of interactive clustering, and provide strong information-theoretic lower bounds as well as nearly matching upper bounds. Most clustering problems come with a similarity matrix, which is used by an automated process to cluster similar points together. Our main contribution in this paper is to show the dramatic power of side information, i.e., the similarity matrix, in reducing the query complexity of clustering. A similarity matrix represents noisy pair-wise relationships, such as one computed by some function on attributes of the elements. A natural noisy model is one where similarity values are drawn independently from some arbitrary probability distribution $f_+$ when the underlying pair of elements belong to the same cluster, and from some $f_-$ otherwise. We show that given such a similarity matrix, the query complexity reduces drastically from $\Theta(nk)$ (no similarity matrix) to $O\left(\frac{k^2 \log n}{\mathcal{H}^2(f_+ \| f_-)}\right)$, where $\mathcal{H}^2$ denotes the squared Hellinger divergence. Moreover, this is also information-theoretically optimal within an $O(\log n)$ factor. Our algorithms are all efficient and parameter free, i.e., they work without any knowledge of $k$, $f_+$ and $f_-$, and depend only logarithmically on $n$. Along the way, our work also reveals an intriguing connection to popular community detection models such as the stochastic block model, significantly generalizes them, and opens up many avenues for interesting future research.
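The $nk$ figure for the case without a similarity matrix can be seen from a simple baseline: compare each new element against one representative of every cluster found so far, which costs at most $k$ queries per element. The sketch below is an illustration under assumptions, not the authors' algorithm; `cluster_with_queries` and the `same_cluster` oracle callback are hypothetical names, and the oracle is assumed to be noiseless.

```python
from typing import Callable


def cluster_with_queries(
    n: int, same_cluster: Callable[[int, int], bool]
) -> tuple[list[list[int]], int]:
    """Recover clusters with at most n * k same-cluster queries.

    Hypothetical baseline for illustration; assumes a noiseless oracle.
    """
    clusters: list[list[int]] = []
    queries = 0
    for item in range(n):
        for cluster in clusters:
            queries += 1
            # One query against the cluster's first member (its representative).
            if same_cluster(cluster[0], item):
                cluster.append(item)
                break
        else:
            # No existing cluster matched, so the item starts a new one.
            clusters.append([item])
    return clusters, queries


truth = [0, 1, 0, 2, 1, 0]  # hidden ground-truth labels, used only by the oracle
found, used = cluster_with_queries(len(truth), lambda u, v: truth[u] == truth[v])
print(found, used)
```

With a similarity matrix available, the abstract's point is that far fewer queries than this baseline are needed.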

Machine learning - Data structures and algorithms - Information theory

Fuzzy Discriminant Clustering with Fuzzy Pairwise Constraints

Zhen Wang, Shan-Shan Wang, Lan Bai (2021)

In semi-supervised fuzzy clustering, this paper extends the traditional pairwise constraint (i.e., must-link or cannot-link) to a fuzzy pairwise constraint. The fuzzy pairwise constraint allows a supervisor to provide the grade of similarity or dissimilarity between the implicit fuzzy vectors of a pair of samples. This constraint can represent more complicated relationships between the pair of samples and avoids eliminating the fuzzy characteristics. We propose a fuzzy discriminant clustering model (FDC) to fuse the fuzzy pairwise constraints. The nonconvex optimization problem in our FDC is solved by a modified expectation-maximization algorithm, which involves solving several indefinite quadratic programming problems (IQPPs). Further, a diagonal block coordinate descent (DBCD) algorithm is proposed for these IQPPs, whose stationary points are guaranteed, and the global solutions can be obtained under certain conditions. To suit different applications, the FDC is extended to various metric spaces, e.g., the Reproducing Kernel Hilbert Space. Experimental results on several benchmark datasets and a facial expression database show that our FDC outperforms some state-of-the-art clustering models.

Machine learning

Online Matrix Completion with Side Information

Mark Herbster, Stephen Pasteris, Lisa Tse (2019)

We give an online algorithm and prove novel mistake and regret bounds for online binary matrix completion with side information. The mistake bounds we prove are of the form $\tilde{O}(D/\gamma^2)$. The term $1/\gamma^2$ is analogous to the usual margin term in SVM (perceptron) bounds. More specifically, if we assume that there is some factorization of the underlying $m \times n$ matrix into $P Q^\intercal$ where the rows of $P$ are interpreted as classifiers in $\mathcal{R}^d$ and the rows of $Q$ as instances in $\mathcal{R}^d$, then $\gamma$ is the maximum (normalized) margin over all factorizations $P Q^\intercal$ consistent with the observed matrix. The quasi-dimension term $D$ measures the quality of side information. In the presence of vacuous side information, $D = m + n$. However, if the side information is predictive of the underlying factorization of the matrix, then in an ideal case, $D \in O(k + \ell)$ where $k$ is the number of distinct row factors and $\ell$ is the number of distinct column factors. We additionally provide a generalization of our algorithm to the inductive setting. In this setting, we provide an example where the side information is not directly specified in advance. For this example, the quasi-dimension $D$ is now bounded by $O(k^2 + \ell^2)$.

Machine learning

Patterns for Learning with Side Information

Rico Jonschkowski, Sebastian Hofer, Oliver Brock (2015)

Supervised, semi-supervised, and unsupervised learning estimate a function given input/output samples. Generalization of the learned function to unseen data can be improved by incorporating side information into learning. Side information is data that comes neither from the input space nor from the output space of the function, but includes useful information for learning it. In this paper we show that learning with side information subsumes a variety of related approaches, e.g. multi-task learning, multi-view learning and learning using privileged information. Our main contributions are (i) a new perspective that connects these previously isolated approaches, (ii) insights about how these methods incorporate different types of prior knowledge, and hence implement different patterns, (iii) facilitating the application of these methods in novel tasks, as well as (iv) a systematic experimental evaluation of these patterns in two supervised learning tasks.

Machine learning

Testing Randomization and Relaxed Randomization Assumptions: A Clustering With Side-information Approach

Kan Chen, Siyu Heng, Qi Long (2021)

One central goal of the design of observational studies is to embed non-experimental data into an approximate randomized controlled trial using statistical matching. Researchers then make the randomization assumption in their downstream outcome analysis. For a matched pair design, the randomization assumption states that the treatment assignments across all matched pairs are independent, and that the probability of the first subject in each pair receiving treatment and the other control is the same as that of the first receiving control and the other treatment. In this article, we develop a novel framework for testing the randomization assumption based on solving a clustering problem with side-information using modern statistical learning tools. Our testing framework is nonparametric, finite-sample exact, and distinct from previous proposals in that it can be used to test a relaxed version of the randomization assumption called the biased randomization assumption. One important by-product of our testing framework is a quantity called the residual sensitivity value (RSV), which quantifies the level of minimal residual confounding due to observed covariates not being well matched. We advocate taking RSV into account in the downstream primary analysis. The proposed methodology is illustrated by re-examining a famous observational study concerning the effect of right heart catheterization (RHC) in the initial care of critically ill patients.
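The randomization assumption for matched pairs described above can be written compactly. The notation here is an assumption introduced for illustration and is not taken from the paper: $Z_{ij} \in \{0, 1\}$ denotes the treatment indicator of subject $j \in \{1, 2\}$ in matched pair $i$, with exactly one treated subject per pair.

```latex
% Assumed notation: Z_{ij} = 1 if subject j of matched pair i is treated,
% and Z_{i1} + Z_{i2} = 1 for every pair i.
\[
  \Pr(Z_{i1} = 1,\, Z_{i2} = 0)
  \;=\;
  \Pr(Z_{i1} = 0,\, Z_{i2} = 1)
  \;=\;
  \frac{1}{2}
  \qquad \text{for each pair } i,
\]
% together with independence of (Z_{i1}, Z_{i2}) across pairs i.
```

The value $1/2$ follows because the two events are the only possible assignments within a pair and are assumed equally likely.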

Methodology