Metric-based comparison operations, such as finding the maximum and the nearest and farthest neighbor, are fundamental to studying various clustering techniques such as $k$-center clustering and agglomerative hierarchical clustering. These techniques crucially rely on accurate estimation of pairwise distances between records. However, computing exact features of the records and their pairwise distances is often challenging, and sometimes not possible. We circumvent this challenge by leveraging weak supervision in the form of a comparison oracle that compares the relative distance between the queried points, for example "Is point $u$ closer to $v$, or is $w$ closer to $x$?" However, it is possible that some queries are easier to answer than others using a comparison oracle. We capture this by introducing two different noise models, called adversarial and probabilistic noise. In this paper, we study various problems, including finding the maximum and nearest/farthest neighbor search, under these noise models. Building upon the techniques we develop for these comparison operations, we give robust algorithms for $k$-center clustering and agglomerative hierarchical clustering. We prove that our algorithms achieve good approximation guarantees with high probability and analyze their query complexity. We evaluate the effectiveness and efficiency of our techniques empirically on various real-world datasets.
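To make the query format concrete, here is a minimal Python sketch of a quadruplet comparison oracle and a farthest-neighbor scan built on top of it. It is an illustration only and not the algorithm from the paper. The names `make_noisy_oracle` and `farthest_from`, the flip probability `p`, and the `repeats` parameter are all hypothetical. The sketch also assumes that each repeated query is flipped independently, so that a majority vote helps; the abstract does not state this, and it may not hold for the adversarial or probabilistic noise models studied.

```python
import random


def make_noisy_oracle(dist, p, rng):
    """Build a quadruplet oracle (hypothetical helper).

    oracle(u, v, w, x) answers "is d(u, v) <= d(w, x)?".
    ASSUMPTION: each call is flipped independently with probability p.
    """
    def oracle(u, v, w, x):
        truth = dist(u, v) <= dist(w, x)
        # Flip the true answer with probability p.
        return (not truth) if rng.random() < p else truth
    return oracle


def farthest_from(q, candidates, oracle, repeats=15):
    """Scan candidates and keep the one judged farthest from q.

    Each pairwise decision is a majority vote over `repeats` oracle calls,
    which only helps under the independent-noise assumption above.
    """
    best = candidates[0]
    for c in candidates[1:]:
        # Count votes saying "best is no farther from q than c".
        votes = sum(oracle(q, best, q, c) for _ in range(repeats))
        if 2 * votes > repeats:
            best = c
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    oracle = make_noisy_oracle(lambda a, b: abs(a - b), p=0.1, rng=rng)
    print(farthest_from(0.0, [1.0, 5.0, 9.0, 3.0], oracle))
```

The scan uses `repeats` oracle calls for each of the $n-1$ candidate comparisons. The algorithm never sees a distance value, only the oracle's yes/no answers, which is the point of the weak-supervision setting described above.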
Motivated by crowdsourced computation, peer-grading, and recommendation systems, Braverman, Mao and Weinberg [STOC16] studied the \emph{query} and \emph{round} complexity of fundamental problems such as finding the maximum (\textsc{max}), finding all el
The noisy broadcast model was first studied in [Gallager, TranInf88] where an $n$-character input is distributed among $n$ processors, so that each processor receives one input bit. Computation proceeds in rounds, where in each round each processor b
Recently, great effort has been dedicated to research on the management of large-scale graph-based data such as the WWW, social networks, and biological networks. In the study of graph-based data management, node disjoint subgraph homeomorphism relation
Given an undirected graph, $G$, and vertices, $s$ and $t$ in $G$, the tracking paths problem is that of finding the smallest subset of vertices in $G$ whose intersection with any $s$-$t$ path results in a unique sequence. This problem is known to be
We initiate a study of algorithms with a focus on the computational complexity of individual elements, and introduce the fragile complexity of comparison-based algorithms as the maximal number of comparisons any individual element takes part in. We g