ترغب بنشر مسار تعليمي؟ اضغط هنا

How to Design Robust Algorithms using Noisy Comparison Oracle

101   0   0.0 ( 0 )
 نشر من قبل Sainyam Galhotra
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Metric based comparison operations such as finding maximum, nearest and farthest neighbor are fundamental to studying various clustering techniques such as $k$-center clustering and agglomerative hierarchical clustering. These techniques crucially rely on accurate estimation of pairwise distance between records. However, computing exact features of the records, and their pairwise distances is often challenging, and sometimes not possible. We circumvent this challenge by leveraging weak supervision in the form of a comparison oracle that compares the relative distance between the queried points such as `Is point u closer to v or w closer to x?. However, it is possible that some queries are easier to answer than others using a comparison oracle. We capture this by introducing two different noise models called adversarial and probabilistic noise. In this paper, we study various problems that include finding maximum, nearest/farthest neighbor search under these noise models. Building upon the techniques we develop for these comparison operations, we give robust algorithms for k-center clustering and agglomerative hierarchical clustering. We prove that our algorithms achieve good approximation guarantees with a high probability and analyze their query complexity. We evaluate the effectiveness and efficiency of our techniques empirically on various real-world datasets.

قيم البحث

اقرأ أيضاً

Motivated by crowdsourced computation, peer-grading, and recommendation systems, Braverman, Mao and Weinberg [STOC16] studied the emph{query} and emph{round} complexity of fundamental problems such as finding the maximum (textsc{max}), finding all el ements above a certain value (textsc{threshold-$v$}) or computing the top$-k$ elements (textsc{Top}-$k$) in a noisy environment. For example, consider the task of selecting papers for a conference. This task is challenging due the crowdsourcing nature of peer reviews: the results of reviews are noisy and it is necessary to parallelize the review process as much as possible. We study the noisy value model and the noisy comparison model: In the emph{noisy value model}, a reviewer is asked to evaluate a single element: What is the value of paper $i$? (eg accept). In the emph{noisy comparison model} (introduced in the seminal work of Feige, Peleg, Raghavan and Upfal [SICOMP94]) a reviewer is asked to do a pairwise comparison: Is paper $i$ better than paper $j$? In this paper, we show optimal worst-case query complexity for the textsc{max},textsc{threshold-$v$} and textsc{Top}-$k$ problems. For textsc{max} and textsc{Top}-$k$, we obtain optimal worst-case upper and lower bounds on the round vs query complexity in both models. For textsc{threshold}-$v$, we obtain optimal query complexity and nearly-optimal round complexity, where $k$ is the size of the output) for both models. We then go beyond the worst-case and address the question of the importance of knowledge of the instance by providing, for a large range of parameters, instance-optimal algorithms with respect to the query complexity. Furthermore, we show that the value model is strictly easier than the comparison model.
The noisy broadcast model was first studied in [Gallager, TranInf88] where an $n$-character input is distributed among $n$ processors, so that each processor receives one input bit. Computation proceeds in rounds, where in each round each processor b roadcasts a single character, and each reception is corrupted independently at random with some probability $p$. [Gallager, TranInf88] gave an algorithm for all processors to learn the input in $O(loglog n)$ rounds with high probability. Later, a matching lower bound of $Omega(loglog n)$ was given in [Goyal, Kindler, Saks; SICOMP08]. We study a relaxed version of this model where each reception is erased and replaced with a `? independently with probability $p$. In this relaxed model, we break past the lower bound of [Goyal, Kindler, Saks; SICOMP08] and obtain an $O(log^* n)$-round algorithm for all processors to learn the input with high probability. We also show an $O(1)$-round algorithm for the same problem when the alphabet size is $Omega(mathrm{poly}(n))$.
Recently, great efforts have been dedicated to researches on the management of large scale graph based data such as WWW, social networks, biological networks. In the study of graph based data management, node disjoint subgraph homeomorphism relation between graphs is more suitable than (sub)graph isomorphism in many cases, especially in those cases that node skipping and node mismatching are allowed. However, no efficient node disjoint subgraph homeomorphism determination (ndSHD) algorithms have been available. In this paper, we propose two computationally efficient ndSHD algorithms based on state spaces searching with backtracking, which employ many heuristics to prune the search spaces. Experimental results on synthetic data sets show that the proposed algorithms are efficient, require relative little time in most of the testing cases, can scale to large or dense graphs, and can accommodate to more complex fuzzy matching cases.
Given an undirected graph, $G$, and vertices, $s$ and $t$ in $G$, the tracking paths problem is that of finding the smallest subset of vertices in $G$ whose intersection with any $s$-$t$ path results in a unique sequence. This problem is known to be NP-complete and has applications to animal migration tracking and detecting marathon course-cutting, but its approximability is largely unknown. In this paper, we address this latter issue, giving novel algorithms having approximation ratios of $(1+epsilon)$, $O(lg OPT)$ and $O(lg n)$, for $H$-minor-free, general, and weighted graphs, respectively. We also give a linear kernel for $H$-minor-free graphs and make improvements to the quadratic kernel for general graphs.
We initiate a study of algorithms with a focus on the computational complexity of individual elements, and introduce the fragile complexity of comparison-based algorithms as the maximal number of comparisons any individual element takes part in. We g ive a number of upper and lower bounds on the fragile complexity for fundamental problems, including Minimum, Selection, Sorting and Heap Construction. The results include both deterministic and randomized upper and lower bounds, and demonstrate a separation between the two settings for a number of problems. The depth of a comparator network is a straight-forward upper bound on the worst case fragile complexity of the corresponding fragile algorithm. We prove that fragile complexity is a different and strictly easier property than the depth of comparator networks, in the sense that for some problems a fragile complexity equal to the best network depth can be achieved with less total work and that with randomization, even a lower fragile complexity is possible.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا