ترغب بنشر مسار تعليمي؟ اضغط هنا

Partitioned $K$-nearest neighbor local depth for scalable comparison-based learning

79   0   0.0 ( 0 )
 نشر من قبل R W R Darling Ph. D.
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

A triplet comparison oracle on a set $S$ takes an object $x in S$ and for any pair ${y, z} subset S setminus {x}$ declares which of $y$ and $z$ is more similar to $x$. Partitioned Local Depth (PaLD) supplies a principled non-parametric partitioning of $S$ under such triplet comparisons but needs $O(n^2 log{n})$ oracle calls and $O(n^3)$ post-processing steps. We introduce Partitioned Nearest Neighbors Local Depth (PaNNLD), a computationally tractable variant of PaLD leveraging the $K$-nearest neighbors digraph on $S$. PaNNLD needs only $O(n K log{n})$ oracle calls, by replacing an oracle call by a coin flip when neither $y$ nor $z$ is adjacent to $x$ in the undirected version of the $K$-nearest neighbors digraph. By averaging over randomizations, PaNNLD subsequently requires (at best) only $O(n K^2)$ post-processing steps. Concentration of measure shows that the probability of randomization-induced error $delta$ in PaNNLD is no more than $2 e^{-delta^2 K^2}$.

قيم البحث

اقرأ أيضاً

209 - Xian Wu , Moses Charikar 2020
Embedding into hyperbolic space is emerging as an effective representation technique for datasets that exhibit hierarchical structure. This development motivates the need for algorithms that are able to effectively extract knowledge and insights from datapoints embedded in negatively curved spaces. We focus on the problem of nearest neighbor search, a fundamental problem in data analysis. We present efficient algorithmic solutions that build upon established methods for nearest neighbor search in Euclidean space, allowing for easy adoption and integration with existing systems. We prove theoretical guarantees for our techniques and our experiments demonstrate the effectiveness of our approach on real datasets over competing algorithms.
We study the problem of sampling an approximately uniformly random satisfying assignment for atomic constraint satisfaction problems i.e. where each constraint is violated by only one assignment to its variables. Let $p$ denote the maximum probabilit y of violation of any constraint and let $Delta$ denote the maximum degree of the line graph of the constraints. Our main result is a nearly-linear (in the number of variables) time algorithm for this problem, which is valid in a Lovasz local lemma type regime that is considerably less restrictive compared to previous works. In particular, we provide sampling algorithms for the uniform distribution on: (1) $q$-colorings of $k$-uniform hypergraphs with $Delta lesssim q^{(k-4)/3 + o_{q}(1)}.$ The exponent $1/3$ improves the previously best-known $1/7$ in the case $q, Delta = O(1)$ [Jain, Pham, Vuong; arXiv, 2020] and $1/9$ in the general case [Feng, He, Yin; STOC 2021]. (2) Satisfying assignments of Boolean $k$-CNF formulas with $Delta lesssim 2^{k/5.741}.$ The constant $5.741$ in the exponent improves the previously best-known $7$ in the case $k = O(1)$ [Jain, Pham, Vuong; arXiv, 2020] and $13$ in the general case [Feng, He, Yin; STOC 2021]. (3) Satisfying assignments of general atomic constraint satisfaction problems with $pcdot Delta^{7.043} lesssim 1.$ The constant $7.043$ improves upon the previously best-known constant of $350$ [Feng, He, Yin; STOC 2021]. At the heart of our analysis is a novel information-percolation type argument for showing the rapid mixing of the Glauber dynamics for a carefully constructed projection of the uniform distribution on satisfying assignments. Notably, there is no natural partial order on the space, and we believe that the techniques developed for the analysis may be of independent interest.
The subject of this paper are operators represented on Fock spaces whose behavior on one level depends only on two of its neighbors. Our initial objective was to generalize (via a common framework) the results of arXiv:math/0702158, arXiv:0709.4334, arXiv:0812.0895, and arXiv:1003.2998, whose constructions exhibited this behavior. We extend a number of results from these papers to our more general setting. These include the quadratic relation satisfied by the free cumulant generating function (actually by a variant of it), the resolvent form of the generating function for the Wick polynomials, and classification results for the case when the vacuum state on the operator algebra is tracial. We are able to handle the generating functions in infinitely many variables by considering their matrix-valu
Let $Phi = (V, mathcal{C})$ be a constraint satisfaction problem on variables $v_1,dots, v_n$ such that each constraint depends on at most $k$ variables and such that each variable assumes values in an alphabet of size at most $[q]$. Suppose that eac h constraint shares variables with at most $Delta$ constraints and that each constraint is violated with probability at most $p$ (under the product measure on its variables). We show that for $k, q = O(1)$, there is a deterministic, polynomial time algorithm to approximately count the number of satisfying assignments and a randomized, polynomial time algorithm to sample from approximately the uniform distribution on satisfying assignments, provided that [Ccdot q^{3}cdot k cdot p cdot Delta^{7} < 1, quad text{where }C text{ is an absolute constant.}] Previously, a result of this form was known essentially only in the special case when each constraint is violated by exactly one assignment to its variables. For the special case of $k$-CNF formulas, the term $Delta^{7}$ improves the previously best known $Delta^{60}$ for deterministic algorithms [Moitra, J.ACM, 2019] and $Delta^{13}$ for randomized algorithms [Feng et al., arXiv, 2020]. For the special case of properly $q$-coloring $k$-uniform hypergraphs, the term $Delta^{7}$ improves the previously best known $Delta^{14}$ for deterministic algorithms [Guo et al., SICOMP, 2019] and $Delta^{9}$ for randomized algorithms [Feng et al., arXiv, 2020].
We give an FPTAS for computing the number of matchings of size $k$ in a graph $G$ of maximum degree $Delta$ on $n$ vertices, for all $k le (1-delta)m^*(G)$, where $delta>0$ is fixed and $m^*(G)$ is the matching number of $G$, and an FPTAS for the num ber of independent sets of size $k le (1-delta) alpha_c(Delta) n$, where $alpha_c(Delta)$ is the NP-hardness threshold for this problem. We also provide quasi-linear time randomized algorithms to approximately sample from the uniform distribution on matchings of size $k leq (1-delta)m^*(G)$ and independent sets of size $k leq (1-delta)alpha_c(Delta)n$. Our results are based on a new framework for exploiting local central limit theorems as an algorithmic tool. We use a combination of Fourier inversion, probabilistic estimates, and the deterministic approximation of partition functions at complex activities to extract approximations of the coefficients of the partition function. For our results for independent sets, we prove a new local central limit theorem for the hard-core model that applies to all fugacities below $lambda_c(Delta)$, the uniqueness threshold on the infinite $Delta$-regular tree.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا