مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Near Neighbor Search via Efficient Average Distortion Embeddings

171 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Aleksandar Nikolov

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Deepanshu Kush - Aleksandar Nikolov - Haohua Tang

بنى وهياكل البيانات والخوارزميات الهندسة الحسابية هندسة القياسات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

A recent series of papers by Andoni, Naor, Nikolov, Razenshteyn, and Waingarten (STOC 2018, FOCS 2018) has given approximate near neighbour search (NNS) data structures for a wide class of distance metrics, including all norms. In particular, these data structures achieve approximation on the order of $p$ for $ell_p^d$ norms with space complexity nearly linear in the dataset size $n$ and polynomial in the dimension $d$, and query time sub-linear in $n$ and polynomial in $d$. The main shortcoming is the exponential in $d$ pre-processing time required for their construction. In this paper, we describe a more direct framework for constructing NNS data structures for general norms. More specifically, we show via an algorithmic reduction that an efficient NNS data structure for a given metric is implied by an efficient average distortion embedding of it into $ell_1$ or into Euclidean space. In particular, the resulting data structures require only polynomial pre-processing time, as long as the embedding can be computed in polynomial time. As a concrete instantiation of this framework, we give an NNS data structure for $ell_p$ with efficient pre-processing that matches the approximation factor, space and query complexity of the aforementioned data structure of Andoni et al. On the way, we resolve a question of Naor (Analysis and Geometry in Metric Spaces, 2014) and provide an explicit, efficiently computable embedding of $ell_p$, for $p ge 2$, into $ell_2$ with (quadratic) average distortion on the order of $p$. We expect our approach to pave the way for constructing efficient NNS data structures for all norms.

قيم البحث

165 - Joshua Engels , Benjamin Coleman , Anshumali Shrivastava 2021

We present a new algorithm for the approximate near neighbor problem that combines classical ideas from group testing with locality-sensitive hashing (LSH). We reduce the near neighbor search problem to a group testing problem by designating neighbor s as positives, non-neighbors as negatives, and approximate membership queries as group tests. We instantiate this framework using distance-sensitive Bloom Filters to Identify Near-Neighbor Groups (FLINNG). We prove that FLINNG has sub-linear query time and show that our algorithm comes with a variety of practical advantages. For example, FLINNG can be constructed in a single pass through the data, consists entirely of efficient integer operations, and does not require any distance computations. We conduct large-scale experiments on high-dimensional search tasks such as genome search, URL similarity search, and embedding search over the massive YFCC100M dataset. In our comparison with leading algorithms such as HNSW and FAISS, we find that FLINNG can provide up to a 10x query speedup with substantially smaller indexing time and memory.

بنى وهياكل البيانات والخوارزميات

Graph Reordering for Cache-Efficient Near Neighbor Search

237 - Benjamin Coleman , Santiago Segarra , Anshumali Shrivastava 2021

Graph search is one of the most successful algorithmic trends in near neighbor search. Several of the most popular and empirically successful algorithms are, at their core, a simple walk along a pruned near neighbor graph. Such algorithms consistentl y perform at the top of industrial speed benchmarks for applications such as embedding search. However, graph traversal applications often suffer from poor memory access patterns, and near neighbor search is no exception to this rule. Our measurements show that popular search indices such as the hierarchical navigable small-world graph (HNSW) can have poor cache miss performance. To address this problem, we apply graph reordering algorithms to near neighbor graphs. Graph reordering is a memory layout optimization that groups commonly-accessed nodes together in memory. We present exhaustive experiments applying several reordering algorithms to a leading graph-based near neighbor method based on the HNSW index. We find that reordering improves the query time by up to 40%, and we demonstrate that the time needed to reorder the graph is negligible compared to the time required to construct the index.

بنى وهياكل البيانات والخوارزميات

Nearest Neighbor Search for Hyperbolic Embeddings

209 - Xian Wu , Moses Charikar 2020

Embedding into hyperbolic space is emerging as an effective representation technique for datasets that exhibit hierarchical structure. This development motivates the need for algorithms that are able to effectively extract knowledge and insights from datapoints embedded in negatively curved spaces. We focus on the problem of nearest neighbor search, a fundamental problem in data analysis. We present efficient algorithmic solutions that build upon established methods for nearest neighbor search in Euclidean space, allowing for easy adoption and integration with existing systems. We prove theoretical guarantees for our techniques and our experiments demonstrate the effectiveness of our approach on real datasets over competing algorithms.

بنى وهياكل البيانات والخوارزميات

Lower Bounds for Oblivious Near-Neighbor Search

101 - Kasper Green Larsen , Tal Malkin , Omri Weinstein 2019

We prove an $Omega(d lg n/ (lglg n)^2)$ lower bound on the dynamic cell-probe complexity of statistically $mathit{oblivious}$ approximate-near-neighbor search ($mathsf{ANN}$) over the $d$-dimensional Hamming cube. For the natural setting of $d = Thet a(log n)$, our result implies an $tilde{Omega}(lg^2 n)$ lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower bound for $mathsf{ANN}$. This is the first super-logarithmic $mathit{unconditional}$ lower bound for $mathsf{ANN}$ against general (non black-box) data structures. We also show that any oblivious $mathit{static}$ data structure for decomposable search problems (like $mathsf{ANN}$) can be obliviously dynamized with $O(log n)$ overhead in update and query time, strengthening a classic result of Bentley and Saxe (Algorithmica, 1980).

بنى وهياكل البيانات والخوارزميات التشفير والأمن

Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

76 - Elias Jaasaari , Ville Hyvonen , Teemu Roos 2018

Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy--speed trade -off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الحواش الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Near Neighbor Search via Efficient Average Distortion Embeddings

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً