
Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

Added by Ville Hyvönen
Publication date: 2018
Language: English





Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy–speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.
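For illustration, here is a minimal sketch of the kind of index whose hyperparameters (leaf size, depth, number of trees) such a tuning procedure targets: a single random projection tree that returns a small candidate set for a query. This is not the authors' autotuning algorithm; all function names and the leaf-size choice are assumptions made for the example.

```python
# Illustrative only: one random projection tree, the index family whose
# hyperparameters (leaf_size here, number of trees in practice) get tuned.
import numpy as np

def build_rp_tree(points, indices, leaf_size, rng):
    """Recursively split points[indices] along a random direction."""
    if len(indices) <= leaf_size:
        return ("leaf", indices)
    direction = rng.standard_normal(points.shape[1])
    projections = points[indices] @ direction
    median = np.median(projections)
    left, right = indices[projections <= median], indices[projections > median]
    if len(left) == 0 or len(right) == 0:        # degenerate split, stop
        return ("leaf", indices)
    return ("node", direction, median,
            build_rp_tree(points, left, leaf_size, rng),
            build_rp_tree(points, right, leaf_size, rng))

def query_rp_tree(tree, points, q):
    """Descend to one leaf and rank only that leaf's candidates exactly."""
    while tree[0] == "node":
        _, direction, median, left, right = tree
        tree = left if q @ direction <= median else right
    candidates = tree[1]
    order = np.argsort(np.linalg.norm(points[candidates] - q, axis=1))
    return candidates[order]

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 32))
tree = build_rp_tree(data, np.arange(len(data)), leaf_size=32, rng=rng)
print(query_rp_tree(tree, data, data[0])[:5])    # approximate 5-NN of point 0
```

In practice several such trees are built with different random seeds and their candidate sets are merged; the trade-off between leaf size, number of trees and query accuracy is exactly what the paper's tuning algorithm resolves automatically.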




Related Research

Xian Wu, Moses Charikar (2020)
Embedding into hyperbolic space is emerging as an effective representation technique for datasets that exhibit hierarchical structure. This development motivates the need for algorithms that are able to effectively extract knowledge and insights from datapoints embedded in negatively curved spaces. We focus on the problem of nearest neighbor search, a fundamental problem in data analysis. We present efficient algorithmic solutions that build upon established methods for nearest neighbor search in Euclidean space, allowing for easy adoption and integration with existing systems. We prove theoretical guarantees for our techniques and our experiments demonstrate the effectiveness of our approach on real datasets over competing algorithms.
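As a small illustration of the metric involved (not of the paper's data structures), the following sketch computes distances in the Poincaré-ball model of hyperbolic space and does a brute-force nearest neighbor lookup; the choice of model and all names are assumptions for the example.

```python
# Brute-force nearest neighbor under the Poincare-ball distance.
import numpy as np

def poincare_distance(u, v):
    uu = 1.0 - np.sum(u * u)                 # 1 - ||u||^2
    vv = 1.0 - np.sum(v * v)                 # 1 - ||v||^2
    duv = np.sum((u - v) ** 2)
    return np.arccosh(1.0 + 2.0 * duv / (uu * vv))

def nearest_neighbor(query, points):
    return min(range(len(points)), key=lambda i: poincare_distance(query, points[i]))

rng = np.random.default_rng(1)
pts = rng.uniform(-0.3, 0.3, size=(500, 2))  # points strictly inside the unit ball
print(nearest_neighbor(pts[0] + 0.01, pts))
```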
In the $(1+\varepsilon,r)$-approximate near-neighbor problem for curves (ANNC) under some distance measure $\delta$, the goal is to construct a data structure for a given set $\mathcal{C}$ of curves that supports approximate near-neighbor queries: Given a query curve $Q$, if there exists a curve $C\in\mathcal{C}$ such that $\delta(Q,C)\le r$, then return a curve $C\in\mathcal{C}$ with $\delta(Q,C)\le(1+\varepsilon)r$. There exists an efficient reduction from the $(1+\varepsilon)$-approximate nearest-neighbor problem to ANNC, where in the former problem the answer to a query is a curve $C\in\mathcal{C}$ with $\delta(Q,C)\le(1+\varepsilon)\cdot\delta(Q,C^*)$, where $C^*$ is the curve of $\mathcal{C}$ closest to $Q$. Given a set $\mathcal{C}$ of $n$ curves, each consisting of $m$ points in $d$ dimensions, we construct a data structure for ANNC that uses $n\cdot O(\frac{1}{\varepsilon})^{md}$ storage space and has $O(md)$ query time (for a query curve of length $m$), where the similarity between two curves is their discrete Fréchet or dynamic time warping distance. Our method is simple to implement, deterministic, and results in an exponential improvement in both query time and storage space compared to all previous bounds. Further, we also consider the asymmetric version of ANNC, where the length of the query curves is $k \ll m$, and obtain essentially the same storage and query bounds as above, except that $m$ is replaced by $k$. Finally, we apply our method to a version of approximate range counting for curves and achieve similar bounds.
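For reference, the similarity measure quoted in these bounds can be computed with the standard dynamic program below; this sketch shows the discrete Fréchet distance only and is unrelated to the paper's data structure.

```python
# Discrete Frechet distance between two polygonal curves via dynamic programming.
import numpy as np

def discrete_frechet(P, Q):
    n, m = len(P), len(Q)
    dist = lambda i, j: np.linalg.norm(P[i] - Q[j])
    D = np.full((n, m), np.inf)
    D[0, 0] = dist(0, 0)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(D[i - 1, j] if i > 0 else np.inf,
                       D[i, j - 1] if j > 0 else np.inf,
                       D[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            D[i, j] = max(prev, dist(i, j))   # best coupling seen so far
    return D[-1, -1]

P = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
Q = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
print(discrete_frechet(P, Q))                 # -> 1.0
```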
We propose a generic feature compression method for Approximate Nearest Neighbor Search (ANNS) problems, which speeds up existing ANNS methods in a plug-and-play manner. Specifically, we propose a new network structure called Compression Network with Transformer (CNT) to compress the feature into a low-dimensional space, and an inhomogeneous neighborhood relationship preserving (INRP) loss that aims to maintain high search accuracy. In CNT, we use multiple compression projections to cast the feature into many low-dimensional spaces, and then use a transformer to globally optimize these projections so that the features are well compressed under the guidance of our loss function. The loss function is designed to assign high weights to point pairs that are close in the original feature space and to preserve their distances in the projected space. Keeping these distances helps maintain the eventual top-k retrieval accuracy, while down-weighting the remaining pairs creates room for feature compression. In experiments, we run our compression method on public datasets and use the compressed features in graph-based, product quantization and scalar quantization based ANNS solutions. Experimental results show that our compression method can significantly improve the efficiency of these methods while preserving or even improving search accuracy, suggesting its broad potential impact on real-world applications.
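To make the weighting idea concrete, here is a minimal sketch of a neighborhood-preserving loss of the kind described: pairs that are close in the original space receive large weights and are penalized for changing their distance after compression. The exponential weighting, the `scale` parameter and the random linear "compressor" are assumptions for illustration; this is not the paper's INRP loss or CNT network.

```python
# Toy distance-preserving loss that emphasizes pairs close in the original space.
import numpy as np

def neighborhood_preserving_loss(X, Z, scale=1.0):
    """X: original features (n, d); Z: compressed features (n, d')."""
    n = len(X)
    loss, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_orig = np.linalg.norm(X[i] - X[j])
            d_comp = np.linalg.norm(Z[i] - Z[j])
            w = np.exp(-d_orig / scale)       # close pairs dominate the loss
            loss += w * (d_comp - d_orig) ** 2
            weight_sum += w
    return loss / weight_sum

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 128))
Z = X @ rng.standard_normal((128, 16)) / np.sqrt(128)   # stand-in "compressor"
print(neighborhood_preserving_loss(X, Z))
```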
We formulate approximate nearest neighbor (ANN) search as a multi-label classification task. The implications are twofold. First, tree-based indexes can be searched more efficiently by interpreting them as models to solve this task. Second, in addition to index structures designed specifically for ANN search, any type of classifier can be used as an index.
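A hedged sketch of this viewpoint: candidate sets become class labels and a generic classifier acts as the index. Below, the "labels" are k-means cells and the classifier is a logistic regression; both choices, and the bucket/probe parameters, are illustrative assumptions rather than the paper's construction.

```python
# A generic classifier used as an ANN index: predict likely buckets, scan only those.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
data = rng.standard_normal((2000, 16))

buckets = KMeans(n_clusters=32, n_init=10, random_state=0).fit_predict(data)
index = LogisticRegression(max_iter=1000).fit(data, buckets)

def query(q, n_probe=3):
    """Scan only the n_probe buckets the classifier rates most likely."""
    probs = index.predict_proba(q[None, :])[0]
    probe = index.classes_[np.argsort(probs)[::-1][:n_probe]]
    candidates = np.where(np.isin(buckets, probe))[0]
    order = np.argsort(np.linalg.norm(data[candidates] - q, axis=1))
    return candidates[order]

print(query(data[0])[:5])                     # approximate 5-NN of point 0
```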
We consider the problem of recovering clustered sparse signals with no prior knowledge of the sparsity pattern. Beyond simple sparsity, signals of interest often exhibit an underlying sparsity pattern which, if leveraged, can improve the reconstruction performance. However, the sparsity pattern is usually unknown a priori. Inspired by the idea of the k-nearest neighbor (k-NN) algorithm, we propose an efficient algorithm termed approximate message passing with nearest neighbor sparsity pattern learning (AMP-NNSPL), which learns the sparsity pattern adaptively. AMP-NNSPL specifies a flexible spike-and-slab prior on the unknown signal and, after each AMP iteration, sets the sparse ratios to the average of the nearest neighbor estimates via expectation maximization (EM). Experimental results on both synthetic and real data demonstrate the superiority of our proposed algorithm in terms of both reconstruction performance and computational complexity.
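A minimal sketch of just the pattern-learning step highlighted here: each coefficient's sparse ratio is refreshed as the mean of its neighbors' current posterior support probabilities. The full AMP recursion and the EM updates of the spike-and-slab parameters are omitted, and the 1-D chain neighborhood is an assumption made for the example.

```python
# Neighbor-averaging update of per-coefficient sparse ratios (toy 1-D neighborhood).
import numpy as np

def update_sparse_ratios(support_prob, k=2):
    """support_prob[i]: current posterior Pr(x_i != 0)."""
    n = len(support_prob)
    new_ratio = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        neighbors = np.r_[support_prob[lo:i], support_prob[i + 1:hi]]
        new_ratio[i] = neighbors.mean()       # borrow support from neighbors
    return new_ratio

p = np.array([0.05, 0.1, 0.9, 0.95, 0.92, 0.1, 0.05])
print(update_sparse_ratios(p))                # clustered support gets reinforced
```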
