بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

HCLAE: High Capacity Locally Aggregating Encodings for Approximate Nearest Neighbor Search

84 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shicong Liu

تاريخ النشر 2015

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shicong Liu - Junru Shao - Hongtao Lu

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Vector quantization-based approaches are successful to solve Approximate Nearest Neighbor (ANN) problems which are critical to many applications. The idea is to generate effective encodings to allow fast distance approximation. We propose quantization-based methods should partition the data space finely and exhibit locality of the dataset to allow efficient non-exhaustive search. In this paper, we introduce the concept of High Capacity Locality Aggregating Encodings (HCLAE) to this end, and propose Dictionary Annealing (DA) to learn HCLAE by a simulated annealing procedure. The quantization error is lower than other state-of-the-art. The algorithms of DA can be easily extended to an online learning scheme, allowing effective handle of large scale data. Further, we propose Aggregating-Tree (A-Tree), a non-exhaustive search method using HCLAE to perform efficient ANN-Search. A-Tree achieves magnitudes of speed-up on ANN-Search tasks, compared to the state-of-the-art.

قيم البحث

72 - Shicong Liu , Hongtao Lu , Junru Shao 2015

Quantization methods have been introduced to perform large scale approximate nearest search tasks. Residual Vector Quantization (RVQ) is one of the effective quantization methods. RVQ uses a multi-stage codebook learning scheme to lower the quantizat ion error stage by stage. However, there are two major limitations for RVQ when applied to on high-dimensional approximate nearest neighbor search: 1. The performance gain diminishes quickly with added stages. 2. Encoding a vector with RVQ is actually NP-hard. In this paper, we propose an improved residual vector quantization (IRVQ) method, our IRVQ learns codebook with a hybrid method of subspace clustering and warm-started k-means on each stage to prevent performance gain from dropping, and uses a multi-path encoding scheme to encode a vector with lower distortion. Experimental results on the benchmark datasets show that our method gives substantially improves RVQ and delivers better performance compared to the state-of-the-art.

الرؤية الحاسوبية وتمييز الأنماط

Learning Better Encoding for Approximate Nearest Neighbor Search with Dictionary Annealing

457 - Shicong Liu , Hongtao Lu 2015

We introduce a novel dictionary optimization method for high-dimensional vector quantization employed in approximate nearest neighbor (ANN) search. Vector quantization methods first seek a series of dictionaries, then approximate each vector by a sum of elements selected from these dictionaries. An optimal series of dictionaries should be mutually independent, and each dictionary should generate a balanced encoding for the target dataset. Existing methods did not explicitly consider this. To achieve these goals along with minimizing the quantization error (residue), we propose a novel dictionary optimization method called emph{Dictionary Annealing} that alternatively heats up a single dictionary by generating an intermediate dataset with residual vectors, cools down the dictionary by fitting the intermediate dataset, then extracts the new residual vectors for the next iteration. Better codes can be learned by DA for the ANN search tasks. DA is easily implemented on GPU to utilize the latest computing technology, and can easily extended to an online dictionary learning scheme. We show by experiments that our optimized dictionaries substantially reduce the overall quantization error. Jointly used with residual vector quantization, our optimized dictionaries lead to a better approximate nearest neighbor search performance compared to the state-of-the-art methods.

الرؤية الحاسوبية وتمييز الأنماط

Derived Codebooks for High-Accuracy Nearest Neighbor Search

58 - Fabien Andre , Anne-Marie Kermarrec , Nicolas Le Scouarnec 2019

High-dimensional Nearest Neighbor (NN) search is central in multimedia search systems. Product Quantization (PQ) is a widespread NN search technique which has a high performance and good scalability. PQ compresses high-dimensional vectors into compac t codes thanks to a combination of quantizers. Large databases can, therefore, be stored entirely in RAM, enabling fast responses to NN queries. In almost all cases, PQ uses 8-bit quantizers as they offer low response times. In this paper, we advocate the use of 16-bit quantizers. Compared to 8-bit quantizers, 16-bit quantizers boost accuracy but they increase response time by a factor of 3 to 10. We propose a novel approach that allows 16-bit quantizers to offer the same response time as 8-bit quantizers, while still providing a boost of accuracy. Our approach builds on two key ideas: (i) the construction of derived codebooks that allow a fast and approximate distance evaluation, and (ii) a two-pass NN search procedure which builds a candidate set using the derived codebooks, and then refines it using 16-bit quantizers. On 1 billion SIFT vectors, with an inverted index, our approach offers a Recall@100 of 0.85 in 5.2 ms. By contrast, 16-bit quantizers alone offer a Recall@100 of 0.85 in 39 ms, and 8-bit quantizers a Recall@100 of 0.82 in 3.8 ms.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي قواعد البيانات

Compression Network with Transformer for Approximate Nearest Neighbor Search

426 - Haokui Zhang , Wenze Hu , Buzhou Tang 2021

We propose a generic feature compression method for Approximate Nearest Neighbor Search (ANNS) problems, which speeds up existing ANNS methods in a plug-and-play manner. Specifically, we propose a new network structure called Compression Network with Transformer (CNT) to compress the feature into a low dimensional space, and an inhomogeneous neighborhood relationship preserving (INRP) loss that aims to maintain high search accuracy. In CNT, we use multiple compression projections to cast the feature into many low dimensional spaces, and then use transformer to globally optimize these projections such that the features are well compressed following the guidance from our loss function. The loss function is designed to assign high weights on point pairs that are close in original feature space, and keep their distances in projected space. Keeping these distances helps maintain the eventual top-k retrieval accuracy, and down weighting others creates room for feature compression. In experiments, we run our compression method on public datasets, and use the compressed features in graph based, product quantization and scalar quantization based ANNS solutions. Experimental results show that our compression method can significantly improve the efficiency of these methods while preserves or even improves search accuracy, suggesting its broad potential impact on real world applications.

استرجاع المعلومات

Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

76 - Elias Jaasaari , Ville Hyvonen , Teemu Roos 2018

Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy--speed trade -off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الوادي الدولية الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

HCLAE: High Capacity Locally Aggregating Encodings for Approximate Nearest Neighbor Search

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً