ﻻ يوجد ملخص باللغة العربية
Billion-scale high-dimensional approximate nearest neighbour (ANN) search has become an important problem for searching similar objects among the vast amount of images and videos available online. The existing ANN methods are usually characterized by their specific indexing structures, including the inverted index and the inverted multi-index structure. The inverted index structure is amenable to GPU-based implementations, and the state-of-the-art systems such as Faiss are able to exploit the massive parallelism offered by GPUs. However, the inverted index requires high memory overhead to index the dataset effectively. The inverted multi-index structure is difficult to implement for GPUs, and also ineffective in dealing with database with different data distributions. In this paper we propose a novel hierarchical inverted index structure generated by vector and line quantization methods. Our quantization method improves both search efficiency and accuracy, while maintaining comparable memory consumption. This is achieved by reducing search space and increasing the number of indexed regions. We introduce a new ANN search system, VLQ-ADC, that is based on the proposed inverted index, and perform extensive evaluation on two public billion-scale benchmark datasets SIFT1B and DEEP1B. Our evaluation shows that VLQ-ADC significantly outperforms the state-of-the-art GPU- and CPU-based systems in terms of both accuracy and search speed. The source code of VLQ-ADC is available at https://github.com/zjuchenwei/vector-line-quantization.
Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem
Recent advance of large scale similarity search involves using deeply learned representations to improve the search accuracy and use vector quantization methods to increase the search speed. However, how to learn deep representations that strongly pr
Quantization methods have been introduced to perform large scale approximate nearest search tasks. Residual Vector Quantization (RVQ) is one of the effective quantization methods. RVQ uses a multi-stage codebook learning scheme to lower the quantizat
Vector quantization is an essential tool for tasks involving large scale data, for example, large scale similarity search, which is crucial for content-based information retrieval and analysis. In this paper, we propose a novel vector quantization fr
Hierarchical matrices are space and time efficient representations of dense matrices that exploit the low rank structure of matrix blocks at different levels of granularity. The hierarchically low rank block partitioning produces representations that