ترغب بنشر مسار تعليمي؟ اضغط هنا

Maximum Inner-Product Search using Tree Data-structures

207   0   0.0 ( 0 )
 نشر من قبل Parikshit Ram
 تاريخ النشر 2012
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

The problem of {em efficiently} finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in literature. However, a closely related problem of efficiently finding the best match with respect to the inner product has never been explored in the general setting to the best of our knowledge. In this paper we consider this general problem and contrast it with the existing best-match algorithms. First, we propose a general branch-and-bound algorithm using a tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Finally we present a new data structure for increasing the efficiency of the dual-tree algorithm. These branch-and-bound algorithms involve novel bounds suited for the purpose of best-matching with inner products. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique.



قيم البحث

اقرأ أيضاً

There has been substantial research on sub-linear time approximate algorithms for Maximum Inner Product Search (MIPS). To achieve fast query time, state-of-the-art techniques require significant preprocessing, which can be a burden when the number of subsequent queries is not sufficiently large to amortize the cost. Furthermore, existing methods do not have the ability to directly control the suboptimality of their approximate results with theoretical guarantees. In this paper, we propose the first approximate algorithm for MIPS that does not require any preprocessing, and allows users to control and bound the suboptimality of the results. We cast MIPS as a Best Arm Identification problem, and introduce a new bandit setting that can fully exploit the special structure of MIPS. Our approach outperforms state-of-the-art methods on both synthetic and real-world datasets.
Given two sets of vectors, $A = {{a_1}, dots, {a_m}}$ and $B={{b_1},dots,{b_n}}$, our problem is to find the top-$t$ dot products, i.e., the largest $|{a_i}cdot{b_j}|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach that avoids direct computation of all $mn$ dot products. We select diamonds (i.e., four-cycles) from the weighted tripartite representation of $A$ and $B$. The probability of selecting a diamond corresponding to pair $(i,j)$ is proportional to $({a_i}cdot{b_j})^2$, amplifying the focus on the largest-magnitude entries. Experimental results indicate that diamond sampling is orders of magnitude faster than direct computation and requires far fewer samples than any competing approach. We also apply diamond sampling to the special case of maximum inner product search, and get significantly better results than the state-of-the-art hashing methods.
We study the question whether a crossing-free 3D morph between two straight-line drawings of an $n$-vertex tree can be constructed consisting of a small number of linear morphing steps. We look both at the case in which the two given drawings are two -dimensional and at the one in which they are three-dimensional. In the former setting we prove that a crossing-free 3D morph always exists with $O(log n)$ steps, while for the latter $Theta(n)$ steps are always sufficient and sometimes necessary.
Solomon and Elkin constructed a shortcutting scheme for weighted trees which results in a 1-spanner for the tree metric induced by the input tree. The spanner has logarithmic lightness, logarithmic diameter, a linear number of edges and bounded degre e (provided the input tree has bounded degree). This spanner has been applied in a series of papers devoted to designing bounded degree, low-diameter, low-weight $(1+epsilon)$-spanners in Euclidean and doubling metrics. In this paper, we present a simple local routing algorithm for this tree metric spanner. The algorithm has a routing ratio of 1, is guaranteed to terminate after $O(log n)$ hops and requires $O(Delta log n)$ bits of storage per vertex where $Delta$ is the maximum degree of the tree on which the spanner is constructed. This local routing algorithm can be adapted to a local routing algorithm for a doubling metric spanner which makes use of the shortcutting scheme.
We give a polynomial-time constant-factor approximation algorithm for maximum independent set for (axis-aligned) rectangles in the plane. Using a polynomial-time algorithm, the best approximation factor previously known is $O(loglog n)$. The results are based on a new form of recursive partitioning in the plane, in which faces that are constant-complexity and orthogonally convex are recursively partitioned into a constant number of such faces.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا