Given two sets of vectors, $A = \{a_1, \dots, a_m\}$ and $B = \{b_1, \dots, b_n\}$, our problem is to find the top-$t$ dot products, i.e., the largest $|a_i \cdot b_j|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach that avoids direct computation of all $mn$ dot products. We select diamonds (i.e., four-cycles) from the weighted tripartite representation of $A$ and $B$. The probability of selecting a diamond corresponding to pair $(i,j)$ is proportional to $(a_i \cdot b_j)^2$, amplifying the focus on the largest-magnitude entries. Experimental results indicate that diamond sampling is orders of magnitude faster than direct computation and requires far fewer samples than any competing approach. We also apply diamond sampling to the special case of maximum inner product search and obtain significantly better results than the state-of-the-art hashing methods.
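As an illustration of the sampling idea, the following is a minimal sketch, not the authors' implementation. It assumes the vectors are given as plain Python lists of equal dimension, and it assumes one particular choice of edge weights, $w_{ik} = |a_{ik}| \, \|a_i\|_1 \sum_j |b_{jk}|$, under which the expected score of pair $(i,j)$ works out to be proportional to $(a_i \cdot b_j)^2$. The names `diamond_sample`, `a_norm`, and `b_mass` are hypothetical.

```python
import random
from collections import defaultdict


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def diamond_sample(A, B, num_samples, seed=0):
    """Score pairs (i, j) so that E[score] is proportional to (a_i . b_j)^2.

    A is a list of m vectors and B a list of n vectors, all of dimension d.
    Returns (scores, total_weight); scores[(i, j)] * total_weight / num_samples
    is an estimate of (a_i . b_j)^2.
    """
    rng = random.Random(seed)
    m, n, d = len(A), len(B), len(A[0])

    # Assumed edge weights: w[i, k] = |a_ik| * ||a_i||_1 * sum_j |b_jk|.
    a_norm = [sum(abs(v) for v in row) for row in A]
    b_mass = [sum(abs(B[j][k]) for j in range(n)) for k in range(d)]
    edges, weights = [], []
    for i in range(m):
        for k in range(d):
            w = abs(A[i][k]) * a_norm[i] * b_mass[k]
            if w > 0:
                edges.append((i, k))
                weights.append(w)
    total_weight = sum(weights)
    if total_weight == 0:
        return {}, 0.0  # every dot product is zero; nothing to sample

    scores = defaultdict(float)
    for i, k in rng.choices(edges, weights=weights, k=num_samples):
        # Close the four-cycle i - k - j - k2 - i.
        j = rng.choices(range(n), weights=[abs(B[jj][k]) for jj in range(n)])[0]
        k2 = rng.choices(range(d), weights=[abs(v) for v in A[i]])[0]
        s = sign(A[i][k]) * sign(B[j][k]) * sign(A[i][k2])
        scores[(i, j)] += s * B[j][k2]
    return dict(scores), total_weight


# Toy data (made up for illustration): pair (0, 0) has by far the largest dot product.
A = [[3.0, 4.0], [0.5, -0.2], [0.1, 0.1]]
B = [[2.0, 3.0], [-0.3, 0.1], [0.2, -0.1]]
scores, total = diamond_sample(A, B, num_samples=5000)
best = max(scores, key=scores.get)  # pair with the largest accumulated score
estimate = scores[best] * total / 5000  # rough estimate of (a_i . b_j)^2
```

In a top-$t$ search, the highest-scoring pairs would serve as candidates whose dot products are then computed exactly. This sketch rebuilds the conditional weight lists on every draw for readability; a practical version would precompute them.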
The problem of efficiently finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in the literature. However, a closely related problem of efficiently finding th
Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorit
There has been substantial research on sub-linear time approximate algorithms for Maximum Inner Product Search (MIPS). To achieve fast query time, state-of-the-art techniques require significant preprocessing, which can be a burden when the number of
In this paper, we report progress on answering the open problem presented by Pagh [14], who considered the nearest neighbor search without false negatives for the Hamming distance. We show new data structures for solving the $c$-approximate nearest n
For $d \ge 2$ and all $q \geq q_0(d)$ we give an efficient algorithm to approximately sample from the $q$-state ferromagnetic Potts and random cluster models on the torus $(\mathbb{Z}/n\mathbb{Z})^d$ for any inverse temperature $\beta \geq 0$. This stand