بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search

456 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل C. Seshadhri

تاريخ النشر 2015

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Grey Ballard - Ali Pinar - Tamara G. Kolda

الشبكات الاجتماعية والمعلومات بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Given two sets of vectors, $A = {{a_1}, dots, {a_m}}$ and $B={{b_1},dots,{b_n}}$, our problem is to find the top-$t$ dot products, i.e., the largest $|{a_i}cdot{b_j}|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach that avoids direct computation of all $mn$ dot products. We select diamonds (i.e., four-cycles) from the weighted tripartite representation of $A$ and $B$. The probability of selecting a diamond corresponding to pair $(i,j)$ is proportional to $({a_i}cdot{b_j})^2$, amplifying the focus on the largest-magnitude entries. Experimental results indicate that diamond sampling is orders of magnitude faster than direct computation and requires far fewer samples than any competing approach. We also apply diamond sampling to the special case of maximum inner product search, and get significantly better results than the state-of-the-art hashing methods.

قيم البحث

526 - Parikshit Ram , Alexander G. Gray 2012

The problem of {em efficiently} finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied in literature. However, a closely related problem of efficiently finding th e best match with respect to the inner product has never been explored in the general setting to the best of our knowledge. In this paper we consider this general problem and contrast it with the existing best-match algorithms. First, we propose a general branch-and-bound algorithm using a tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Finally we present a new data structure for increasing the efficiency of the dual-tree algorithm. These branch-and-bound algorithms involve novel bounds suited for the purpose of best-matching with inner products. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique.

الهندسة الحسابية بنى وهياكل البيانات والخوارزميات استرجاع المعلومات

Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs

612 - C. Seshadhri , Ali Pinar , Tamara G. Kolda 2013

Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorit hms to compute them can be extremely expensive, even for moderately-sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimation local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives.

الشبكات الاجتماعية والمعلومات بنى وهياكل البيانات والخوارزميات

A Bandit Approach to Maximum Inner Product Search

462 - Rui Liu , Tianyi Wu , Barzan Mozafari 2018

There has been substantial research on sub-linear time approximate algorithms for Maximum Inner Product Search (MIPS). To achieve fast query time, state-of-the-art techniques require significant preprocessing, which can be a burden when the number of subsequent queries is not sufficiently large to amortize the cost. Furthermore, existing methods do not have the ability to directly control the suboptimality of their approximate results with theoretical guarantees. In this paper, we propose the first approximate algorithm for MIPS that does not require any preprocessing, and allows users to control and bound the suboptimality of the results. We cast MIPS as a Best Arm Identification problem, and introduce a new bandit setting that can fully exploit the special structure of MIPS. Our approach outperforms state-of-the-art methods on both synthetic and real-world datasets.

التعلم الآلي التعلم الالي

Approximate nearest neighbors search without false negatives for $l_2$ for $c>sqrt{loglog{n}}$

123 - Piotr Sankowski , Piotr Wygocki 2017

In this paper, we report progress on answering the open problem presented by Pagh~[14], who considered the nearest neighbor search without false negatives for the Hamming distance. We show new data structures for solving the $c$-approximate nearest n eighbors problem without false negatives for Euclidean high dimensional space $mathcal{R}^d$. These data structures work for any $c = omega(sqrt{log{log{n}}})$, where $n$ is the number of points in the input set, with poly-logarithmic query time and polynomial preprocessing time. This improves over the known algorithms, which require $c$ to be $Omega(sqrt{d})$. This improvement is obtained by applying a sequence of reductions, which are interesting on their own. First, we reduce the problem to $d$ instances of dimension logarithmic in $n$. Next, these instances are reduced to a number of $c$-approximate nearest neighbor search instances in $big(mathbb{R}^kbig)^L$ space equipped with metric $m(x,y) = max_{1 le i le L}(lVert x_i - y_irVert_2)$.

الهندسة الحسابية بنى وهياكل البيانات والخوارزميات

Efficient sampling and counting algorithms for the Potts model on $mathbb Z^d$ at all temperatures

112 - Christian Borgs , Jennifer Chayes , Tyler Helmuth 2019

For $d ge 2$ and all $qgeq q_{0}(d)$ we give an efficient algorithm to approximately sample from the $q$-state ferromagnetic Potts and random cluster models on the torus $(mathbb Z / n mathbb Z )^d$ for any inverse temperature $betageq 0$. This stand s in contrast to Markov chain mixing time results: the Glauber dynamics mix slowly at and below the critical temperature, and the Swendsen--Wang dynamics mix slowly at the critical temperature. We also provide an efficient algorithm (an FPRAS) for approximating the partition functions of these models. Our algorithms are based on representing the random cluster model as a contour model using Pirogov-Sinai theory, and then computing an accurate approximation of the logarithm of the partition function by inductively truncating the resulting cluster expansion. The main innovation of our approach is an algorithmic treatment of unstable ground states; this is essential for our algorithms to apply to all inverse temperatures $beta$. By treating unstable ground states our work gives a general template for converting probabilistic applications of Pirogov--Sinai theory to efficient algorithms.

الاحتمالات بنى وهياكل البيانات والخوارزميات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الأكاديمية العربية للعلوم والتكنولوجيا والنقل البحري

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً