أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل C. Seshadhri

Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search

115 - Grey Ballard , Ali Pinar , Tamara G. Kolda 2015

Given two sets of vectors, $A = {{a_1}, dots, {a_m}}$ and $B={{b_1},dots,{b_n}}$, our problem is to find the top-$t$ dot products, i.e., the largest $|{a_i}cdot{b_j}|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach that avoids direct computation of all $mn$ dot products. We select diamonds (i.e., four-cycles) from the weighted tripartite representation of $A$ and $B$. The probability of selecting a diamond corresponding to pair $(i,j)$ is proportional to $({a_i}cdot{b_j})^2$, amplifying the focus on the largest-magnitude entries. Experimental results indicate that diamond sampling is orders of magnitude faster than direct computation and requires far fewer samples than any competing approach. We also apply diamond sampling to the special case of maximum inner product search, and get significantly better results than the state-of-the-art hashing methods.

الشبكات الاجتماعية والمعلومات بنى وهياكل البيانات والخوارزميات

Characterizing short-term stability for Boolean networks over any distribution of transfer functions

128 - C. Seshadhri , Andrew M. Smith , Yevgeniy Vorobeychik 2014

We present a characterization of short-term stability of random Boolean networks under emph{arbitrary} distributions of transfer functions. Given any distribution of transfer functions for a random Boolean network, we present a formula that decides w hether short-term chaos (damage spreading) will happen. We provide a formal proof for this formula, and empirically show that its predictions are accurate. Previous work only works for special cases of balanced families. It has been observed that these characterizations fail for unbalanced families, yet such families are widespread in real biological networks.

ديناميات الفوضوية الرياضيات المتقطعة السيارات الخلوية والغازات شعرية

Space efficient streaming algorithms for the distance to monotonicity and asymmetric edit distance

84 - Michael Saks , C. Seshadhri 2012

Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any $delta > 0$, given streaming access to an array of length $n$ provides a $(1+delta)$-multiplicative approximation to the emph{distance to monotonicity} ($n$ minus the length of the LIS), and uses only $O((log^2 n)/delta)$ space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. Our algorithm can be used to estimate the length of the LIS to within an additive $delta n$ for any $delta >0$ while previous algorithms could only achieve additive error $n(1/2-o(1))$. Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also give a streaming algorithm for approximating $LCS(x,y)$, the length of the longest common subsequence between strings $x$ and $y$, each of length $n$. Our algorithm works in the asymmetric setting (inspired by cite{AKO10}), in which we have random access to $y$ and streaming access to $x$, and runs in small space provided that no single symbol appears very often in $y$. More precisely, it gives an additive-$delta n$ approximation to $LCS(x,y)$ (and hence also to $E(x,y) = n-LCS(x,y)$, the edit distance between $x$ and $y$ when insertions and deletions, but not substitutions, are allowed), with space complexity $O(k(log^2 n)/delta)$, where $k$ is the maximum number of times any one symbol appears in $y$.

بنى وهياكل البيانات والخوارزميات الرياضيات المتقطعة

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد