ترغب بنشر مسار تعليمي؟ اضغط هنا

Given two sets of vectors, $A = {{a_1}, dots, {a_m}}$ and $B={{b_1},dots,{b_n}}$, our problem is to find the top-$t$ dot products, i.e., the largest $|{a_i}cdot{b_j}|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach that avoids direct computation of all $mn$ dot products. We select diamonds (i.e., four-cycles) from the weighted tripartite representation of $A$ and $B$. The probability of selecting a diamond corresponding to pair $(i,j)$ is proportional to $({a_i}cdot{b_j})^2$, amplifying the focus on the largest-magnitude entries. Experimental results indicate that diamond sampling is orders of magnitude faster than direct computation and requires far fewer samples than any competing approach. We also apply diamond sampling to the special case of maximum inner product search, and get significantly better results than the state-of-the-art hashing methods.
We present a characterization of short-term stability of random Boolean networks under emph{arbitrary} distributions of transfer functions. Given any distribution of transfer functions for a random Boolean network, we present a formula that decides w hether short-term chaos (damage spreading) will happen. We provide a formal proof for this formula, and empirically show that its predictions are accurate. Previous work only works for special cases of balanced families. It has been observed that these characterizations fail for unbalanced families, yet such families are widespread in real biological networks.
Approximating the length of the longest increasing sequence (LIS) of an array is a well-studied problem. We study this problem in the data stream model, where the algorithm is allowed to make a single left-to-right pass through the array and the key resource to be minimized is the amount of additional memory used. We present an algorithm which, for any $delta > 0$, given streaming access to an array of length $n$ provides a $(1+delta)$-multiplicative approximation to the emph{distance to monotonicity} ($n$ minus the length of the LIS), and uses only $O((log^2 n)/delta)$ space. The previous best known approximation using polylogarithmic space was a multiplicative 2-factor. Our algorithm can be used to estimate the length of the LIS to within an additive $delta n$ for any $delta >0$ while previous algorithms could only achieve additive error $n(1/2-o(1))$. Our algorithm is very simple, being just 3 lines of pseudocode, and has a small update time. It is essentially a polylogarithmic space approximate implementation of a classic dynamic program that computes the LIS. We also give a streaming algorithm for approximating $LCS(x,y)$, the length of the longest common subsequence between strings $x$ and $y$, each of length $n$. Our algorithm works in the asymmetric setting (inspired by cite{AKO10}), in which we have random access to $y$ and streaming access to $x$, and runs in small space provided that no single symbol appears very often in $y$. More precisely, it gives an additive-$delta n$ approximation to $LCS(x,y)$ (and hence also to $E(x,y) = n-LCS(x,y)$, the edit distance between $x$ and $y$ when insertions and deletions, but not substitutions, are allowed), with space complexity $O(k(log^2 n)/delta)$, where $k$ is the maximum number of times any one symbol appears in $y$.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا