بالنسبة لهذا البحث، نقدم الآلية فيتربي لفك تشفير النماذج المخفية الماركوفي (HMMs) بمساحة أصغر من الخطية. وتشير تحليلنا على HMMs من طورين إلى أن الذاكرة القصوى المتوقعة المستخدمة لفك سلسلة طولها $n$ مع HMM من طور $m$ يمكن أن تكون بحد أدنى من $Theta(mlog n)$، دون تبطئ كبير في مقارنة مع الآلية الفيتربي التقليدية. وتتطلب الآلية الفيتربي التقليدية $O(mn)$ من الذاكرة، والتي هي غير عملية لتحليل السلاسل الحمضية الطويلة (مثل كروموسومات الوحدة البشرية الكاملة) وللسياقات البيانات المستمرة. كما أننا نظهر أداء الآلية فيتربي الآلية على HMM بسيط لإيجاد الجينات على السلاسل الحمضية المصطبة والحقيقية.
In this paper, we introduce the on-line Viterbi algorithm for decoding hidden Markov models (HMMs) in much smaller than linear space. Our analysis on two-state HMMs suggests that the expected maximum memory used to decode sequence of length $n$ with $m$-state HMM can be as low as $Theta(mlog n)$, without a significant slow-down compared to the classical Viterbi algorithm. Classical Viterbi algorithm requires $O(mn)$ space, which is impractical for analysis of long DNA sequences (such as complete human genome chromosomes) and for continuous data streams. We also experimentally demonstrate the performance of the on-line Viterbi algorithm on a simple HMM for gene finding on both simulated and real DNA sequences.
For a graph $G$ on $n$ vertices, naively sampling the position of a random walk of at time $t$ requires work $Omega(t)$. We desire local access algorithms supporting $text{position}(G,s,t)$ queries, which return the position of a random walk from some start vertex $s$ at time $t$, where the joint distribution of returned positions is $1/text{poly}(n)$ close to the uniform distribution over such walks in $ell_1$ distance. We first give an algorithm for local access to walks on undirected regular graphs with $widetilde{O}(frac{1}{1-lambda}sqrt{n})$ runtime per query, where $lambda$ is the second-largest eigenvalue in absolute value. Since random $d$-regular graphs are expanders with high probability, this gives an $widetilde{O}(sqrt{n})$ algorithm for $G(n,d)$, which improves on the naive method for small numbers of queries. We then prove that no that algorithm with sub-constant error given probe access to random $d$-regular graphs can have runtime better than $Omega(sqrt{n}/log(n))$ per query in expectation, obtaining a nearly matching lower bound. We further show an $Omega(n^{1/4})$ runtime per query lower bound even with an oblivious adversary (i.e. when the query sequence is fixed in advance). We then show that for families of graphs with additional group theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime $text{polylog}(n)$ per query. This also allows for efficient local access to walks on $text{polylog}$ degree expanders. We extend our results to graphs constructed using the tensor product (giving local access to walks on degree $n^epsilon$ graphs for any $epsilon in (0,1]$) and Cartesian product.
The problem of how many trajectories of a random walker in a potential are needed to reconstruct the values of this potential is studied. We show that this problem can be solved by calculating the probability of survival of an abstract random walker in a partially absorbing potential. The approach is illustrated on the discrete Sinai (random force) model with a drift. We determine the parameter (temperature, duration of each trajectory, ...) values making reconstruction as fast as possible.
Under the Strong Exponential Time Hypothesis, an integer linear program with $n$ Boolean-valued variables and $m$ equations cannot be solved in $c^n$ time for any constant $c < 2$. If the domain of the variables is relaxed to $[0,1]$, the associated linear program can of course be solved in polynomial time. In this work, we give a natural algorithmic bridging between these extremes of $0$-$1$ and linear programming. Specifically, for any subset (finite union of intervals) $E subset [0,1]$ containing ${0,1}$, we give a random-walk based algorithm with runtime $O_E((2-text{measure}(E))^ntext{poly}(n,m))$ that finds a solution in $E^n$ to any $n$-variable linear program with $m$ constraints that is feasible over ${0,1}^n$. Note that as $E$ expands from ${0,1}$ to $[0,1]$, the runtime improves smoothly from $2^n$ to polynomial. Taking $E = [0,1/k) cup (1-1/k,1]$ in our result yields as a corollary a randomized $(2-2/k)^{n}text{poly}(n)$ time algorithm for $k$-SAT. While our approach has some high level resemblance to Sch{o}nings beautiful algorithm, our general algorithm is based on a more sophisticated random walk that incorporates several new ingredients, such as a multiplicative potential to measure progress, a judicious choice of starting distribution, and a time varying distribution for the evolution of the random walk that is itself computed via an LP at each step (a solution to which is guaranteed based on the minimax theorem). Plugging the LP algorithm into our earlier polymorphic framework yields fast exponential algorithms for any CSP (like $k$-SAT, $1$-in-$3$-SAT, NAE $k$-SAT) that admit so-called `threshold partial polymorphisms.
We give a new simple and short (one-line) analysis for the runtime of the well-known Euclidean Algorithm. While very short simple, the obtained upper bound in near-optimal.
In this paper we show a deterministic parallel all-pairs shortest paths algorithm for real-weighted directed graphs. The algorithm has $tilde{O}(nm+(n/d)^3)$ work and $tilde{O}(d)$ depth for any depth parameter $din [1,n]$. To the best of our knowledge, such a trade-off has only been previously described for the real-weighted single-source shortest paths problem using randomization [Bringmann et al., ICALP17]. Moreover, our result improves upon the parallelism of the state-of-the-art randomized parallel algorithm for computing transitive closure, which has $tilde{O}(nm+n^3/d^2)$ work and $tilde{O}(d)$ depth [Ullman and Yannakakis, SIAM J. Comput. 91]. Our APSP algorithm turns out to be a powerful tool for designing efficient planar graph algorithms in both parallel and sequential regimes. One notable ingredient of our parallel APSP algorithm is a simple deterministic $tilde{O}(nm)$-work $tilde{O}(d)$-depth procedure for computing $tilde{O}(n/d)$-size hitting sets of shortest $d$-hop paths between all pairs of vertices of a real-weighted digraph. Such hitting sets have also been called $d$-hub sets. Hub sets have previously proved especially useful in designing parallel or dynamic shortest paths algorithms and are typically obtained via random sampling. Our procedure implies, for example, an $tilde{O}(nm)$-time deterministic algorithm for finding a shortest negative cycle of a real-weighted digraph. Such a near-optimal bound for this problem has been so far only achieved using a randomized algorithm [Orlin et al., Discret. Appl. Math. 18].