The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time


Abstract

We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m}, \sigma n\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k \le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n + \min\big(\frac{nk^2}{m}, \frac{nk}{\sqrt s}, \frac{\sigma nm}{s}\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m}, \sigma n\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.
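
The simplification at $s=m$ can be checked with a short case analysis. This is a sketch that ignores the polylogarithmic factors hidden in $\tilde O$, and it is not taken from the paper itself. Substituting $s=m$ gives $\frac{nk}{\sqrt s} = \frac{nk}{\sqrt m}$ and $\frac{\sigma nm}{s} = \sigma n$, so the bound reads

$$\tilde O\Big(n + \min\Big(\frac{nk^2}{m}, \frac{nk}{\sqrt m}, \sigma n\Big)\Big).$$

If $k \le \sqrt m$, then $\frac{nk^2}{m} \le n$ and $\frac{nk}{\sqrt m} \le n$, so both this expression and $n + \min\big(\frac{nk}{\sqrt m}, \sigma n\big)$ lie between $n$ and $2n$. If $k > \sqrt m$, then

$$\frac{nk^2}{m} = \frac{nk}{\sqrt m} \cdot \frac{k}{\sqrt m} > \frac{nk}{\sqrt m},$$

so the first term never attains the minimum. In both cases the bound is $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m}, \sigma n\big)\big)$.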
