Approximate Trace Reconstruction via Median String (in Average-Case)


Abstract in English

We consider an emph{approximate} version of the trace reconstruction problem, where the goal is to recover an unknown string $sin{0,1}^n$ from $m$ traces (each trace is generated independently by passing $s$ through a probabilistic insertion-deletion channel with rate $p$). We present a deterministic near-linear time algorithm for the average-case model, where $s$ is random, that uses only emph{three} traces. It runs in near-linear time $tilde O(n)$ and with high probability reports a string within edit distance $O(epsilon p n)$ from $s$ for $epsilon=tilde O(p)$, which significantly improves over the straightforward bound of $O(pn)$. Technically, our algorithm computes a $(1+epsilon)$-approximate median of the three input traces. To prove its correctness, our probabilistic analysis shows that an approximate median is indeed close to the unknown $s$. To achieve a near-linear time bound, we have to bypass the well-known dynamic programming algorithm that computes an optimal median in time $O(n^3)$.

Download