Adaptive Exact Learning in a Mixed-Up World: Dealing with Periodicity, Errors and Jumbled-Index Queries in String Reconstruction


Abstract in English

We study the query complexity of exactly reconstructing a string from adaptive queries, such as substring, subsequence, and jumbled-index queries. Such problems have applications, e.g., in computational biology. We provide a number of new and improved bounds for exact string reconstruction for settings where either the string or the queries are mixed-up. For example, we show that a periodic (i.e., mixed-up) string, $S=p^kp$, of smallest period $p$, where $|p|<|p|$, can be reconstructed using $O(sigma|p|+lg n)$ substring queries, where $sigma$ is the alphabet size, if $n=|S|$ is unknown. We also show that we can reconstruct $S$ after having been corrupted by a small number of errors $d$, measured by Hamming distance. In this case, we give an algorithm that uses $O(dsigma|p| + d|p|lg frac{n}{d+1})$ queries. In addition, we show that a periodic string can be reconstructed using $2sigmalceillg nrceil + 2|p|lceillg sigmarceil$ subsequence queries, and that general strings can be reconstructed using $2sigmalceillg nrceil + nlceillg sigmarceil$ subsequence queries, without knowledge of $n$ in advance. This latter result improves the previous best, decades-old result, by Skiena and Sundaram. Finally, we believe we are the first to study the exact-learning query complexity for string reconstruction using jumbled-index queries, which are a mixed-up typeA of query that have received much attention of late.

Download