Adaptive Exact Learning in a Mixed-Up World: Dealing with Periodicity, Errors and Jumbled-Index Queries in String Reconstruction

Posted by Pedro Matias
Publication date: 2020
Research field: Informatics Engineering
Paper language: English

We study the query complexity of exactly reconstructing a string from adaptive queries, such as substring, subsequence, and jumbled-index queries. Such problems have applications, e.g., in computational biology. We provide a number of new and improved bounds for exact string reconstruction in settings where either the string or the queries are "mixed-up". For example, we show that a periodic (i.e., "mixed-up") string, $S = p^k p'$, of smallest period $p$, where $|p'| < |p|$, can be reconstructed using $O(\sigma|p| + \lg n)$ substring queries, where $\sigma$ is the alphabet size, if $n = |S|$ is unknown. We also show that we can reconstruct $S$ after it has been corrupted by a small number $d$ of errors, measured by Hamming distance. In this case, we give an algorithm that uses $O(d\sigma|p| + d|p|\lg\frac{n}{d+1})$ queries. In addition, we show that a periodic string can be reconstructed using $2\sigma\lceil\lg n\rceil + 2|p|\lceil\lg\sigma\rceil$ subsequence queries, and that general strings can be reconstructed using $2\sigma\lceil\lg n\rceil + n\lceil\lg\sigma\rceil$ subsequence queries, without knowledge of $n$ in advance. This latter result improves the previous best, decades-old result by Skiena and Sundaram. Finally, we believe we are the first to study the exact-learning query complexity of string reconstruction using jumbled-index queries, which are a "mixed-up" type of query that has received much attention of late.
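To make the notation concrete, the sketch below does two small things, under assumptions that go beyond the abstract. First, it splits a non-empty string as $S = p^k p'$ with $p$ the smallest period and $|p'| < |p|$, using the standard Knuth-Morris-Pratt failure function. Second, it reconstructs a hidden string over a binary alphabet using only subsequence queries, which are simulated locally by a hypothetical `oracle`. All function names are illustrative. This is an elementary textbook-style strategy, not the authors' algorithm. It first finds each symbol count by doubling plus binary search, and then finds, for every occurrence of `b`, how many `a`s follow it. Its worst-case query count is $O(n \lg n)$, far above the bounds stated in the abstract.

```python
from typing import Callable, List, Tuple

Oracle = Callable[[str], bool]  # hypothetical: "is q a subsequence of the hidden S?"


def prefix_function(s: str) -> List[int]:
    """KMP failure function: pi[i] = length of the longest proper border of s[:i+1]."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def period_decomposition(s: str) -> Tuple[str, int, str]:
    """Split a non-empty s as p^k p' with p its smallest period and |p'| < |p|."""
    q = len(s) - prefix_function(s)[-1]  # length of the smallest period
    k = len(s) // q
    return s[:q], k, s[k * q:]


def is_subsequence(q: str, s: str) -> bool:
    it = iter(s)
    return all(ch in it for ch in q)  # each `in` consumes the iterator


def count_char(c: str, oracle: Oracle) -> int:
    """Largest m such that c^m is a subsequence: doubling search, then binary search."""
    hi = 1
    while oracle(c * hi):
        hi *= 2
    lo = hi // 2  # c^lo is a subsequence (trivially so when lo == 0)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if oracle(c * mid):
            lo = mid
        else:
            hi = mid
    return lo


def reconstruct_binary(oracle: Oracle, a: str = "0", b: str = "1") -> str:
    """Rebuild a hidden string over {a, b} from subsequence queries alone."""
    na, nb = count_char(a, oracle), count_char(b, oracle)
    pieces, before_prev = [], 0
    for j in range(1, nb + 1):
        # b^j a^i is a subsequence iff at least i a's follow the j-th b,
        # so binary search that count in [0, na].
        lo, hi = 0, na + 1
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if oracle(b * j + a * mid):
                lo = mid
            else:
                hi = mid
        before = na - lo  # number of a's preceding the j-th b
        pieces.append(a * (before - before_prev) + b)
        before_prev = before
    pieces.append(a * (na - before_prev))  # trailing a's after the last b
    return "".join(pieces)
```

For example, `period_decomposition("abcabcab")` gives `("abc", 2, "ab")`. With `secret = "0110100"`, calling `reconstruct_binary(lambda q: is_subsequence(q, secret))` returns the secret, and the reconstruction never inspects the secret directly.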

Read also

Motivated by applications in machine learning, such as subset selection and data summarization, we consider the problem of maximizing a monotone submodular function subject to mixed packing and covering constraints. We present a tight approximation algorithm that, for any constant $\epsilon > 0$, achieves a guarantee of $1 - \frac{1}{e} - \epsilon$ while violating only the covering constraints, and only by a multiplicative factor of $1 - \epsilon$. Our algorithm is based on a novel enumeration method which, unlike previously known enumeration techniques, can handle both packing and covering constraints. We extend this main result by additionally handling a matroid independence constraint, as well as by finding (approximate) Pareto-set optimal solutions when multiple submodular objectives are present. Finally, we propose a novel and purely combinatorial dynamic programming approach that can be applied to several special cases of the problem, yielding algorithms that are not only deterministic but also considerably faster. For example, for the well-studied special case of only packing constraints (Kulik et al. [Math. Oper. Res. '13] and Chekuri et al. [FOCS '10]), we present the first deterministic non-trivial approximation algorithm. We believe our new combinatorial approach might be of independent interest.
We study the problem of finding a spanning forest in an undirected, $n$-vertex multi-graph under two basic query models. One is the linear query model, whose queries are linear measurements on the incidence vector induced by the edges; the other is the weaker OR query model, which only reveals whether a given subset of plausible edges is empty or not. At the heart of our study lies a fundamental problem which we call the single element recovery problem: given a non-negative real vector $x$ in $N$ dimensions, return a single element $x_j > 0$ from the support. Queries can be made in rounds, and our goal is to understand the trade-offs between the query complexity and the number of rounds of adaptivity needed to solve these problems, for both deterministic and randomized algorithms. These questions have connections and ramifications to multiple areas such as sketching, streaming, graph reconstruction, and compressed sensing. Our main results are:
* For the single element recovery problem, it is easy to obtain a deterministic, $r$-round algorithm which makes $(N^{1/r} - 1)$ queries per round. We prove that this is tight: any $r$-round deterministic algorithm must make $\geq (N^{1/r} - 1)$ linear queries in some round. In contrast, a $1$-round, $O(\log^2 N)$-query randomized algorithm which succeeds 99% of the time is known to exist.
* We design a deterministic $O(r)$-round, $\tilde{O}(n^{1+1/r})$-OR-query algorithm for graph connectivity. We complement this with an $\tilde{\Omega}(n^{1+1/r})$ lower bound for any $r$-round deterministic algorithm in the OR model.
* We design a randomized, $2$-round algorithm for the graph connectivity problem which makes $\tilde{O}(n)$ OR queries. In contrast, we prove that any $1$-round algorithm (possibly randomized) requires $\tilde{\Omega}(n^2)$ OR queries.
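The abstract calls the deterministic $r$-round upper bound for single element recovery "easy". One natural way to obtain it, sketched here under stated assumptions, goes as follows. Suppose $N = b^r$ with $b = N^{1/r}$ an integer, $x$ is non-negative, and $x$ is not identically zero. In each round, split the current block into $b$ sub-blocks and ask for the sums of the first $b - 1$ of them. These are linear queries with 0/1 coefficient vectors. Keep a sub-block with positive sum, or keep the last, unqueried sub-block if all answers are zero. Non-negativity makes this sound. This is an illustration, not necessarily the construction used in the paper, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def linear_query(x: Sequence[float], support: range) -> float:
    """One linear measurement <1_support, x>: the sum of x over a block of indices."""
    return sum(x[k] for k in support)


def recover_single_element(x: Sequence[float], b: int, r: int) -> Tuple[int, List[int]]:
    """Deterministic r-round search for some index j with x[j] > 0.

    Assumes len(x) == b**r (so b = N^(1/r)), all entries are non-negative,
    and x is not identically zero.
    """
    assert len(x) == b ** r
    lo, size = 0, len(x)  # invariant: block [lo, lo + size) has a positive sum
    per_round = []
    for _ in range(r):
        size //= b
        # All b - 1 queries of a round are fixed before any answer is seen.
        blocks = [range(lo + i * size, lo + (i + 1) * size) for i in range(b - 1)]
        answers = [linear_query(x, blk) for blk in blocks]
        per_round.append(len(blocks))
        # A zero-sum block has empty support (x >= 0); if all b - 1 sums are
        # zero, the unqueried last block must hold the positive mass.
        hit = next((i for i, v in enumerate(answers) if v > 0), b - 1)
        lo += hit * size
    return lo, per_round
```

For example, with $N = 9$, $b = 3$, $r = 2$, and the only positive entry at index 5, the search uses $b - 1 = 2$ queries in each of the 2 rounds and returns index 5.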
We study the problem of learning communities in the presence of modeling errors and give robust recovery algorithms for the Stochastic Block Model (SBM). This model, also known as the Planted Partition Model, is widely used for community detection and graph partitioning in various fields, including machine learning, statistics, and the social sciences. Many algorithms exist for learning communities in the Stochastic Block Model, but they do not work well in the presence of errors. In this paper, we initiate the study of robust algorithms for partial recovery in the SBM with modeling errors or noise. We consider graphs generated according to the Stochastic Block Model and then modified by an adversary. We allow two types of adversarial errors: Feige-Kilian (monotone) errors and edge outlier errors. Mossel, Neeman and Sly (STOC 2015) posed an open question about whether almost exact recovery is possible when the adversary is allowed to add $o(n)$ edges. Our work answers this question affirmatively, even in the case of $k > 2$ communities. We then show that our algorithms work not only when the instances come from the SBM, but also when they come from any distribution of graphs that is $\epsilon m$-close to the SBM in Kullback-Leibler divergence. This result also holds in the presence of adversarial errors. Finally, we present almost tight lower bounds for two communities.
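As a concrete picture of the input model, the sketch below samples a graph from a planted partition and then applies one kind of "helpful" monotone change. In the Feige-Kilian (monotone) setting, the adversary may add edges inside communities and delete edges between them; this sketch only deletes cross-community edges. Several details are assumptions chosen for illustration and are not taken from the paper: vertex $v$ is placed in community $v \bmod k$, and the edge probabilities are $p$ within a community and $q$ across communities. The function names are hypothetical. The recovery algorithms themselves are not shown.

```python
import random
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def sample_sbm(n: int, k: int, p: float, q: float, seed: int = 0) -> Tuple[List[int], Set[Edge]]:
    """Planted partition (assumed balanced): vertex v belongs to community v % k, and
    each pair is an edge independently w.p. p inside a community, q across."""
    rng = random.Random(seed)
    labels = [v % k for v in range(n)]
    edges = {
        (u, v)
        for u, v in combinations(range(n), 2)
        if rng.random() < (p if labels[u] == labels[v] else q)
    }
    return labels, edges


def monotone_adversary(labels: List[int], edges: Set[Edge], drop: float, seed: int = 0) -> Set[Edge]:
    """A monotone ("helpful") perturbation: delete each cross-community edge w.p. drop.
    A monotone adversary may also add intra-community edges; this sketch only deletes."""
    rng = random.Random(seed)
    return {(u, v) for u, v in edges if labels[u] == labels[v] or rng.random() >= drop}
```

Such changes only make the planted communities more pronounced, yet they are known to break some algorithms that rely on the exact SBM statistics. This is one motivation for studying robustness.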
We consider an approximate version of the trace reconstruction problem, where the goal is to recover an unknown string $s \in \{0,1\}^n$ from $m$ traces (each trace is generated independently by passing $s$ through a probabilistic insertion-deletion channel with rate $p$). We present a deterministic algorithm for the average-case model, where $s$ is random, that uses only three traces. It runs in near-linear time $\tilde{O}(n)$ and, with high probability, reports a string within edit distance $O(\epsilon p n)$ from $s$ for $\epsilon = \tilde{O}(p)$, which significantly improves over the straightforward bound of $O(pn)$. Technically, our algorithm computes a $(1+\epsilon)$-approximate median of the three input traces. To prove its correctness, our probabilistic analysis shows that an approximate median is indeed close to the unknown $s$. To achieve a near-linear time bound, we have to bypass the well-known dynamic programming algorithm that computes an optimal median in time $O(n^3)$.
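The objects in this abstract can be written down in a few lines: traces, edit distance, and the median objective. The channel model in this sketch is an assumption, since the abstract only says "insertion-deletion channel with rate $p$". Before each symbol, a uniformly random bit is inserted with probability $p$, and then the symbol itself is deleted with probability $p$. A median of the traces is a string minimizing the total edit distance to them. The function names are hypothetical. This is not the paper's near-linear-time approximate-median algorithm; it only evaluates the objective, using the standard quadratic Levenshtein dynamic program.

```python
import random
from typing import List


def trace(s: str, p: float, rng: random.Random) -> str:
    """One pass through an ASSUMED insertion-deletion channel with rate p: before each
    symbol, insert a uniform random bit w.p. p; then delete the symbol w.p. p."""
    out = []
    for ch in s:
        if rng.random() < p:
            out.append(rng.choice("01"))
        if rng.random() >= p:
            out.append(ch)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) in O(|a||b|) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def median_cost(candidate: str, traces: List[str]) -> int:
    """The objective a median minimizes: total edit distance to the input traces."""
    return sum(edit_distance(candidate, t) for t in traces)
```

A quick experiment would draw a random `s`, generate three traces with `rng = random.Random(7)` and a small `p`, and compare `median_cost(s, traces)` against the cost of each individual trace used as a candidate.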
The $r$-th iterated line graph $L^{r}(G)$ of a graph $G$ is defined by (i) $L^{0}(G) = G$ and (ii) $L^{r}(G) = L(L^{r-1}(G))$ for $r > 0$, where $L(G)$ denotes the line graph of $G$. The Hamiltonian Index $h(G)$ of $G$ is the smallest $r$ such that $L^{r}(G)$ has a Hamiltonian cycle. Checking whether $h(G) = k$ is NP-hard for any fixed integer $k \geq 0$, even for subcubic graphs $G$. We study the parameterized complexity of this problem with respect to the treewidth $\mathrm{tw}(G)$, and show that $h(G)$ can be found in time $O^*((1 + 2^{\omega + 3})^{\mathrm{tw}(G)})$, where $\omega$ is the matrix multiplication exponent and the $O^*$ notation hides factors polynomial in the input size. The NP-hard Eulerian Steiner Subgraph problem takes as input a graph $G$ and a specified subset $K$ of terminal vertices of $G$, and asks whether $G$ has an Eulerian (that is, connected and with all vertices of even degree) subgraph $H$ containing all the terminals. A second result of this work, and a key ingredient of our algorithm for finding $h(G)$, is an algorithm that solves Eulerian Steiner Subgraph in $O^*((1 + 2^{\omega + 3})^{\mathrm{tw}(G)})$ time.
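The definitions above translate directly into code. The sketch below covers the line graph, its iteration $L^r(G)$, and the "Eulerian" test used by Eulerian Steiner Subgraph (connected, with every vertex of even degree). It assumes simple graphs, represented as a hypothetical `(vertices, edges)` pair with edges stored as two-element frozensets. Keeping the vertex set explicit preserves isolated vertices of $L(G)$. This is a brute-force illustration of the definitions, not the treewidth-based algorithm from the abstract.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Hashable, Set, Tuple

Graph = Tuple[Set[Hashable], Set[FrozenSet]]  # (vertices, edges as 2-element frozensets)


def line_graph(g: Graph) -> Graph:
    """L(G): one vertex per edge of G; two are adjacent iff the edges share an endpoint."""
    _, edges = g
    new_edges = {frozenset((e, f)) for e, f in combinations(edges, 2) if e & f}
    return set(edges), new_edges


def iterated_line_graph(g: Graph, r: int) -> Graph:
    """L^0(G) = G and L^r(G) = L(L^{r-1}(G))."""
    for _ in range(r):
        g = line_graph(g)
    return g


def is_eulerian(g: Graph) -> bool:
    """Eulerian in the abstract's sense: connected, and every vertex has even degree."""
    vertices, edges = g
    degree = Counter(v for e in edges for v in e)
    if any(degree[v] % 2 for v in vertices):
        return False
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:  # depth-first search for connectivity
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return seen == vertices
```

For example, the star $K_{1,3}$ is a tree and so has no Hamiltonian cycle, but `line_graph` turns it into a triangle, which does have one. Hence $h(K_{1,3}) = 1$.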