Sequence Annotation with HMMs: New Problems and Their Complexity

58 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Michal N\\'an\\'asi

تاريخ النشر 2012

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Michal Nanasi - Tomav{s} Vinav{r} - Brov{n}a Brejova

بنى وهياكل البيانات والخوارزميات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Hidden Markov models (HMMs) and their variants were successfully used for several sequence annotation tasks. Traditionally, inference with HMMs is done using the Viterbi and posterior decoding algorithms. However, recently a variety of different optimization criteria and associated computational problems were proposed. In this paper, we consider three HMM decoding criteria and prove their NP hardness. These criteria consider the set of states used to generate a certain sequence, but abstract from the exact locations of regions emitted by individual states. We also illustrate experimentally that these criteria are useful for HIV recombination detection.

قيم البحث

272 - Riccardo Dondi , Florian Sikora 2016

The quest for colorful components (connected components where each color is associated with at most one vertex) inside a vertex-colored graph has been widely considered in the last ten years. Here we consider two variants, Minimum Colorful Components (MCC) and Maximum Edges in transitive Closure (MEC), introduced in 2011 in the context of orthology gene identification in bioinformatics. The input of both MCC and MEC is a vertex-colored graph. MCC asks for the removal of a subset of edges, so that the resulting graph is partitioned in the minimum number of colorful connected components; MEC asks for the removal of a subset of edges, so that the resulting graph is partitioned in colorful connected components and the number of edges in the transitive closure of such a graph is maximized. We study the parameterized and approximation complexity of MCC and MEC, for general and restricted instances. For MCC on trees we show that the problem is basically equivalent to Minimum Cut on Trees, thus MCC is not approximable within factor $1.36 - varepsilon$, it is fixed-parameter tractable and it admits a poly-kernel (when the parameter is the number of colorful components). Moreover, we show that MCC, while it is polynomial time solvable on paths, it is NP-hard even for graphs with constant distance to disjoint paths number. Then we consider the parameterized complexity of MEC when parameterized by the number $k$ of edges in the transitive closure of a solution (the graph obtained by removing edges so that it is partitioned in colorful connected components). We give a fixed-parameter algorithm for MEC paramterized by $k$ and, when the input graph is a tree, we give a poly-kernel.

بنى وهياكل البيانات والخوارزميات

New routing techniques and their applications

122 - Liam Roditty , Roei Tov 2014

Let $G=(V,E)$ be an undirected graph with $n$ vertices and $m$ edges. We obtain the following new routing schemes: - A routing scheme for unweighted graphs that uses $tilde O(frac{1}{epsilon} n^{2/3})$ space at each vertex and $tilde O(1/epsilon)$- bit headers, to route a message between any pair of vertices $u,vin V$ on a $(2 + epsilon,1)$-stretch path, i.e., a path of length at most $(2+epsilon)cdot d(u,v)+1$. This should be compared to the $(2,1)$-stretch and $tilde O(n^{5/3})$ space distance oracle of Patrascu and Roditty [FOCS10 and SIAM J. Comput. 2014] and to the $(2,1)$-stretch routing scheme of Abraham and Gavoille [DISC11] that uses $tilde O( n^{3/4})$ space at each vertex. - A routing scheme for weighted graphs with normalized diameter $D$, that uses $tilde O(frac{1}{epsilon} n^{1/3}log D)$ space at each vertex and $tilde O(frac{1}{epsilon}log D)$-bit headers, to route a message between any pair of vertices on a $(5+epsilon)$-stretch path. This should be compared to the $5$-stretch and $tilde O(n^{4/3})$ space distance oracle of Thorup and Zwick [STOC01 and J. ACM. 2005] and to the $7$-stretch routing scheme of Thorup and Zwick [SPAA01] that uses $tilde O( n^{1/3})$ space at each vertex. Since a $5$-stretch routing scheme must use tables of $Omega( n^{1/3})$ space our result is almost tight. - For an integer $ell>1$, a routing scheme for unweighted graphs that uses $tilde O(ellfrac{1}{epsilon} n^{ell/(2ell pm 1)})$ space at each vertex and $tilde O(frac{1}{epsilon})$-bit headers, to route a message between any pair of vertices on a $(3pm2/ell+epsilon,2)$-stretch path. - A routing scheme for weighted graphs, that uses $tilde O(frac{1}{epsilon}n^{1/k}log D)$ space at each vertex and $tilde O(frac{1}{epsilon}log D)$-bit headers, to route a message between any pair of vertices on a $(4k-7+epsilon)$-stretch path.

بنى وهياكل البيانات والخوارزميات الرياضيات المتقطعة

The Highest Expected Reward Decoding for HMMs with Application to Recombination Detection

52 - Michal Nanasi , Tomav{s} Vinav{r} , Brov{n}a Brejova 2010

Hidden Markov models are traditionally decoded by the Viterbi algorithm which finds the highest probability state path in the model. In recent years, several limitations of the Viterbi decoding have been demonstrated, and new algorithms have been dev eloped to address them citep{Kall2005,Brejova2007,Gross2007,Brown2010}. In this paper, we propose a new efficient highest expected reward decoding algorithm (HERD) that allows for uncertainty in boundaries of individual sequence features. We demonstrate usefulness of our approach on jumping HMMs for recombination detection in viral genomes.

بنى وهياكل البيانات والخوارزميات الجينوم

New Unconditional Hardness Results for Dynamic and Online Problems

126 - Raphael Clifford , Allan Gr{o}nlund , Kasper Green Larsen 2015

There has been a resurgence of interest in lower bounds whose truth rests on the conjectured hardness of well known computational problems. These conditional lower bounds have become important and popular due to the painfully slow progress on proving strong unconditional lower bounds. Nevertheless, the long term goal is to replace these conditional bounds with unconditional ones. In this paper we make progress in this direction by studying the cell probe complexity of two conjectured to be hard problems of particular importance: matrix-vector multiplication and a version of dynamic set disjointness known as Patrascus Multiphase Problem. We give improved unconditional lower bounds for these problems as well as introducing new proof techniques of independent interest. These include a technique capable of proving strong threshold lower bounds of the following form: If we insist on having a very fast query time, then the update time has to be slow enough to compute a lookup table with the answer to every possible query. This is the first time a lower bound of this type has been proven.

بنى وهياكل البيانات والخوارزميات التعقيد الحسابي نظرية المعلومات

Covering with Clubs: Complexity and Approximability

63 - Riccardo Dondi , Giancarlo Mauri , Florian Sikora 2018

Finding cohesive subgraphs in a network is a well-known problem in graph theory. Several alternative formulations of cohesive subgraph have been proposed, a notable example being $s$-club, which is a subgraph where each vertex is at distance at most $s$ to the others. Here we consider the problem of covering a given graph with the minimum number of $s$-clubs. We study the computational and approximation complexity of this problem, when $s$ is equal to 2 or 3. First, we show that deciding if there exists a cover of a graph with three $2$-clubs is NP-complete, and that deciding if there exists a cover of a graph with two $3$-clubs is NP-complete. Then, we consider the approximation complexity of covering a graph with the minimum number of $2$-clubs and $3$-clubs. We show that, given a graph $G=(V,E)$ to be covered, covering $G$ with the minimum number of $2$-clubs is not approximable within factor $O(|V|^{1/2 -varepsilon})$, for any $varepsilon>0$, and covering $G$ with the minimum number of $3$-clubs is not approximable within factor $O(|V|^{1 -varepsilon})$, for any $varepsilon>0$. On the positive side, we give an approximation algorithm of factor $2|V|^{1/2}log^{3/2} |V|$ for covering a graph with the minimum number of $2$-clubs.

بنى وهياكل البيانات والخوارزميات

سجل دخول لتتمكن من نشر تعليقات