Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Lower Bounds for Optimal Alignments of Binary Sequences

325 0 0.0 ( 0 )

Download Cite

Added by Cynthia Vinzant

Publication date 2011

fields Biology

and research's language is English

Authors Cynthia Vinzant

Genomics Quantitative Methods

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In parametric sequence alignment, optimal alignments of two sequences are computed as a function of the penalties for mismatches and spaces, producing many different optimal alignments. Here we give a 3/(2^{7/3}pi^{2/3})n^{2/3} +O(n^{1/3} log n) lower bound on the maximum number of distinct optimal alignment summaries of length-n binary sequences. This shows that the upper bound given by Gusfield et. al. is tight over all alphabets, thereby disproving the square root of n conjecture. Thus the maximum number of distinct optimal alignment summaries (i.e. vertices of the alignment polytope) over all pairs of length-n sequences is Theta(n^{2/3}).

rate research

Mean-field methods in evolutionary duplication-innovation-loss models for the genome-level repertoire of protein domains

591 - A. Angelini , A. Amato , G. Bianconi 2009

We present a combined mean-field and simulation approach to different models describing the dynamics of classes formed by elements that can appear, disappear or copy themselves. These models, related to a paradigm duplication-innovation model known as Chinese Restaurant Process, are devised to reproduce the scaling behavior observed in the genome-wide repertoire of protein domains of all known species. In view of these data, we discuss the qualitative and quantitative differences of the alternative model formulations, focusing in particular on the roles of element loss and of the specificity of empirical domain classes.

Genomics Quantitative Methods

Optimal lower bounds for multiple recurrence

73 - Sebastian Donoso , Anh N. Le , Joel Moreira 2018

Let $(X, mathcal{B},mu,T)$ be an ergodic measure preserving system, $A in mathcal{B}$ and $epsilon>0$. We study the largeness of sets of the form begin{equation*} begin{split} S = left{ ninmathbb{N}colonmu(Acap T^{-f_1(n)}Acap T^{-f_2(n)}Acapldotscap T^{-f_k(n)}A)> mu(A)^{k+1} - epsilon right} end{split} end{equation*} for various families ${f_1,dots,f_k}$ of sequences $f_icolon mathbb{N} to mathbb{N}$. For $k leq 3$ and $f_{i}(n)=i f(n)$, we show that $S$ has positive density if $f(n)=q(p_n)$ where $q in mathbb{Z}[x]$ satisfies $q(1)$ or $q(-1) =0$ and $p_n$ denotes the $n$-th prime; or when $f$ is a certain Hardy field sequence. If $T^q$ is ergodic for some $q in mathbb{N}$, then for all $r in mathbb{Z}$, $S$ is syndetic if $f(n) = qn + r$. For $f_{i}(n)=a_{i}n$, where $a_{i}$ are distinct integers, we show that $S$ can be empty for $kgeq 4$, and for $k = 3$ we found an interesting relation between the largeness of $S$ and the abundance of solutions to certain linear equations in sparse sets of integers. We also provide some partial results when the $f_{i}$ are distinct polynomials.

Dynamical Systems Combinatorics

Prediction of genomic properties and classification of life by protein length distributions

490 - Dirson Jian Li , Shengli Zhang 2008

Much evolutionary information is stored in the fluctuations of protein length distributions. The genome size and non-coding DNA content can be calculated based only on the protein length distributions. So there is intrinsic relationship between the coding DNA size and non-coding DNA size. According to the correlations and quasi-periodicity of protein length distributions, we can classify life into three domains. Strong evidences are found to support the order in the structures of protein length distributions.

Genomics Quantitative Methods

The C-value enigma and timing of the Cambrian explosion

925 - Dirson Jian Li , Shengli Zhang 2008

The Cambrian explosion is a grand challenge to science today and involves multidisciplinary study. This event is generally believed as a result of genetic innovations, environmental factors and ecological interactions, even though there are many conflicts on nature and timing of metazoan origins. The crux of the matter is that an entire roadmap of the evolution is missing to discern the biological complexity transition and to evaluate the critical role of the Cambrian explosion in the overall evolutionary context. Here we calculate the time of the Cambrian explosion by an innovative and accurate C-value clock; our result (560 million years ago) quite fits the fossil records. We clarify that the intrinsic reason of genome evolution determined the Cambrian explosion. A general formula for evaluating genome size of different species has been found, by which major questions of the C-value enigma can be solved and the genome size evolution can be illustrated. The Cambrian explosion is essentially a major transition of biological complexity, which corresponds to a turning point in genome size evolution. The observed maximum prokaryotic complexity is just a relic of the Cambrian explosion and it is supervised by the maximum information storage capability in the observed universe. Our results open a new prospect of studying metazoan origins and molecular evolution.

Genomics Quantitative Methods

Optimal Lower Bounds for Sketching Graph Cuts

122 - Charles Carlson , Alexandra Kolla , Nikhil Srivastava 2017

We study the space complexity of sketching cuts and Laplacian quadratic forms of graphs. We show that any data structure which approximately stores the sizes of all cuts in an undirected graph on $n$ vertices up to a $1+epsilon$ error must use $Omega(nlog n/epsilon^2)$ bits of space in the worst case, improving the $Omega(n/epsilon^2)$ bound of Andoni et al. and matching the best known upper bound achieved by spectral sparsifiers. Our proof is based on a rigidity phenomenon for cut (and spectral) approximation which may be of independent interest: any two $d-$regular graphs which approximate each others cuts significantly better than a random graph approximates the complete graph must overlap in a constant fraction of their edges.

Data Structures and Algorithms Discrete Mathematics

comments

Fetching comments

AlHawash Private University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Lower Bounds for Optimal Alignments of Binary Sequences

Ask ChatGPT about the research

No Arabic abstract

Read More