No Arabic abstract
We show the expected order of RNA saturated secondary structures of size $n$ is $log_4n(1+O(frac{log_2n}{n}))$, if we select the saturated secondary structure uniformly at random. Furthermore, the order of saturated secondary structures is sharply concentrated around its mean. As a consequence saturated structures and structures in the traditional model behave the same with respect to the expected order. Thus we may conclude that the traditional model has already drawn the right picture and conclusions inferred from it with respect to the order (the overall shape) of a structure remain valid even if enforcing saturation (at least in expectation).
In this paper we analyze the length-spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length-spectrum. We shall prove that there asymptotically almost surely exists a unique longest rainbow of length at least $n-O(n^{1/2})$ and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite $k$, the distribution of rainbows of length $k$, becomes for large $n$ a negative binomial distribution. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures, biological RNA structures and relate our findings to the sparsification of folding algorithms.
Given a random RNA secondary structure, $S$, we study RNA sequences having fixed ratios of nuclotides that are compatible with $S$. We perform this analysis for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of the nucleotide ratios there exists a convex region in which, in the limit of long sequences, a random structure a.a.s.~has compatible sequence with these ratios and outside of which a.a.s.~a random structure has no such compatible sequence. We localize this region for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. In particular, for {bf GC}-sequences having a ratio of {bf G} nucleotides smaller than $1/3$, a random RNA secondary structure without any minimum arc- and stack-length restrictions has a.a.s.~no such compatible sequence. For sequences having a ratio of {bf G} nucleotides larger than $1/3$, a random RNA secondary structure has a.a.s. such compatible sequences. We discuss our results in the context of various families of RNA structures.
The topological filtration of interacting RNA complexes is studied and the role is analyzed of certain diagrams called irreducible shadows, which form suitable building blocks for more general structures. We prove that for two interacting RNAs, called interaction structures, there exist for fixed genus only finitely many irreducible shadows. This implies that for fixed genus there are only finitely many classes of interaction structures. In particular the simplest case of genus zero already provides the formalism for certain types of structures that occur in nature and are not covered by other filtrations. This case of genus zero interaction structures is already of practical interest, is studied here in detail and found to be expressed by a multiple context-free grammar extending the usual one for RNA secondary structures. We show that in $O(n^6)$ time and $O(n^4)$ space complexity, this grammar for genus zero interaction structures provides not only minimum free energy solutions but also the complete partition function and base pairing probabilities.
An RNA sequence is a word over an alphabet on four elements ${A,C,G,U}$ called bases. RNA sequences fold into secondary structures where some bases match one another while others remain unpaired. Pseudoknot-free secondary structures can be represented as well-parenthesized expressions with additional dots, where pairs of matching parentheses symbolize paired bases and dots, unpaired bases. The two fundamental problems in RNA algorithmic are to predict how sequences fold within some model of energy and to design sequences of bases which will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure is known to be solvable in cubic time since the eighties and in truly subcubic time by a recent result of Bringmann et al. (FOCS 2016). As a stark contrast, it is unknown whether or not designing a given RNA secondary structure is a tractable task; this has been raised as a challenging open question by Anne Condon (ICALP 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has been unsettled for decades. In this paper we show that, in the simplest model of energy which is the Watson-Crick model the design of secondary structures is NP-complete if one adds natural constraints of the form: index $i$ of the sequence has to be labeled by base $b$. This negative result suggests that the same lower bound holds for more realistic models of energy. It is noteworthy that the additional constraints are by no means artificial: they are provided by all the RNA design pieces of software and they do correspond to the actual practice.
A topological RNA structure is derived from a diagram and its shape is obtained by collapsing the stacks of the structure into single arcs and by removing any arcs of length one. Shapes contain key topological, information and for fixed topological genus there exist only finitely many such shapes. We shall express topological RNA structures as unicellular maps, i.e. graphs together with a cyclic ordering of their half-edges. In this paper we prove a bijection of shapes of topological RNA structures. We furthermore derive a linear time algorithm generating shapes of fixed topological genus. We derive explicit expressions for the coefficients of the generating polynomial of these shapes and the generating function of RNA structures of genus $g$. Furthermore we outline how shapes can be used in order to extract essential information of RNA structure databases.