Research papers and master's and doctoral theses published by Thomas J. X. Li

On an enhancement of RNA probing data using Information Theory

115 - Thomas J. X. Li , Christian M. Reidys 2019

Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to solve the problem by considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree, a hierarchical bi-partition of the input ensemble that is constructed by recursively querying whether or not a base pair of maximum information entropy is contained in the target. These queries are answered by relating local to global probing data, exploiting the modularity of RNA secondary structures. We show that the leaves of the tree consist of sub-samples exhibiting a distinguished structure with high probability. In particular, for a Boltzmann ensemble incorporating probing data, which is well established in the literature, the probability of our framework correctly identifying the target in the leaf is greater than $90\%$.
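The recursive querying described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes structures are given as sets of base pairs `(i, j)`, that the ensemble is a plain list sample, and that a hypothetical `oracle` function answers each query exactly (in the paper the answers come from probing data). The names `binary_entropy`, `max_entropy_pair` and `descend` are illustrative only.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Entropy (in bits) of a yes/no question answered 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_entropy_pair(sample):
    """Return the base pair whose presence is most uncertain in the sample,
    or None if every base pair occurs in all structures or in none."""
    counts = Counter(bp for structure in sample for bp in structure)
    best, best_h = None, 0.0
    for bp, c in sorted(counts.items()):
        h = binary_entropy(c / len(sample))
        if h > best_h:
            best, best_h = bp, h
    return best


def descend(sample, oracle):
    """Follow one root-to-leaf path of the bi-partition.

    oracle(bp) -> bool is an assumed stand-in for the probing-based answer
    to the question "does the target contain base pair bp?".
    """
    path = []
    while True:
        bp = max_entropy_pair(sample)
        if bp is None:  # the sample is homogeneous: a leaf has been reached
            return sample, path
        answer = oracle(bp)
        sample = [s for s in sample if (bp in s) == answer]
        path.append((bp, answer))


target = frozenset({(1, 10), (2, 9)})
sample = [
    frozenset({(1, 10), (2, 9)}),
    frozenset({(1, 10)}),
    frozenset({(3, 8)}),
    frozenset({(1, 10), (2, 9)}),
]
leaf, path = descend(sample, lambda bp: bp in target)
```

In this toy sample the pair `(2, 9)` occurs in half of the structures, so it is queried first, and a single answer already isolates the target.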

Biomolecules, Information Theory, Combinatorics

The block spectrum of RNA pseudoknot structures

76 - Thomas J. X. Li , Christina S. Burris , Christian M. Reidys 2018

In this paper we analyze the length-spectrum of blocks in $\gamma$-structures. $\gamma$-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time RNA folding. A $\gamma$-structure is constructed by nesting and concatenating specific building components having topological genus at most $\gamma$. A block is a substructure enclosed by crossing maximal arcs with respect to the partial order induced by nesting. We show that, in uniformly generated $\gamma$-structures, there is a significant gap in this length-spectrum, i.e., there asymptotically almost surely exists a unique longest block of length at least $n-O(n^{1/2})$ and that with high probability any other block has finite length. For fixed $\gamma$, we prove that the length of the longest block converges to a discrete limit law, and that the distribution of short blocks of given length tends to a negative binomial distribution in the limit of long sequences. We refine this analysis to the length spectrum of blocks of specific pseudoknot types, such as H-type and kissing hairpins. Our results generalize the rainbow spectrum on secondary structures by the first and third authors and are being put into context with the structural prediction of long non-coding RNAs.

Combinatorics, Probability, Biomolecules

The rainbow-spectrum of RNA secondary structures

259 - Thomas J. X. Li , Christian M. Reidys 2018

In this paper we analyze the length-spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length-spectrum. We prove that there asymptotically almost surely exists a unique longest rainbow of length at least $n-O(n^{1/2})$ and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite $k$, the distribution of rainbows of length $k$ becomes a negative binomial distribution for large $n$. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures and biological RNA structures, and relate our findings to the sparsification of folding algorithms.
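To make the definition of a rainbow concrete, here is a minimal sketch that extracts the maximal arcs of a secondary structure. It assumes the structure is given in dot-bracket notation with 1-based positions, and it takes the length of a rainbow `(i, j)` to be `j - i + 1`; both the input format and this length convention are assumptions made for illustration, not taken from the paper.

```python
def rainbows(dot_bracket):
    """Return the maximal arcs (i, j) of a dot-bracket string, 1-based.

    An arc is maximal with respect to nesting exactly when no other arc
    is still open at the moment it closes.
    """
    stack, found = [], []
    for j, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure at position %d" % j)
            i = stack.pop()
            if not stack:  # nothing encloses (i, j), so it is a rainbow
                found.append((i, j))
    if stack:
        raise ValueError("unbalanced structure: unclosed opening bracket")
    return found


arcs = rainbows("((..))..(...)")
lengths = [j - i + 1 for i, j in arcs]
```

For this example the structure has two rainbows, `(1, 6)` and `(9, 13)`, and the inner arc `(2, 5)` is not reported because it is nested inside `(1, 6)`.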

Combinatorics, Probability, Biomolecules

From unicellular fatgraphs to trees

77 - Thomas J. X. Li , Christian M. Reidys 2018

In this paper we study the minimum number of reversals needed to transform a unicellular fatgraph into a tree. We consider reversals acting on boundary components, having the natural interpretation as gluing, slicing or half-flipping of vertices. Our main result is an expression for the minimum number of reversals needed to transform a unicellular fatgraph into a plane tree. The expression involves the Euler genus of the fatgraph and an additional parameter, which counts the number of certain orientable blocks in the decomposition of the fatgraph. In the process we derive a constructive proof of how to decompose non-orientable, irreducible, unicellular fatgraphs into smaller fatgraphs of the same type or trivial fatgraphs, consisting of a single ribbon. We furthermore provide a detailed analysis of how reversals affect the component-structure of the underlying fatgraphs. Our results generalize the Hannenhalli-Pevzner formula for the reversal distance of signed permutations.

Combinatorics

Statistics of topological RNA structures

66 - Thomas J. X. Li , Christian M. Reidys 2016

In this paper we study properties of topological RNA structures, i.e., RNA contact structures with cross-serial interactions that are filtered by their topological genus. RNA secondary structures within this framework are topological structures having genus zero. We derive a new bivariate generating function whose singular expansion allows us to analyze the distributions of arcs, stacks, hairpin-, interior- and multi-loops. We then extend this analysis to H-type pseudoknots, kissing hairpins as well as $3$-knots and compute their respective expectation values. Finally we discuss our results and put them into context with data obtained by uniform sampling structures of fixed genus.
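As an illustration of the genus filtration, the sketch below computes the topological genus of a diagram from its arcs. It uses the standard Euler characteristic argument: collapsing the backbone to a single vertex gives one vertex and $n$ edges, so $2 - 2g = 1 - n + r$, where $r$ is the number of boundary components. The function name `genus` and the input format (a list of arcs `(i, j)` with each position used by at most one arc) are assumptions for this example.

```python
def genus(arcs):
    """Topological genus of a diagram given as a list of arcs (i, j).

    Unpaired positions do not affect the genus, so only arc endpoints
    are kept and relabelled 0..2n-1 in backbone order.
    """
    endpoints = sorted(p for arc in arcs for p in arc)
    m = len(endpoints)
    if m == 0:
        return 0
    rank = {p: k for k, p in enumerate(endpoints)}
    partner = [0] * m
    for i, j in arcs:
        partner[rank[i]] = rank[j]
        partner[rank[j]] = rank[i]

    # Trace boundary components: cross the arc, then step to the next
    # endpoint along the (cyclically closed) backbone.
    seen = [False] * m
    boundaries = 0
    for start in range(m):
        if seen[start]:
            continue
        boundaries += 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = (partner[k] + 1) % m

    n = len(arcs)
    return (1 + n - boundaries) // 2


nested = genus([(1, 4), (2, 3)])           # secondary structure
h_type = genus([(1, 3), (2, 4)])           # H-type pseudoknot
kissing = genus([(1, 3), (2, 5), (4, 6)])  # kissing hairpins
three_knot = genus([(1, 4), (2, 5), (3, 6)])
```

The nested diagram has genus zero, consistent with secondary structures being the genus-zero case, while the three pseudoknot motifs named in the abstract each have genus one.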

Combinatorics, Biomolecules, Quantitative Methods

RNA secondary structures having a compatible sequence of certain nucleotide ratios

109 - Christopher L. Barrett , Thomas J. X. Li , Christian M. Reidys 2016

Given a random RNA secondary structure, $S$, we study RNA sequences having fixed ratios of nucleotides that are compatible with $S$. We perform this analysis for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of the nucleotide ratios there exists a convex region in which, in the limit of long sequences, a random structure a.a.s. has a compatible sequence with these ratios and outside of which a.a.s. a random structure has no such compatible sequence. We localize this region for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. In particular, for GC-sequences having a ratio of G nucleotides smaller than $1/3$, a random RNA secondary structure without any minimum arc- and stack-length restrictions has a.a.s. no such compatible sequence. For sequences having a ratio of G nucleotides larger than $1/3$, a random RNA secondary structure has a.a.s. such compatible sequences. We discuss our results in the context of various families of RNA structures.

Combinatorics, Biomolecules, Quantitative Methods

A combinatorial interpretation of the $\kappa^{\star}_{g}(n)$ coefficients

97 - Thomas J. X. Li , Christian M. Reidys 2014

Studying the virtual Euler characteristic of the moduli space of curves, Harer and Zagier compute the generating function $C_g(z)$ of unicellular maps of genus $g$. They furthermore identify coefficients, $\kappa^{\star}_{g}(n)$, which fully determine the series $C_g(z)$. The main result of this paper is a combinatorial interpretation of $\kappa^{\star}_{g}(n)$. We show that these enumerate a class of unicellular maps, which correspond $1$-to-$2^{2g}$ to a specific type of trees, referred to as O-trees. O-trees are a variant of the C-decorated trees introduced by Chapuy, Féray and Fusy. We exhaustively enumerate the number $s_{g}(n)$ of shapes of genus $g$ with $n$ edges, which is a specific class of unicellular maps with vertex degree at least three. Furthermore we give combinatorial proofs for expressing the generating functions $C_g(z)$ and $S_g(z)$ for unicellular maps and shapes in terms of $\kappa^{\star}_{g}(n)$, respectively. We then prove a two term recursion for $\kappa^{\star}_{g}(n)$ and that for any fixed $g$, the sequence $\{\kappa_{g,t}\}_{t=0}^{g}$ is log-concave, where $\kappa^{\star}_{g}(n)=\kappa_{g,t}$, for $n=2g+t-1$.

Combinatorics

The topological filtration of $\gamma$-structures

36 - Thomas J. X. Li , Christian M. Reidys 2012

In this paper we study $\gamma$-structures filtered by topological genus. $\gamma$-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A $\gamma$-structure is composed of specific building blocks that have topological genus less than or equal to $\gamma$, where composition means concatenation and nesting of such blocks. Our main results are the derivation of a new bivariate generating function for $\gamma$-structures via symbolic methods, the singularity analysis of the solutions and a central limit theorem for the distribution of topological genus in $\gamma$-structures of given length. In our derivation specific bivariate polynomials play a central role. Their coefficients count particular motifs of fixed topological genus and they are of relevance in the context of genus recursion and novel folding algorithms.

Combinatorics, Quantitative Methods

Combinatorial analysis of interacting RNA molecules

82 - Thomas J. X. Li , Christian M. Reidys 2010

Recently several minimum free energy (MFE) folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Their folding targets are interaction structures that can be represented as diagrams with two backbones drawn horizontally on top of each other such that (1) intramolecular and intermolecular bonds are noncrossing and (2) there is no zig-zag configuration. This paper studies joint structures with arc-length at least four in which both interior and exterior stack-lengths are at least two (no isolated arcs). The key idea in this paper is to consider a new type of shape, based on which joint structures can be derived via symbolic enumeration. Our results imply simple asymptotic formulas for the number of joint structures with surprisingly small exponential growth rates. They are of interest in the context of designing prediction algorithms for RNA-RNA interactions.

Combinatorics, Biomolecules, Quantitative Methods

Combinatorics of RNA-RNA interaction

61 - Thomas J. X. Li , Christian M. Reidys 2010

RNA-RNA binding is an important phenomenon observed for many classes of non-coding RNAs and plays a crucial role in a number of regulatory processes. Recently several MFE folding algorithms for predicting the joint structure of two interacting RNA molecules have been proposed. Here joint structure means that in a diagram representation the intramolecular bonds of each partner are pseudoknot-free, that the intermolecular binding pairs are noncrossing, and that there is no so-called "zig-zag" configuration. This paper presents the combinatorics of RNA interaction structures including their generating function, singularity analysis as well as explicit recurrence relations. In particular, our results imply simple asymptotic formulas for the number of joint structures.

Combinatorics, Soft Condensed Matter, Biological Physics
