بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Designing RNA Secondary Structures is Hard

63 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Florian Sikora

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Edouard Bonnet - Pawe{l} Rzk{a}.zewski - Florian Sikora

بنى وهياكل البيانات والخوارزميات التوافقية الجزيئات الحيوية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

An RNA sequence is a word over an alphabet on four elements ${A,C,G,U}$ called bases. RNA sequences fold into secondary structures where some bases match one another while others remain unpaired. Pseudoknot-free secondary structures can be represented as well-parenthesized expressions with additional dots, where pairs of matching parentheses symbolize paired bases and dots, unpaired bases. The two fundamental problems in RNA algorithmic are to predict how sequences fold within some model of energy and to design sequences of bases which will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure is known to be solvable in cubic time since the eighties and in truly subcubic time by a recent result of Bringmann et al. (FOCS 2016). As a stark contrast, it is unknown whether or not designing a given RNA secondary structure is a tractable task; this has been raised as a challenging open question by Anne Condon (ICALP 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has been unsettled for decades. In this paper we show that, in the simplest model of energy which is the Watson-Crick model the design of secondary structures is NP-complete if one adds natural constraints of the form: index $i$ of the sequence has to be labeled by base $b$. This negative result suggests that the same lower bound holds for more realistic models of energy. It is noteworthy that the additional constraints are by no means artificial: they are provided by all the RNA design pieces of software and they do correspond to the actual practice.

قيم البحث

259 - Thomas J. X. Li , Christian M. Reidys 2018

In this paper we analyze the length-spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length-spe ctrum. We shall prove that there asymptotically almost surely exists a unique longest rainbow of length at least $n-O(n^{1/2})$ and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite $k$, the distribution of rainbows of length $k$, becomes for large $n$ a negative binomial distribution. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures, biological RNA structures and relate our findings to the sparsification of folding algorithms.

التوافقية الاحتمالات الجزيئات الحيوية

The block spectrum of RNA pseudoknot structures

76 - Thomas J. X. Li , Christina S. Burris , Christian M. Reidys 2018

In this paper we analyze the length-spectrum of blocks in $gamma$-structures. $gamma$-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time RNA folding. A $gamma$-structure is constructed by nesti ng and concatenating specific building components having topological genus at most $gamma$. A block is a substructure enclosed by crossing maximal arcs with respect to the partial order induced by nesting. We show that, in uniformly generated $gamma$-structures, there is a significant gap in this length-spectrum, i.e., there asymptotically almost surely exists a unique longest block of length at least $n-O(n^{1/2})$ and that with high probability any other block has finite length. For fixed $gamma$, we prove that the length of the longest block converges to a discrete limit law, and that the distribution of short blocks of given length tends to a negative binomial distribution in the limit of long sequences. We refine this analysis to the length spectrum of blocks of specific pseudoknot types, such as H-type and kissing hairpins. Our results generalize the rainbow spectrum on secondary structures by the first and third authors and are being put into context with the structural prediction of long non-coding RNAs.

التوافقية الاحتمالات الجزيئات الحيوية

RNA secondary structures having a compatible sequence of certain nucleotide ratios

109 - Christopher L. Barrett , Thomas J. X. Li , Christian M. Reidys 2016

Given a random RNA secondary structure, $S$, we study RNA sequences having fixed ratios of nuclotides that are compatible with $S$. We perform this analysis for RNA secondary structures subject to various base pairing rules and minimum arc- and stack -length restrictions. Our main result reads as follows: in the simplex of the nucleotide ratios there exists a convex region in which, in the limit of long sequences, a random structure a.a.s.~has compatible sequence with these ratios and outside of which a.a.s.~a random structure has no such compatible sequence. We localize this region for RNA secondary structures subject to various base pairing rules and minimum arc- and stack-length restrictions. In particular, for {bf GC}-sequences having a ratio of {bf G} nucleotides smaller than $1/3$, a random RNA secondary structure without any minimum arc- and stack-length restrictions has a.a.s.~no such compatible sequence. For sequences having a ratio of {bf G} nucleotides larger than $1/3$, a random RNA secondary structure has a.a.s. such compatible sequences. We discuss our results in the context of various families of RNA structures.

التوافقية الجزيئات الحيوية الأساليب الكمية

The Expected Order of Saturated RNA Secondary Structures

600 - Emma Yu Jin , Markus E. Nebel 2011

We show the expected order of RNA saturated secondary structures of size $n$ is $log_4n(1+O(frac{log_2n}{n}))$, if we select the saturated secondary structure uniformly at random. Furthermore, the order of saturated secondary structures is sharply co ncentrated around its mean. As a consequence saturated structures and structures in the traditional model behave the same with respect to the expected order. Thus we may conclude that the traditional model has already drawn the right picture and conclusions inferred from it with respect to the order (the overall shape) of a structure remain valid even if enforcing saturation (at least in expectation).

التوافقية نظرية المعلومات نظرية المعلومات

Neural representation and generation for RNA secondary structures

85 - Zichao Yan , William L. Hamilton , Mathieu Blanchette 2021

Our work is concerned with the generation and targeted design of RNA, a type of genetic macromolecule that can adopt complex structures which influence their cellular activities and functions. The design of large scale and complex biological structur es spurs dedicated graph-based deep generative modeling techniques, which represents a key but underappreciated aspect of computational drug discovery. In this work, we investigate the principles behind representing and generating different RNA structural modalities, and propose a flexible framework to jointly embed and generate these molecular structures along with their sequence in a meaningful latent space. Equipped with a deep understanding of RNA molecular structures, our most sophisticated encoding and decoding methods operate on the molecular graph as well as the junction tree hierarchy, integrating strong inductive bias about RNA structural regularity and folding mechanism such that high structural validity, stability and diversity of generated RNAs are achieved. Also, we seek to adequately organize the latent space of RNA molecular embeddings with regard to the interaction with proteins, and targeted optimization is used to navigate in this latent space to search for desired novel RNA molecules.

الجزيئات الحيوية التعلم الآلي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة السورية الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Designing RNA Secondary Structures is Hard

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً