
Analysis of the Relationships among Longest Common Subsequences, Shortest Common Supersequences and Patterns and its application on Pattern Discovery in Biological Sequences


Published by: Kang Ning

Publication date: 2009

Research field: Informatics Engineering

Paper language: English

Authors: Kang Ning, Hoong Kee Ng, Hon Wai Leong

Topics: Data Structures and Algorithms, Discrete Mathematics, Information Retrieval


For a set of multiple sequences, their patterns, Longest Common Subsequences (LCS), and Shortest Common Supersequences (SCS) represent different aspects of the sequences' profile, and all of them can be used for biological sequence comparison and analysis. Revealing the relationship between patterns and LCS/SCS might provide a deeper view of the patterns in biological sequences, in turn leading to a better understanding of them. However, there has been no careful examination of the relationship between patterns, LCS, and SCS. In this paper, we analyze their relationship and give some lemmas. Based on these relationships, a set of algorithms called the PALS (PAtterns by Lcs and Scs) algorithms is proposed to discover patterns in a set of biological sequences. These algorithms first generate LCS and SCS results for the sequences heuristically, and then derive patterns from these results. Experiments show that the PALS algorithms perform well (in both efficiency and accuracy) on a variety of sequences. The PALS approach also provides a way to transform between the heuristic results of SCS and LCS.
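To make the LCS/SCS connection mentioned in the abstract concrete, here is a minimal Python sketch for the simplest case of exactly two strings. It is not the PALS algorithm, which the abstract does not spell out; it only illustrates the well-known two-string fact that merging both strings around a common subsequence gives a common supersequence of length `len(a) + len(b) - len(common)`, which is shortest when `common` is an LCS. The function names `lcs`, `scs_from_lcs`, and `is_subsequence` are illustrative choices, not names from the paper.

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (classic quadratic DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def scs_from_lcs(a: str, b: str, common: str) -> str:
    """Merge a and b around a common subsequence, emitting shared symbols once."""
    out = []
    i = j = 0
    for c in common:
        # Copy the symbols of a, then of b, that precede the next shared symbol.
        while a[i] != c:
            out.append(a[i])
            i += 1
        while b[j] != c:
            out.append(b[j])
            j += 1
        out.append(c)
        i += 1
        j += 1
    # Append whatever remains after the last shared symbol.
    out.append(a[i:])
    out.append(b[j:])
    return "".join(out)


a, b = "AGGTAB", "GXTXAYB"
common = lcs(a, b)
merged = scs_from_lcs(a, b, common)
print(common, merged, len(merged) == len(a) + len(b) - len(common))
```

For more than two sequences, the setting the paper addresses, both problems are hard in general, which is why the abstract speaks of heuristic LCS and SCS results.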


Paola Bonizzoni, Gianluca Della Vedova, Riccardo Dondi (2009)

In this work, we consider a variant of the classical Longest Common Subsequence problem called Doubly-Constrained Longest Common Subsequence (DC-LCS). Given two strings s1 and s2 over an alphabet A, a set C_s of strings, and a function Co from A to N, the DC-LCS problem consists in finding the longest subsequence s of s1 and s2 such that s is a supersequence of all the strings in C_s and such that the number of occurrences in s of each symbol a in A is bounded above by Co(a). The DC-LCS problem provides a clear mathematical formulation of a sequence comparison problem in Computational Biology and generalizes two other constrained variants of the LCS problem: the Constrained LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem. First, we give a fixed-parameter algorithm where the parameter is the length of the solution. Second, we prove a parameterized hardness result for the Constrained LCS problem when the parameters are the number of constraint strings and the size of the alphabet A. This hardness result also implies the parameterized hardness of the DC-LCS problem (with the same parameters) and its NP-hardness when the size of the alphabet is constant.

Topics: Data Structures and Algorithms, Discrete Mathematics
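The DC-LCS definition in the abstract above can be read as a feasibility test on a candidate string. The Python sketch below encodes that test and pairs it with an exhaustive search that is only usable on tiny inputs. It is a hedged illustration of the problem statement, not the fixed-parameter algorithm from the paper. The names `is_feasible`, `dc_lcs_bruteforce`, and `max_count` (standing in for Co) are assumptions, as is the choice to treat symbols missing from `max_count` as unbounded.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


def is_feasible(s, s1, s2, constraints, max_count):
    """Check the DC-LCS conditions for a candidate s.

    constraints plays the role of C_s; max_count plays the role of Co.
    Assumption: a symbol missing from max_count has no occurrence bound.
    """
    counts = Counter(s)
    return (
        is_subsequence(s, s1)
        and is_subsequence(s, s2)
        and all(is_subsequence(c, s) for c in constraints)
        and all(n <= max_count.get(ch, float("inf")) for ch, n in counts.items())
    )


def dc_lcs_bruteforce(s1, s2, constraints, max_count):
    """Try subsequences of s1 from longest to shortest (exponential time).

    Returns one longest feasible string, or None if no feasible string exists.
    """
    for k in range(len(s1), -1, -1):
        for idx in combinations(range(len(s1)), k):
            cand = "".join(s1[i] for i in idx)
            if is_feasible(cand, s1, s2, constraints, max_count):
                return cand
    return None


# Each symbol may appear at most once, and the result must contain "ab".
print(dc_lcs_bruteforce("abcab", "acbab", ["ab"], {"a": 1, "b": 1, "c": 1}))
```

With every bound set to 1 the instance behaves like a Repetition-Free LCS with an extra constraint string, and with an empty `max_count` it reduces to the Constrained LCS, matching the two special cases the abstract names.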

Optimal alignments of longest common subsequences and their path properties

Juri Lember, Heinrich Matzinger, Anna Vollmer (2014)

We investigate the behavior of optimal alignment paths for homologous (related) and independent random sequences. An alignment between two finite sequences is optimal if it corresponds to the longest common subsequence (LCS). We prove the existence of lowest and highest optimal alignments and study their differences. Large differences between the extremal alignments imply a high variety among all optimal alignments. We present several simulations indicating that for homologous sequences (those having a common ancestor) the distance between the extremal alignments is typically much smaller than for independent sequences. In particular, the simulations suggest that for homologous sequences the growth of the distance between the extremal alignments is logarithmic. The main theoretical results of the paper prove that (under some assumptions) this is indeed the case. The paper suggests that the properties of the optimal alignment paths characterize the relatedness of the sequences.

Topics: Statistics Theory

Longest common subsequences between words of very unequal length

Boris Bukh, Zichao Dong (2020)

We consider the expected length of the longest common subsequence between two random words of lengths $n$ and $(1-\varepsilon)kn$ over a $k$-symbol alphabet. It is well known that this quantity is asymptotic to $\gamma_{k,\varepsilon} n$ for some constant $\gamma_{k,\varepsilon}$. We show that $\gamma_{k,\varepsilon}$ is of the order $1-c\varepsilon^2$ uniformly in $k$ and $\varepsilon$. In addition, for large $k$, we give evidence that $\gamma_{k,\varepsilon}$ approaches $1-\tfrac{1}{4}\varepsilon^2$, and prove a matching lower bound.

Topics: Probability, Combinatorics

On the limiting law of the length of the longest common and increasing subsequences in random words

Jean-Christophe Breton, Christian Houdre (2015)

Let $X=(X_i)_{i\ge 1}$ and $Y=(Y_i)_{i\ge 1}$ be two sequences of independent and identically distributed (iid) random variables taking their values, uniformly, in a common totally ordered finite alphabet. Let LCI$_n$ be the length of the longest common and (weakly) increasing subsequence of $X_1\cdots X_n$ and $Y_1\cdots Y_n$. As $n$ grows without bound, and when properly centered and normalized, LCI$_n$ is shown to converge, in distribution, towards a Brownian functional that we identify.

Topics: Probability

A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Jin Cao, Dewei Zhong (2020)

Finding the common subsequences of $L$ multiple strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) for $L$ strings is NP-hard, with known exact algorithms taking time exponential in $L$. In this paper, we develop a randomized algorithm, referred to as Random-MCS, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is maximal if inserting any character into the subsequence no longer yields a common subsequence. An LCS is the special case of an MCS whose length is the longest. We show that the complexity of our algorithm is linear in $L$, and it is therefore suitable for large $L$. Furthermore, we study the occurrence probability of a single instance of MCS and demonstrate, via both theoretical and experimental studies, that the longest subsequence from multiple runs of Random-MCS often yields a solution to LCS.

Topics: Data Structures and Algorithms, Artificial Intelligence, Computational Complexity
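The maximality condition defined in the abstract above (no single character can be inserted while keeping the string a common subsequence) translates directly into a checker. The Python sketch below implements only that definition by brute force over insertion positions. It is not the Random-MCS algorithm, whose details the abstract does not give, and the function names are illustrative.

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


def is_common_subsequence(s: str, strings: list[str]) -> bool:
    """True if s is a subsequence of every string in the list."""
    return all(is_subsequence(s, t) for t in strings)


def is_maximal_common_subsequence(s: str, strings: list[str]) -> bool:
    """True if s is a common subsequence that no single insertion can extend.

    Assumes strings is a non-empty list.
    """
    if not is_common_subsequence(s, strings):
        return False
    # Only symbols present in every string can possibly be inserted.
    shared = set.intersection(*(set(t) for t in strings))
    for pos in range(len(s) + 1):
        for ch in shared:
            if is_common_subsequence(s[:pos] + ch + s[pos:], strings):
                return False
    return True


strings = ["abcd", "dabc"]
for cand in ["abc", "d", "ab"]:
    print(cand, is_maximal_common_subsequence(cand, strings))
```

In this example both "abc" and "d" are maximal, but only "abc" is an LCS, which shows why the abstract relies on taking the longest result over multiple runs rather than a single maximal subsequence.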
