ﻻ يوجد ملخص باللغة العربية
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in BLASTP and vector seeds, our seeds show a similar or even better performance than BLASTP on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds vs. BLASTP.
From the spectral plot of the (normalized) graph Laplacian, the essential qualitative properties of a network can be simultaneously deduced. Given a class of empirical networks, reconstruction schemes for elucidating the evolutionary dynamics leading
Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatoria
We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several
BACKGROUND: One of the most evident achievements of bioinformatics is the development of methods that transfer biological knowledge from characterised proteins to uncharacterised sequences. This mode of protein function assignment is mostly based on
This article introduces a novel protein structure alignment method (named TALI) based on the protein backbone torsion angle instead of the more traditional distance matrix. Because the structural alignment of the two proteins is based on the comparis