Aligning biological sequences by exploiting residue conservation and coevolution

Added by Anna Paola Muntoni

Publication date 2020

fields Biology Physics

The research language is English

Authors Anna Paola Muntoni - Andrea Pagnani - Martin Weigt

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e. arranging sequences from different organisms so as to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position-specific features such as conservation in sequences, but assume that different positions evolve independently. In recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include coevolution among positions in a general way, and therefore to be universally applicable to protein- and RNA-sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
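The independence assumption of profile models mentioned in the abstract can be illustrated with a minimal sketch. This is not DCAlign or any published tool: the toy alignment, the alphabet, the pseudocount, and the function names `build_profile` and `profile_log_score` are all hypothetical choices made for illustration. The sketch scores a sequence as a sum of per-column log-probabilities, so it cannot tell a sequence that respects a covarying pair of columns from one that breaks it.

```python
import math
from collections import Counter

ALPHABET = "ACGU-"  # assumed toy RNA alphabet with a gap symbol


def build_profile(msa, pseudocount=1.0):
    """Estimate per-column symbol probabilities with additive smoothing."""
    n_cols = len(msa[0])
    total = len(msa) + pseudocount * len(ALPHABET)
    profile = []
    for i in range(n_cols):
        counts = Counter(seq[i] for seq in msa)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile


def profile_log_score(seq, profile):
    """Sum of independent per-position log-probabilities (no pairwise terms)."""
    return sum(math.log(profile[i][a]) for i, a in enumerate(seq))


# Toy alignment in which columns 0 and 3 covary perfectly (A..U or U..A).
msa = ["ACGU", "ACGU", "UCGA", "UCGA"]
profile = build_profile(msa)

paired = profile_log_score("ACGU", profile)      # respects the covariation
mismatched = profile_log_score("ACGA", profile)  # breaks the covariation

# The profile gives both sequences the same score, because each column is
# scored on its own and the pairing between columns 0 and 3 is invisible to it.
print(paired, mismatched)
```

A model with pairwise terms between positions would be needed to separate the two sequences, which is the limitation the abstract says DCAlign is designed to overcome.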

adabmDCA: Adaptive Boltzmann machine learning for biological sequences

Anna Paola Muntoni, Andrea Pagnani, Martin Weigt (2021)

Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionarily related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms to model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico. Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be generally applied to both protein and RNA families and supports several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have performed the learning of three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain. The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of the inferred contact map as well as of the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for accurate and lossless training when equilibrium learning is prohibitive in terms of computational time, and allows for pruning irrelevant parameters using an information-based criterion.
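The parametrization described in this abstract, local biases plus pairwise terms, can be sketched as an energy function over a sequence. The following is a minimal illustration and does not use the adabmDCA code or its file formats. The function name `potts_energy`, the dictionary layout of the parameters, the toy values, and the sign convention (probability proportional to exp(-E)) are assumptions made here for the example.

```python
def potts_energy(seq, h, J):
    """Energy with local biases h and pairwise couplings J.

    Assumed convention: E(s) = -sum_i h[i][s_i] - sum_{i<j} J[(i, j)][(s_i, s_j)],
    so that lower energy means higher probability under P(s) ~ exp(-E(s)).
    Parameters that are not listed are treated as zero.
    """
    energy = -sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy -= J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


# Hypothetical toy parameters for a 4-position RNA-like model.
# Biases encode conservation at each position.
h = [{"A": 0.5, "U": 0.5}, {"C": 1.0}, {"G": 1.0}, {"A": 0.5, "U": 0.5}]
# One coupling rewards complementary symbols at positions 0 and 3.
J = {(0, 3): {("A", "U"): 2.0, ("U", "A"): 2.0}}

e_paired = potts_energy("ACGU", h, J)      # biases 3.0 plus coupling 2.0
e_mismatched = potts_energy("ACGA", h, J)  # biases 3.0, no coupling reward

print(e_paired, e_mismatched)
```

With these toy values the biases alone give both sequences the same contribution, and only the coupling term lowers the energy of the sequence whose positions 0 and 3 are complementary.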

Quantitative Methods Disordered Systems and Neural Networks Biomolecules

Kinetic modelling of competition and depletion of shared miRNAs by competing endogenous RNAs

Araks Martirosyan, Marco Del Giudice, Chiara Enrico Bena (2018)

Non-coding RNAs play a key role in the post-transcriptional regulation of mRNA translation and turnover in eukaryotes. miRNAs, in particular, interact with their target RNAs through protein-mediated, sequence-specific binding, giving rise to extended and highly heterogeneous miRNA-RNA interaction networks. Within such networks, competition to bind miRNAs can generate an effective positive coupling between their targets. Competing endogenous RNAs (ceRNAs) can in turn regulate each other through miRNA-mediated crosstalk. Albeit potentially weak, ceRNA interactions can occur both dynamically, affecting e.g. the regulatory clock, and at stationarity, in which case ceRNA networks as a whole can be implicated in the composition of the cell's proteome. Many features of ceRNA interactions, including the conditions under which they become significant, can be unraveled by mathematical and in silico models. We review the understanding of the ceRNA effect obtained within such frameworks, focusing on the methods employed to quantify it, its role in the processing of gene expression noise, and how network topology can determine its reach.

Molecular Networks Disordered Systems and Neural Networks Biological Physics

SERS discrimination of single amino acid residue in single peptide by plasmonic nanocavities

Jian-An Huang, Mansoureh Z. Mousavi, Giorgia Giovannini (2019)

Surface-enhanced Raman spectroscopy (SERS) is a sensitive label-free optical method that can provide fingerprint Raman spectra of biomolecules such as DNA, amino acids and proteins. While SERS of a single DNA molecule has recently been demonstrated, Raman analysis of a single protein sequence was not possible because the SERS spectra of proteins are usually dominated by signals of aromatic amino acid residues. Here, we used an electroplasmonic approach to trap a single gold nanoparticle in a nanohole, generating a plasmonic nanocavity between the trapped nanoparticle and the nanopore wall. The giant field generated in the nanocavity was so sensitive and localized that it enabled SERS discrimination of 10 distinct amino acids at the single-molecule level. The obtained spectra are used to analyze the spectra of 2 biomarkers (Vasopressin and Oxytocin), each made of a short sequence of 9 amino acids. Significantly, we demonstrated identification of single non-aromatic amino acid residues in a single short peptide chain, as well as discrimination between two peptides whose sequences differ in 2 specific amino acids. Our results demonstrate the high sensitivity of our method in identifying a single amino acid residue in a protein chain, and its potential for further applications in proteomics and single-protein sequencing.

Quantitative Methods Optics

Power laws in biological networks

E. Almaas, A.-L. Barabasi (2004)

The rapidly developing theory of complex networks indicates that real networks are not random, but have a highly robust large-scale architecture, governed by strict organizational principles. Here, we focus on the properties of biological networks, discussing their scale-free and hierarchical features. We illustrate the major network characteristics using examples from the metabolic network of the bacterium Escherichia coli. We also discuss the principles of network utilization, acknowledging that the interactions in a real network have unequal strengths. We study the interplay between topology and reaction fluxes provided by flux-balance analysis. We find that the cellular utilization of the metabolic network is both globally and locally highly inhomogeneous, dominated by hot-spots, representing connected high-flux pathways.

Molecular Networks Disordered Systems and Neural Networks Cell Behavior

PyBioNetFit and the Biological Property Specification Language

Eshan D. Mitra, Ryan Suderman, Joshua Colvin (2019)

In systems biology modeling, important steps include model parameterization, uncertainty quantification, and evaluation of agreement with experimental observations. To help modelers perform these steps, we developed the software PyBioNetFit. PyBioNetFit is designed for parameterization, and also supports uncertainty quantification, checking models against known system properties, and solving design problems. PyBioNetFit introduces the Biological Property Specification Language (BPSL) for the formal declaration of system properties. BPSL allows qualitative data to be used alone or in combination with quantitative data for parameterization, model checking, and design. PyBioNetFit performs parameterization with parallelized metaheuristic optimization algorithms (differential evolution, particle swarm optimization, scatter search) that work directly with existing model definition standards: BioNetGen Language (BNGL) and Systems Biology Markup Language (SBML). We demonstrate PyBioNetFit's capabilities by solving 31 example problems, including the challenging problem of parameterizing a model of cell cycle control in yeast. We benchmark PyBioNetFit's parallelization efficiency on computer clusters, using up to 288 cores. Finally, we demonstrate the model checking and design applications of PyBioNetFit and BPSL by analyzing a model of therapeutic interventions in autophagy signaling.

Quantitative Methods
