Research papers and master's and doctoral theses published by Andrea Pagnani

adabmDCA: Adaptive Boltzmann machine learning for biological sequences

115 - Anna Paola Muntoni, Andrea Pagnani, Martin Weigt 2021

Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionary-related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms to model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has also been assessed in terms of their ability to predict mutational effects and to generate functional sequences in silico. Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be generally applied to both protein and RNA families and accomplishes several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have performed the learning of three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and the TPP-riboswitch RNA domain. The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of the inferred contact map as well as of the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for an accurate and lossless training when the equilibrium one is prohibitive in terms of computational time, and allows for pruning irrelevant parameters using an information-based criterion.
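To make the parametrization above concrete, here is a minimal sketch of the energy of one sequence under a pairwise (Potts-type) model with local biases and pairwise couplings. The function name `potts_energy`, the data layout, the toy parameters, and the sign convention (probability proportional to exp(-E)) are assumptions made for illustration; this is not the adabmDCA code or its API.

```python
def potts_energy(seq, h, J):
    """Energy of an integer-encoded sequence under a pairwise model.

    Assumed layout (illustrative only):
      h[i][a]         local bias for state a at position i
      J[(i, j)][a][b] coupling between state a at i and state b at j, i < j
    Assumed convention: P(seq) is proportional to exp(-energy).
    """
    length = len(seq)
    energy = 0.0
    # Local biases: one term per position (residue conservation).
    for i in range(length):
        energy -= h[i][seq[i]]
    # Pairwise couplings: one term per pair of positions (coevolution).
    for i in range(length):
        for j in range(i + 1, length):
            energy -= J[(i, j)][seq[i]][seq[j]]
    return energy


# Toy parameters: 3 positions, 2 states per position (hypothetical values).
h = [[0.5, -0.5], [0.0, 0.0], [-1.0, 1.0]]
J = {
    (0, 1): [[1.0, 0.0], [0.0, 1.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 2.0], [2.0, 0.0]],
}
energy = potts_energy([0, 0, 1], h, J)
```

In a real protein family the number of states would be 21 (20 amino acids plus the gap symbol) rather than 2, and the parameters would be learned from a multiple sequence alignment rather than written by hand.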

Quantitative Methods, Disordered Systems and Neural Networks, Biomolecules

Relationship between fitness and heterogeneity in exponentially growing microbial populations

154 - Anna Paola Muntoni, Alfredo Braunstein, Andrea Pagnani 2021

Microbial metabolic networks perform the basic function of harvesting energy from nutrients to generate the work and free energy required for survival, growth and replication. The robust physiological outcomes they generate across vastly different organisms in spite of major environmental and genetic differences represent an especially remarkable trait. Most notably, it suggests that metabolic activity in bacteria may follow universal principles, the search for which is a long-standing issue. Most theoretical approaches to modeling metabolism assume that cells optimize specific evolutionarily-motivated objective functions (like their growth rate) under general physico-chemical or regulatory constraints. While conceptually and practically useful in many situations, the idea that certain objectives are optimized is hard to validate in data. Moreover, it is not clear how optimality can be reconciled with the degree of single-cell variability observed within microbial populations. To shed light on these issues, we propose here an inverse modeling framework that connects fitness to variability through the Maximum-Entropy guided inference of metabolic flux distributions from data. While no clear optimization emerges, we find that, as the medium gets richer, Escherichia coli populations slowly approach the theoretically optimal performance defined by minimal reduction of phenotypic variability at given mean growth rate. Inferred flux distributions provide multiple biological insights, including on metabolic reactions that are experimentally inaccessible. These results suggest that bacterial metabolism is crucially shaped by a population-level trade-off between fitness and cell-to-cell heterogeneity.

Molecular Networks, Disordered Systems and Neural Networks, Statistical Mechanics

Expectation propagation on the diluted Bayesian classifier

76 - Alfredo Braunstein, Thomas Gueudre, Andrea Pagnani 2020

Efficient feature selection from high-dimensional datasets is a very important challenge in many data-driven fields of science and engineering. We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification by leveraging a computational scheme known as expectation propagation (EP). The algorithm is used to train a continuous-weights perceptron learning a classification rule from a set of (possibly partly mislabeled) examples provided by a teacher perceptron with diluted continuous weights. We test the method in the Bayes optimal setting under a variety of conditions and compare it to other state-of-the-art algorithms based on message passing and on expectation maximization approximate inference schemes. Overall, our simulations show that EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity, especially when the student perceptron is trained from correlated patterns that prevent other iterative methods from converging. Furthermore, our numerical tests demonstrate that the algorithm is capable of learning online the unknown values of prior parameters, such as the dilution level of the weights of the teacher perceptron and the fraction of mislabeled examples, quite accurately. This is achieved by means of a simple maximum likelihood strategy that consists in minimizing the free energy associated with the EP algorithm.

Machine Learning, Disordered Systems and Neural Networks, Statistical Mechanics

Aligning biological sequences by exploiting residue conservation and coevolution

101 - Anna Paola Muntoni, Andrea Pagnani, Martin Weigt 2020

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e. arranging sequences from different organisms in such a way to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position-specificities like conservation in sequences, but assume an independent evolution of different positions. Over the last years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can catch the coevolution signal in sequence ensembles; and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include coevolution among positions in a general way, and to be therefore universally applicable to protein- and RNA-sequence alignment without the need of using complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.

Quantitative Methods, Disordered Systems and Neural Networks, Biological Physics

Compressed sensing reconstruction using Expectation Propagation

121 - Alfredo Braunstein, Anna Paola Muntoni, Andrea Pagnani 2019

Many interesting problems in fields ranging from telecommunications to computational biology can be formalized in terms of large underdetermined systems of linear equations with additional constraints or regularizers. One of the most studied ones, the Compressed Sensing problem (CS), consists in finding the solution with the smallest number of non-zero components of a given system of linear equations $\boldsymbol{y} = \mathbf{F} \boldsymbol{w}$ for known measurement vector $\boldsymbol{y}$ and sensing matrix $\mathbf{F}$. Here, we will address the compressed sensing problem within a Bayesian inference framework where the sparsity constraint is remapped into a singular prior distribution (called Spike-and-Slab or Bernoulli-Gauss). Solution to the problem is attempted through the computation of marginal distributions via Expectation Propagation (EP), an iterative computational scheme originally developed in Statistical Physics. We will show that this strategy is comparatively more accurate than the alternatives in solving instances of CS generated from statistically correlated measurement matrices. For computational strategies based on the Bayesian framework such as variants of Belief Propagation, this is to be expected, as they implicitly rely on the hypothesis of statistical independence among the entries of the sensing matrix. Perhaps surprisingly, the method also uniformly outperforms all the other state-of-the-art methods in our tests.
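As an illustration of the setup described above, the following minimal sketch draws a sparse vector from a spike-and-slab (Bernoulli-Gauss) prior and produces the underdetermined measurements y = F w. It only generates a problem instance and does not implement Expectation Propagation. The function names, the parameter names `rho` (density of non-zero components) and `sigma` (slab standard deviation), the zero-mean Gaussian slab, and the uncorrelated Gaussian sensing matrix are all assumptions chosen for the example.

```python
import random


def spike_and_slab_sample(n, rho, sigma, rng):
    """Draw n components: 0 with probability 1 - rho, else N(0, sigma^2)."""
    return [rng.gauss(0.0, sigma) if rng.random() < rho else 0.0
            for _ in range(n)]


def measure(F, w):
    """Noiseless linear measurements y = F w (F is a list of rows)."""
    return [sum(f_ij * w_j for f_ij, w_j in zip(row, w)) for row in F]


rng = random.Random(0)  # fixed seed so the instance is reproducible
n, m = 20, 8            # m < n: fewer equations than unknowns
w = spike_and_slab_sample(n, rho=0.2, sigma=1.0, rng=rng)
F = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = measure(F, w)
```

The abstract's point concerns statistically correlated measurement matrices; the independent Gaussian entries used here are only the simplest choice for a toy instance.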

Machine Learning, Disordered Systems and Neural Networks, Statistical Mechanics

ceRNA crosstalk stabilizes protein expression and affects the correlation pattern of interacting proteins

87 - Araks Martirosyan, Andrea De Martino, Andrea Pagnani 2017

Gene expression is a noisy process and several mechanisms, both transcriptional and posttranscriptional, can stabilize protein levels in cells. Much work has focused on the role of miRNAs, showing in particular that miRNA-mediated regulation can buffer expression noise for lowly expressed genes. Here, using in silico simulations and mathematical modeling, we demonstrate that miRNAs can exert a much broader influence on protein levels by orchestrating competition-induced crosstalk between mRNAs. Most notably, we find that miRNA-mediated cross-talk (i) can stabilize protein levels across the full range of gene expression rates, and (ii) modifies the correlation pattern of co-regulated interacting proteins, changing the sign of correlations from negative to positive. The latter feature may constitute a potentially robust signature of the existence of RNA crosstalk induced by endogenous competition for miRNAs in standard cellular conditions.

Molecular Networks

An analytic approximation of the feasible space of metabolic networks

77 - Alfredo Braunstein, Anna Paola Muntoni, Andrea Pagnani 2017

Assuming a steady-state condition within a cell, metabolic fluxes satisfy an under-determined linear system of stoichiometric equations. Characterizing the space of fluxes that satisfy such equations along with given bounds (and possibly additional relevant constraints) is considered of utmost importance for the understanding of cellular metabolism. Extreme values for each individual flux can be computed with Linear Programming (as in Flux Balance Analysis), and their marginal distributions can be approximately computed with Monte-Carlo sampling. Here we present an approximate analytic method for the latter task based on Expectation Propagation equations that does not involve sampling and can achieve much better predictions than other existing analytic methods. The method is iterative, and its computation time is dominated by one matrix inversion per iteration. With respect to sampling, we show through extensive simulation that it has some advantages including computation time, and the ability to efficiently fix empirically estimated distributions of fluxes.
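The feasible space mentioned above is the set of flux vectors v that satisfy the steady-state condition S v = 0 together with per-flux bounds. A minimal sketch of that membership test follows; the function name `is_feasible_flux`, the tolerance, and the three-reaction toy network are hypothetical and serve only to illustrate the constraints, not the Expectation Propagation method itself.

```python
def is_feasible_flux(S, v, lower, upper, tol=1e-9):
    """Return True if S v = 0 (steady state) and lower <= v <= upper."""
    # Mass balance: every metabolite (row of S) has zero net production.
    balanced = all(abs(sum(s * x for s, x in zip(row, v))) <= tol
                   for row in S)
    # Each flux stays within its allowed range.
    bounded = all(lo - tol <= x <= hi + tol
                  for x, lo, hi in zip(v, lower, upper))
    return balanced and bounded


# Hypothetical toy network: uptake -> A -> B -> export.
# Rows are metabolites A and B; columns are the three reactions.
S = [[1, -1, 0],
     [0, 1, -1]]
lower = [0.0, 0.0, 0.0]
upper = [10.0, 10.0, 10.0]

ok = is_feasible_flux(S, [2.0, 2.0, 2.0], lower, upper)
unbalanced = is_feasible_flux(S, [2.0, 1.0, 1.0], lower, upper)
```

In this toy network any vector with three equal fluxes between 0 and 10 is feasible, which shows in miniature why the system is under-determined: the constraints define a whole region of solutions rather than a single one.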

Biological Physics, Statistical Mechanics, Molecular Networks
