Aligning protein-protein interaction (PPI) networks of different species has drawn a considerable interest recently. This problem is important to investigate evolutionary conserved pathways or protein complexes across species, and to help in the identification of functional orthologs through the detection of conserved interactions. It is however a difficult combinatorial problem, for which only heuristic methods have been proposed so far. We reformulate the PPI alignment as a graph matching problem, and investigate how state-of-the-art graph matching algorithms can be used for that purpose. We differentiate between two alignment problems, depending on whether strict constraints on protein matches are given, based on sequence similarity, or whether the goal is instead to find an optimal compromise between sequence similarity and interaction conservation in the alignment. We propose new methods for both cases, and assess their performance on the alignment of the yeast and fly PPI networks. The new methods consistently outperform state-of-the-art algorithms, retrieving in particular 78% more conserved interactions than IsoRank for a given level of sequence similarity. Availability:http://cbio.ensmp.fr/proj/graphm_ppi/, additional data and codes are available upon request. Contact: [email protected]
The determination of protein functions is one of the most challenging problems of the post-genomic era. The sequencing of entire genomes and the possibility to access genes co-expression patterns has moved the attention from the study of single proteins or small complexes to that of the entire proteome. In this context, the search for reliable methods for proteins function assignment is of uttermost importance. Previous approaches to deduce the unknown function of a class of proteins have exploited sequence similarities or clustering of co-regulated genes, phylogenetic profiles, protein-protein interactions, and protein complexes. We propose to assign functional classes to proteins from their network of physical interactions, by minimizing the number of interacting proteins with different categories. The function assignment is made on a global scale and depends on the entire connectivity pattern of the protein network. Multiple functional assignments are made possible as a consequence of the existence of multiple equivalent solutions. The method is applied to the yeast Saccharomices Cerevisiae protein-protein interaction network. Robustness is tested in presence of a high percentage of unclassified proteins and under deletion/insertion of interactions.
Motivation: High-throughput experimental techniques have been producing more and more protein-protein interaction (PPI) data. PPI network alignment greatly benefits the understanding of evolutionary relationship among species, helps identify conserved sub-networks and provides extra information for functional annotations. Although a few methods have been developed for multiple PPI network alignment, the alignment quality is still far away from perfect and thus, new network alignment methods are needed. Result: In this paper, we present a novel method, denoted as ConvexAlign, for joint alignment of multiple PPI networks by convex optimization of a scoring function composed of sequence similarity, topological score and interaction conservation score. In contrast to existing methods that generate multiple alignments in a greedy or progressive manner, our convex method optimizes alignments globally and enforces consistency among all pairwise alignments, resulting in much better alignment quality. Tested on both synthetic and real data, our experimental results show that ConvexAlign outperforms several popular methods in producing functionally coherent alignments. ConvexAlign even has a larger advantage over the others in aligning real PPI networks. ConvexAlign also finds a few conserved complexes among 5 species which cannot be detected by the other methods.
From the spectral plot of the (normalized) graph Laplacian, the essential qualitative properties of a network can be simultaneously deduced. Given a class of empirical networks, reconstruction schemes for elucidating the evolutionary dynamics leading to those particular data can then be developed. This method is exemplified for protein-protein interaction networks. Traces of their evolutionary history of duplication and divergence processes are identified. In particular, we can identify typical specific features that robustly distinguish protein-protein interaction networks from other classes of networks, in spite of possible statistical fluctuations of the underlying data.
Complexes of physically interacting proteins are one of the fundamental functional units responsible for driving key biological mechanisms within the cell. Their identification is therefore necessary not only to understand complex formation but also the higher level organization of the cell. With the advent of high-throughput techniques in molecular biology, significant amount of physical interaction data has been cataloged from organisms such as yeast, which has in turn fueled computational approaches to systematically mine complexes from the network of physical interactions among proteins (PPI network). In this survey, we review, classify and evaluate some of the key computational methods developed till date for the identification of protein complexes from PPI networks. We present two insightful taxonomies that reflect how these methods have evolved over the years towards improving automated complex prediction. We also discuss some open challenges facing accurate reconstruction of complexes, the crucial ones being presence of high proportion of errors and noise in current high-throughput datasets and some key aspects overlooked by current complex detection methods. We hope this review will not only help to condense the history of computational complex detection for easy reference, but also provide valuable insights to drive further research in this area.
Two processes can influence the evolution of protein interaction networks: addition and elimination of interactions between proteins, and gene duplications increasing the number of proteins and interactions. The rates of these processes can be estimated from available Saccharomyces cerevisiae genome data and are sufficiently high to affect network structure on short time scales. For instance, more than 100 interactions may be added to the yeast network every million years, a substantial fraction of which adds previously unconnected proteins to the network. Highly connected proteins show a greater rate of interaction turnover than proteins with few interactions. From these observations one can explain ? without natural selection on global network structure ? the evolutionary sustenance of the most prominent network feature, the distribution of the frequency P(d) of proteins with d neighbors, which is a broad-tailed distribution. This distribution is independent of the experimental approach providing nformation on network structure.