
Parallel Protein Community Detection in Large-scale PPI Networks Based on Multi-source Learning

Published by Jianguo Chen
Publication date: 2018
Research field: Informatics Engineering
Language of the paper: English





Protein interactions constitute the fundamental building block of almost every life activity. Identifying protein communities from Protein-Protein Interaction (PPI) networks is essential to understanding the principles of cellular organization and exploring the causes of various diseases. It is critical to integrate multiple data resources to identify reliable protein communities that have biological significance and to improve the performance of community detection methods for large-scale PPI networks. In this paper, we propose a Multi-source Learning based Protein Community Detection (MLPCD) algorithm that integrates Gene Expression Data (GED), together with a parallel solution of MLPCD using cloud computing technology. To effectively discover the biological functions of proteins that participate in different cellular processes, GED under different conditions is integrated with the original PPI network to reconstruct a Weighted-PPI (WPPI) network. To flexibly identify protein communities of different scales, we define community modularity and functional cohesion measurements and detect protein communities from the WPPI network using an agglomerative method. In addition, we compare the detected communities with known protein complexes and evaluate the functional enrichment of protein function modules using Gene Ontology annotations. Moreover, we implement a parallel version of the MLPCD algorithm on the Apache Spark platform to enhance the performance of the algorithm for large-scale realistic PPI networks. Extensive experimental results indicate that the MLPCD algorithm outperforms the relevant algorithms in terms of accuracy and performance.
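The abstract does not say how gene expression data is turned into edge weights. As a rough illustration only, the sketch below assumes one common choice: each PPI edge is weighted by the Pearson correlation of the two proteins' expression profiles, rescaled to [0, 1]. The function names `pearson` and `build_wppi`, the rescaling, and the zero weight for proteins without expression data are all assumptions made for this example, not the MLPCD definition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (norm_x * norm_y)


def build_wppi(edges, expression):
    """Attach a co-expression weight in [0, 1] to every PPI edge.

    edges: iterable of (protein_u, protein_v) pairs.
    expression: dict mapping a protein to its expression values across
        conditions. Proteins without expression data get weight 0.0
        (an assumption made for this sketch).
    """
    weighted = {}
    for u, v in edges:
        if u in expression and v in expression:
            r = pearson(expression[u], expression[v])
            weighted[(u, v)] = (r + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
        else:
            weighted[(u, v)] = 0.0
    return weighted


# Toy data: A and B are co-expressed, A and C are anti-correlated.
toy_expression = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1]}
toy_wppi = build_wppi([("A", "B"), ("A", "C"), ("A", "D")], toy_expression)
```

In this toy example the A-B edge receives a weight close to 1 and the A-C edge a weight close to 0, so an agglomerative step that merges along heavy edges would group A with B first.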




Read also

A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, observed connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the temporal dynamic component. A natural way to take account of this temporal dynamic component of interactions is to use point processes as the foundation of the network models for community detection. Computational complexity hampers the scalability of such approaches to large sparse networks. To circumvent this challenge, we propose a fast online variational inference algorithm for learning the community structure underlying dynamic event arrivals on a network using continuous-time point process latent network models. We provide regret bounds on the loss function of this procedure, giving theoretical guarantees on performance. The proposed algorithm is shown, using both simulation studies and real data, to have performance comparable to non-online variants in terms of community recovery. Our proposed framework can also be readily modified to incorporate other popular network structures.
We introduce a new paradigm that is important for community detection in the realm of network analysis. Networks contain a set of strong, dominant communities, which interfere with the detection of weak, natural community structure. When most of the members of the weak communities also belong to stronger communities, the weak communities are extremely hard to uncover. We call the weak communities the hidden community structure. We present a novel approach called HICODE (HIdden COmmunity DEtection) that identifies the hidden community structure as well as the dominant community structure. By weakening the strength of the dominant structure, one can uncover the hidden structure beneath. Likewise, by reducing the strength of the hidden structure, one can more accurately identify the dominant structure. In this way, HICODE tackles both tasks simultaneously. Extensive experiments on real-world networks demonstrate that HICODE outperforms several state-of-the-art community detection methods in uncovering both the dominant and the hidden structure. In the Facebook university social networks, we find multiple non-redundant sets of communities that are strongly associated with residential hall, year of registration, or career position of the faculty or students, while the state-of-the-art algorithms mainly locate the dominant ground truth category. Due to the difficulty of labeling all ground truth communities in real-world datasets, HICODE provides a promising approach to pinpoint the existing latent communities and uncover communities for which there is no ground truth. Finding this unknown structure is an extremely important community detection problem.
Jingfei Zhang, Yuguo Chen (2018)
Heterogeneous networks are networks consisting of different types of nodes and multiple types of edges linking such nodes. While community detection has been extensively developed as a useful technique for analyzing networks that contain only one type of node, very few community detection techniques have been developed for heterogeneous networks. In this paper, we propose a modularity-based community detection framework for heterogeneous networks. Unlike existing methods, the proposed approach has the flexibility to treat the number of communities as an unknown quantity. We describe a Louvain-type maximization method for finding the community structure that maximizes the modularity function. Our simulation results show the advantages of the proposed method over existing methods. Moreover, the proposed modularity function is shown to be consistent under a heterogeneous stochastic blockmodel framework. Analyses of the DBLP four-area dataset and a MovieLens dataset demonstrate the usefulness of the proposed method.
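The heterogeneous modularity function proposed in that paper is not given in the abstract. For orientation only, the sketch below computes the standard Newman-Girvan modularity for an ordinary unweighted, undirected network with a single node type, which is the quantity that Louvain-type methods usually maximize. It is a simpler stand-in, not the authors' function; the name `modularity` and the toy graph are assumptions made for this example.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once,
        with no self-loops.
    community: dict mapping every node to its community label.
    Uses Q = sum over communities c of [ L_c / m - (d_c / (2 m))^2 ],
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal_edges = {}  # L_c per community
    total_degree = {}    # d_c per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
q_split = modularity(toy_edges, split)
q_merged = modularity(toy_edges, merged)
```

On this toy graph, splitting into the two triangles gives Q = 5/14, while placing every node in one community gives Q = 0, which is why a modularity-maximizing method prefers the split.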
We develop a Bayesian hierarchical model to identify communities in networks for which we do not observe the edges directly, but instead observe a series of interdependent signals for each of the nodes. Fitting the model provides an end-to-end community detection algorithm that does not extract information as a sequence of point estimates but propagates uncertainties from the raw data to the community labels. Our approach naturally supports multiscale community detection as well as the selection of an optimal scale using model comparison. We study the properties of the algorithm using synthetic data and apply it to daily returns of constituents of the S&P100 index as well as climate data from US cities.
Analyzing groups of nodes that share attributes, functions, or connections is one way to understand the information in a network. The task of discovering such node groups is called community detection. Generally, two types of information can be used for this task: the link structure and the node attributes. A temporal text network is a special kind of network that contains both sources of information. Typical representatives include online blog networks, the World Wide Web (WWW), and academic citation networks. In this paper, we study the problem of overlapping community detection in temporal text networks. By examining 32 large temporal text networks, we find many edges connecting two nodes with no common community and discover that nodes in the same community share similar textual content. This scenario cannot be quantitatively modeled by practically any existing community detection method. Motivated by these empirical observations, we propose MAGIC (Model Affiliation Graph with Interacting Communities), a generative model that captures community interactions and considers the information from both link structures and node attributes. Our experiments on 3 types of datasets show that MAGIC achieves large improvements over 4 state-of-the-art methods in terms of 4 widely used metrics.
