Community detection, pattern recognition, and hypergraph-based learning: approaches using metric geometry and persistent homology


Published by Dong Quan Nguyen

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Dong Quan Ngoc Nguyen, Lin Xing


Abstract

Hypergraph data appear, often hidden, in many places in the modern age. They are data structures that can model many real-world examples, since they encode higher-order relations among data points. One of the main contributions of our paper is to introduce a new topological structure on hypergraph data that resembles the usual metric space structure. Using this new topological structure, we propose several approaches to the community detection problem and to detecting persistent features arising from the homological structure of hypergraph data. Building on the same topological structure, we also introduce a modified nearest neighbors method that generalizes the classical nearest neighbors method from machine learning. Our modified nearest neighbors method has the advantage of being very flexible and applicable even to discrete structures such as hypergraphs. We then apply it to the sign prediction problem in hypergraph data constructed using our method.
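As a rough illustration of why a nearest neighbors rule carries over to discrete objects, the sketch below runs a majority-vote k-nearest-neighbors classifier over hyperedges represented as sets of vertices. The Jaccard distance used here is only an assumed stand-in, not the topological structure introduced in the paper, and the names `jaccard_distance` and `knn_predict` are hypothetical.

```python
from collections import Counter
from typing import Callable, FrozenSet, Hashable, Sequence, Tuple

Hyperedge = FrozenSet[Hashable]


def jaccard_distance(a: Hyperedge, b: Hyperedge) -> float:
    """Assumed stand-in distance between two hyperedges (not the paper's structure)."""
    union = a | b
    if not union:
        return 0.0  # two empty hyperedges are treated as identical
    return 1.0 - len(a & b) / len(union)


def knn_predict(
    query: Hyperedge,
    labelled: Sequence[Tuple[Hyperedge, int]],
    k: int = 3,
    distance: Callable[[Hyperedge, Hyperedge], float] = jaccard_distance,
) -> int:
    """Predict a sign (+1 or -1) by majority vote among the k closest labelled hyperedges."""
    # Only a distance function is needed; no vector embedding of the hyperedges.
    ranked = sorted(labelled, key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: signed hyperedges over vertices named by single letters.
labelled = [
    (frozenset({"a", "b", "c"}), +1),
    (frozenset({"a", "b", "d"}), +1),
    (frozenset({"x", "y", "z"}), -1),
    (frozenset({"x", "y", "w"}), -1),
    (frozenset({"a", "c", "d"}), +1),
]

print(knn_predict(frozenset({"a", "b"}), labelled, k=3))
print(knn_predict(frozenset({"x", "y"}), labelled, k=3))
```

Because the classifier only calls `distance`, any other dissimilarity defined on hyperedges could be passed in its place; an odd `k` avoids tied votes between the two signs.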


Niko Motschnig, Alexander Ramharter, Oliver Schweiger (2021)

In this work, we explore four common algorithms for community detection in networks, namely Agglomerative Hierarchical Clustering, Divisive Hierarchical Clustering (Girvan-Newman), Fastgreedy and the Louvain Method. We investigate their mechanics and compare them in terms of implementation and clustering behavior on a standard dataset. We further propose some enhancements to these algorithms that show promising results in our evaluations, such as self-neighboring for Neighbor Matrix constructions, a deterministic, slightly faster version of the Louvain Method that favors fewer, larger clusters, and various implementation changes to the Fastgreedy algorithm.

Social and Information Networks
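Fastgreedy and the Louvain Method both greedily improve Newman's modularity score of a partition. The sketch below computes that score for an undirected, unweighted graph without self-loops, as a minimal illustration rather than the implementation evaluated in the work above; the function name `modularity` and the toy graph are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    community_of: Dict[Hashable, Hashable],
) -> float:
    """Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    Assumes an undirected, unweighted graph without self-loops."""
    edge_list = list(edges)
    m = len(edge_list)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edge_list:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than keeping every node in one community, which is the kind of comparison a greedy merge or move step relies on.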

A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning

Di Jin, Zhizhi Yu, Pengfei Jiao (2021)

Community detection, a fundamental task for network analysis, aims to partition a network into multiple sub-structures to help reveal their latent functions. Community detection has been extensively studied in and broadly applied to many real-world network problems. Classical approaches to community detection typically utilize probabilistic graphical models and adopt a variety of prior knowledge to infer community structures. As the problems that network methods try to solve and the network data to be analyzed become increasingly sophisticated, new approaches have also been proposed and developed, particularly those that utilize deep learning and convert networked data into low-dimensional representations. Despite all the recent advances, there is still a lack of insightful understanding of the theoretical and methodological underpinnings of community detection, which will be critically important for the future development of network analysis. In this paper, we develop and present a unified architecture of network community-finding methods to characterize the state of the art in community detection. Specifically, we provide a comprehensive review of the existing community detection methods and introduce a new taxonomy that divides the existing methods into two categories, namely probabilistic graphical models and deep learning. We then discuss in detail the main idea behind each method in the two categories. Furthermore, to promote future development of community detection, we release several benchmark datasets from several problem domains and highlight their applications to various network analysis tasks. We conclude with a discussion of the challenges of the field and suggestions of possible directions for future research.

Social and Information Networks, Artificial Intelligence, Machine Learning

Interpretable Phase Detection and Classification with Persistent Homology

Alex Cole, Gregory J. Loges, Gary Shiu (2020)

We apply persistent homology to the task of discovering and characterizing phase transitions, using lattice spin models from statistical physics for working examples. Persistence images provide a useful representation of the homological data for conducting statistical tasks. To identify the phase transitions, a simple logistic regression on these images is sufficient for the models we consider, and interpretable order parameters are then read from the weights of the regression. Magnetization, frustration and vortex-antivortex structure are identified as relevant features for characterizing phase transitions.

Statistical Mechanics, Machine Learning, Algebraic Topology

Parallel Protein Community Detection in Large-scale PPI Networks Based on Multi-source Learning

Jianguo Chen, Kenli Li, Kashif Bilal (2018)

Protein interactions constitute the fundamental building block of almost every life activity. Identifying protein communities from Protein-Protein Interaction (PPI) networks is essential to understand the principles of cellular organization and explore the causes of various diseases. It is critical to integrate multiple data resources to identify reliable protein communities that have biological significance and to improve the performance of community detection methods for large-scale PPI networks. In this paper, we propose a Multi-source Learning based Protein Community Detection (MLPCD) algorithm that integrates Gene Expression Data (GED), together with a parallel solution of MLPCD using cloud computing technology. To effectively discover the biological functions of proteins that participate in different cellular processes, GED under different conditions is integrated with the original PPI network to reconstruct a Weighted-PPI (WPPI) network. To flexibly identify protein communities of different scales, we define community modularity and functional cohesion measurements and detect protein communities from the WPPI network using an agglomerative method. In addition, we compare the detected communities with known protein complexes and evaluate the functional enrichment of protein function modules using Gene Ontology annotations. Moreover, we implement a parallel version of the MLPCD algorithm on the Apache Spark platform to enhance the performance of the algorithm for large-scale realistic PPI networks. Extensive experimental results indicate the superiority and notable advantages of the MLPCD algorithm over the relevant algorithms in terms of accuracy and performance.

Social and Information Networks, Machine Learning

Routine pattern discovery and anomaly detection in individual travel behavior

Lijun Sun, Xinyu Chen, Zhaocheng He (2020)

Discovering patterns and detecting anomalies in individual travel behavior is a crucial problem in both research and practice. In this paper, we address this problem by building a probabilistic framework to model individual spatiotemporal travel behavior data (e.g., trip records and trajectory data). We develop a two-dimensional latent Dirichlet allocation (LDA) model to characterize the generative mechanism of spatiotemporal trip records of each traveler. This model introduces two separate factor matrices for the spatial dimension and the temporal dimension, respectively, and uses a two-dimensional core structure at the individual level to effectively model the joint interactions and complex dependencies. This model can efficiently summarize travel behavior patterns on both spatial and temporal dimensions from very sparse trip sequences in an unsupervised way. In this way, complex travel behavior can be modeled as a mixture of representative and interpretable spatiotemporal patterns. By applying the trained model to future or unseen spatiotemporal records of a traveler, we can detect her behavior anomalies by scoring those observations using perplexity. We demonstrate the effectiveness of the proposed modeling framework on a real-world license plate recognition (LPR) data set. The results confirm the advantage of statistical learning methods in modeling sparse individual travel behavior data. This type of pattern discovery and anomaly detection can provide useful insights for traffic monitoring, law enforcement, and individual travel behavior profiling.

Social and Information Networks, Computers and Society, Applications of Statistics