بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

The ground truth about metadata and community detection in networks

88 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Daniel Larremore

تاريخ النشر 2016

مجال البحث الهندسة المعلوماتية فيزياء

والبحث باللغة English

تأليف Leto Peel - Daniel B. Larremore -

الشبكات الاجتماعية والمعلومات تحليل البيانات والإحصاءات والاحتمال الفيزياء والمجتمع

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex systems components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because such networks links are formed explicitly based on those known communities. However, there are no planted communities in real world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. Here, we show that metadata are not the same as ground truth, and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structure.

قيم البحث

487 - Daniel B. Larremore , Aaron Clauset , Abigail Z. Jacobs 2014

Bipartite networks are a common type of network data in which there are two types of vertices, and only vertices of different types can be connected. While bipartite networks exhibit community structure like their unipartite counterparts, existing ap proaches to bipartite community detection have drawbacks, including implicit parameter choices, loss of information through one-mode projections, and lack of interpretability. Here we solve the community detection problem for bipartite networks by formulating a bipartite stochastic block model, which explicitly includes vertex type information and may be trivially extended to $k$-partite networks. This bipartite stochastic block model yields a projection-free and statistically principled method for community detection that makes clear assumptions and parameter choices and yields interpretable results. We demonstrate this models ability to efficiently and accurately find community structure in synthetic bipartite networks with known structure and in real-world bipartite networks with unknown structure, and we characterize its performance in practical contexts.

الشبكات الاجتماعية والمعلومات تحليل البيانات والإحصاءات والاحتمال الفيزياء والمجتمع

Network constraints on the mixing patterns of binary node metadata

98 - Matteo Cinelli , Leto Peel , Antonio Iovanella 2019

We consider the network constraints on the bounds of the assortativity coefficient, which measures the tendency of nodes with the same attribute values to be interconnected. The assortativity coefficient is the Pearsons correlation coefficient of nod e attribute values across network edges and ranges between -1 and 1. We focus here on the assortativity of binary node attributes and show that properties of the network, such as degree distribution and the number of nodes with each attribute value place constraints upon the attainable values of the assortativity coefficient. We explore the assortativity in three different spaces, that is, ensembles of graph configurations and node-attribute assignments that are valid for a given set of network constraints. We provide means for obtaining bounds on the extremal values of assortativity for each of these spaces. Finally, we demonstrate that under certain conditions the network constraints severely limit the maximum and minimum values of assortativity, which may present issues in how we interpret the assortativity coefficient.

الشبكات الاجتماعية والمعلومات تحليل البيانات والإحصاءات والاحتمال الفيزياء والمجتمع

Community detection in networks without observing edges

76 - Till Hoffmann , Leto Peel , Renaud Lambiotte 2018

We develop a Bayesian hierarchical model to identify communities in networks for which we do not observe the edges directly, but instead observe a series of interdependent signals for each of the nodes. Fitting the model provides an end-to-end commun ity detection algorithm that does not extract information as a sequence of point estimates but propagates uncertainties from the raw data to the community labels. Our approach naturally supports multiscale community detection as well as the selection of an optimal scale using model comparison. We study the properties of the algorithm using synthetic data and apply it to daily returns of constituents of the S&P100 index as well as climate data from US cities.

الشبكات الاجتماعية والمعلومات التعلم الآلي الفيزياء والمجتمع

Multilayer Modularity Belief Propagation To Assess Detectability Of Community Structure

72 - William H. Weir , Benjamin Walker , Lenka Zdeborova 2019

Modularity based community detection encompasses a number of widely used, efficient heuristics for identification of structure in networks. Recently, a belief propagation approach to modularity optimization provided a useful guide for identifying non -trivial structure in single-layer networks in a way that other optimization heuristics have not. In this paper, we extend modularity belief propagation to multilayer networks. As part of this development, we also directly incorporate a resolution parameter. We show that adjusting the resolution parameter affects the convergence properties of the algorithm and yields different community structures than the baseline. We compare our approach with a widely used community detection tool, GenLouvain, across a range of synthetic, multilayer benchmark networks, demonstrating that our method performs comparably to the state of the art. Finally, we demonstrate the practical advantages of the additional information provided by our tool by way of two real-world network examples. We show how the convergence properties of the algorithm can be used in selecting the appropriate resolution and coupling parameters and how the node-level marginals provide an interpretation for the strength of attachment to the identified communities. We have released our tool as a Python package for convenient use.

الشبكات الاجتماعية والمعلومات تحليل البيانات والإحصاءات والاحتمال الفيزياء والمجتمع

Link persistence and conditional distances in multiplex networks

155 - Fragkiskos Papadopoulos , Kaj-Kolja Kleineberg 2018

Recent progress towards unraveling the hidden geometric organization of real multiplexes revealed significant correlations across the hyperbolic node coordinates in different network layers, which facilitated applications like trans-layer link predic tion and mutual navigation. But are geometric correlations alone sufficient to explain the topological relation between the layers of real systems? Here we provide the negative answer to this question. We show that connections in real systems tend to persist from one layer to another irrespectively of their hyperbolic distances. This suggests that in addition to purely geometric aspects the explicit link formation process in one layer impacts the topology of other layers. Based on this finding, we present a simple modification to the recently developed Geometric Multiplex Model to account for this effect, and show that the extended model can reproduce the behavior observed in real systems. We also find that link persistence is significant in all considered multiplexes and can explain their layers high edge overlap, which cannot be explained by coordinate correlations alone. Furthermore, by taking both link persistence and hyperbolic distance correlations into account we can improve trans-layer link prediction. These findings guide the development of multiplex embedding methods, suggesting that such methods should be accounting for both coordinate correlations and link persistence across layers.

الشبكات الاجتماعية والمعلومات تحليل البيانات والإحصاءات والاحتمال الفيزياء والمجتمع

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة المأمون الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

The ground truth about metadata and community detection in networks

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً