Graph Similarity Description: How Are These Graphs Similar?

195 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Corinna Coupette

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Corinna Coupette - Jilles Vreeken

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

How do social networks differ across platforms? How do information networks change over time? Answering questions like these requires us to compare two or more graphs. This task is commonly treated as a measurement problem, but numerical answers give limited insight. Here, we argue that if the goal is to gain understanding, we should treat graph similarity assessment as a description problem instead. We formalize this problem as a model selection task using the Minimum Description Length principle, capturing the similarity of the input graphs in a common model and the differences between them in transformations to individual models. To discover good models, we propose Momo, which breaks the problem into two parts and introduces efficient algorithms for each. Through an extensive set of experiments on a wide range of synthetic and real-world graphs, we confirm that Momo works well in practice.

قيم البحث

368 - Xuecheng Liu , Luoyi Fu , Xinbing Wang 2021

The von Neumann graph entropy is a measure of graph complexity based on the Laplacian spectrum. It has recently found applications in various learning tasks driven by networked data. However, it is computational demanding and hard to interpret using simple structural patterns. Due to the close relation between Lapalcian spectrum and degree sequence, we conjecture that the structural information, defined as the Shannon entropy of the normalized degree sequence, might be a good approximation of the von Neumann graph entropy that is both scalable and interpretable. In this work, we thereby study the difference between the structural information and von Neumann graph entropy named as {em entropy gap}. Based on the knowledge that the degree sequence is majorized by the Laplacian spectrum, we for the first time prove the entropy gap is between $0$ and $log_2 e$ in any undirected unweighted graphs. Consequently we certify that the structural information is a good approximation of the von Neumann graph entropy that achieves provable accuracy, scalability, and interpretability simultaneously. This approximation is further applied to two entropy-related tasks: network design and graph similarity measure, where novel graph similarity measure and fast algorithms are proposed. Our experimental results on graphs of various scales and types show that the very small entropy gap readily applies to a wide range of graphs and weighted graphs. As an approximation of the von Neumann graph entropy, the structural information is the only one that achieves both high efficiency and high accuracy among the prominent methods. It is at least two orders of magnitude faster than SLaQ with comparable accuracy. Our structural information based methods also exhibit superior performance in two entropy-related tasks.

الشبكات الاجتماعية والمعلومات نظرية المعلومات نظرية المعلومات

(Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs

149 - Boaz Barak , Chi-Ning Chou , Zhixian Lei 2018

We give a quasipolynomial time algorithm for the graph matching problem (also known as noisy or robust graph isomorphism) on correlated random graphs. Specifically, for every $gamma>0$, we give a $n^{O(log n)}$ time algorithm that given a pair of $ga mma$-correlated $G(n,p)$ graphs $G_0,G_1$ with average degree between $n^{varepsilon}$ and $n^{1/153}$ for $varepsilon = o(1)$, recovers the ground truth permutation $piin S_n$ that matches the vertices of $G_0$ to the vertices of $G_n$ in the way that minimizes the number of mismatched edges. We also give a recovery algorithm for a denser regime, and a polynomial-time algorithm for distinguishing between correlated and uncorrelated graphs. Prior work showed that recovery is information-theoretically possible in this model as long the average degree was at least $log n$, but sub-exponential time algorithms were only known in the dense case (i.e., for $p > n^{-o(1)}$). Moreover, Percolation Graph Matching, which is the most common heuristic for this problem, has been shown to require knowledge of $n^{Omega(1)}$ seeds (i.e., input/output pairs of the permutation $pi$) to succeed in this regime. In contrast our algorithms require no seed and succeed for $p$ which is as low as $n^{o(1)-1}$.

بنى وهياكل البيانات والخوارزميات نظرية المعلومات التعلم الآلي

Turing Number: How Far Are You to A. M. Turing Award?

150 - Feng Xia , Jiaying Liu , Jing Ren 2021

The ACM A.M. Turing Award is commonly acknowledged as the highest distinction in the realm of computer science. Since 1960s, it has been awarded to computer scientists who made outstanding contributions. The significance of this award is far-reaching to the laureates as well as their research teams. However, unlike the Nobel Prize that has been extensively investigated, little research has been done to explore this most important award. To this end, we propose the Turing Number (TN) index to measure how far a specific scholar is to this award. Inspired by previous works on Erdos Number and Bacon Number, this index is defined as the shortest path between a given scholar to any Turing Award Laureate. Experimental results suggest that TN can reflect the closeness of collaboration between scholars and Turing Award Laureates. With the correlation analysis between TN and metrics from the bibliometric-level and network-level, we demonstrate that TN has the potential of reflecting a scholars academic influence and reputation.

الشبكات الاجتماعية والمعلومات المكتبات الرقمية تحليل البيانات والإحصاءات والاحتمال

Minimum Description Length Revisited

67 - Peter Grunwald , Teemu Roos 2019

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was origi nally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of {em MDL estimators}. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes can, to a large extent, be viewed from a unified perspective.

المنهجية نظرية المعلومات التعلم الآلي

Why do These Match? Explaining the Behavior of Image Similarity Models

249 - Bryan A. Plummer , Mariya I. Vasileva , Vitali Petsiuk 2019

Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings. Recent work has primarily focused on explaining models for tasks like image classification or visual question answering. In thi s paper, we introduce Salient Attributes for Network Explanation (SANE) to explain image similarity models, where a models output is a score measuring the similarity of two inputs rather than a classification score. In this task, an explanation depends on both of the input images, so standard methods do not apply. Our SANE explanations pairs a saliency map identifying important image regions with an attribute that best explains the match. We find that our explanations provide additional information not typically captured by saliency maps alone, and can also improve performance on the classic task of attribute recognition. Our approachs ability to generalize is demonstrated on two datasets from diverse domains, Polyvore Outfits and Animals with Attributes 2. Code available at: https://github.com/VisionLearningGroup/SANE

الرؤية الحاسوبية وتمييز الأنماط