بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Counting Triangles in Massive Graphs with MapReduce

343 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Tamara Kolda

تاريخ النشر 2013

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Tamara G. Kolda - Ali Pinar - Todd Plantenga

الشبكات الاجتماعية والمعلومات النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clustering coefficients, which measure the likelihood that two neighbors of a node are themselves connected. Computing these measures exactly for large-scale networks is prohibitively expensive in both memory and time. However, a recent wedge sampling algorithm has proved successful in efficiently and accurately estimating clustering coefficients. In this paper, we describe how to implement this approach in MapReduce to deal with massive graphs. We show results on publicly-available networks, the largest of which is 132M nodes and 4.7B edges, as well as artificially generated networks (using the Graph500 benchmark), the largest of which has 240M nodes and 8.5B edges. We can estimate the clustering coefficient by degree bin (e.g., we use exponential binning) and the number of triangles per bin, as well as the global clustering coefficient and total number of triangles, in an average of 0.33 seconds per million edges plus overhead (approximately 225 seconds total for our configuration). The technique can also be used to study triangle statistics such as the ratio of the highest and lowest degree, and we highlight differences between social and non-social networks. To the best of our knowledge, these are the largest triangle-based graph computations published to date.

قيم البحث

اقرأ أيضاً

Social content matching in MapReduce

268 - Gianmarco De Francisci Morales 2011

Matching problems are ubiquitous. They occur in economic markets, labor markets, internet advertising, and elsewhere. In this paper we focus on an application of matching for social media. Our goal is to distribute content from information suppliers to information consumers. We seek to maximize the overall relevance of the matched content from suppliers to consumers while regulating the overall activity, e.g., ensuring that no consumer is overwhelmed with data and that all suppliers have chances to deliver their content. We propose two matching algorithms, GreedyMR and StackMR, geared for the MapReduce paradigm. Both algorithms have provable approximation guarantees, and in practice they produce high-quality solutions. While both algorithms scale extremely well, we can show that StackMR requires only a poly-logarithmic number of MapReduce steps, making it an attractive option for applications with very large datasets. We experimentally show the trade-offs between quality and efficiency of our solutions on two large datasets coming from real-world social-media web sites.

الشبكات الاجتماعية والمعلومات النظم الموزعة والتوازية والحوسبة العنقودية

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

395 - Ethan R. Elenberg , Karthikeyan Shanmugam , Michael Borokhovich 2015

We study the problem of approximating the $3$-profile of a large graph. $3$-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel c oncept of $3$-profile sparsifiers: sparse graphs that can be used to approximate the full $3$-profile counts for a given large graph. Further, we study the problem of estimating local and ego $3$-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting which allows us to collect $2$-hop information without maintaining an explicit $2$-hop neighborhood list at each vertex. This enables the computation of all the local $3$-profiles in parallel with minimal communication. We test out implementation in several experiments scaling up to $640$ cores on Amazon EC2. We find that our algorithm can estimate the $3$-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego $3$-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes.

الشبكات الاجتماعية والمعلومات النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات

Counting graph orientations with no directed triangles

181 - Pedro Araujo , Fabio Botler , Guilherme Oliveira Mota 2020

Alon and Yuster proved that the number of orientations of any $n$-vertex graph in which every $K_3$ is transitively oriented is at most $2^{lfloor n^2/4rfloor}$ for $n geq 10^4$ and conjectured that the precise lower bound on $n$ should be $n geq 8$. We confirm their conjecture and, additionally, characterize the extremal families by showing that the balanced complete bipartite graph with $n$ vertices is the only $n$-vertex graph for which there are exactly $2^{lfloor n^2/4rfloor}$ such orientations.

التوافقية الرياضيات المتقطعة

Gene expression modelling across multiple cell-lines with MapReduce

573 - David M. Budden , Edmund J. Crampin 2015

With the wealth of high-throughput sequencing data generated by recent large-scale consortia, predictive gene expression modelling has become an important tool for integrative analysis of transcriptomic and epigenetic data. However, sequencing data-s ets are characteristically large, and previously modelling frameworks are typically inefficient and unable to leverage multi-core or distributed processing architectures. In this study, we detail an efficient and parallelised MapReduce implementation of gene expression modelling. We leverage the computational efficiency of this framework to provide an integrative analysis of over fifty histone modification data-sets across a variety of cancerous and non-cancerous cell-lines. Our results demonstrate that the genome-wide relationships between histone modifications and mRNA transcription are lineage, tissue and karyotype-invariant, and that models trained on matched epigenetic/transcriptomic data from non-cancerous cell-lines are able to predict cancerous expression with equivalent genome-wide fidelity.

الأساليب الكمية النظم الموزعة والتوازية والحوسبة العنقودية الجينوم

Eigenvalues and triangles in graphs

182 - Huiqiu Lin , Bo Ning , Baoyindureng Wu 2019

Bollobas and Nikiforov [J. Combin. Theory, Ser. B. 97 (2007) 859--865] conjectured the following. If $G$ is a $K_{r+1}$-free graph on at least $r+1$ vertices and $m$ edges, then $lambda^2_1(G)+lambda^2_2(G)leq frac{r-1}{r}cdot2m$, where $lambda_1(G)$ and $lambda_2(G)$ are the largest and the second largest eigenvalues of the adjacency matrix $A(G)$, respectively. In this paper, we confirm the conjecture in the case $r=2$, by using tools from doubly stochastic matrix theory, and also characterize all families of extremal graphs. Motivated by classic theorems due to ErdH{o}s and Nosal respectively, we prove that every non-bipartite graph $G$ of order $n$ and size $m$ contains a triangle, if one of the following is true: (1) $lambda_1(G)geqsqrt{m-1}$ and $G eq C_5cup (n-5)K_1$; and (2) $lambda_1(G)geq lambda_1(S(K_{lfloorfrac{n-1}{2}rfloor,lceilfrac{n-1}{2}rceil}))$ and $G eq S(K_{lfloorfrac{n-1}{2}rfloor,lceilfrac{n-1}{2}rceil})$, where $S(K_{lfloorfrac{n-1}{2}rfloor,lceilfrac{n-1}{2}rceil})$ is obtained from $K_{lfloorfrac{n-1}{2}rfloor,lceilfrac{n-1}{2}rceil}$ by subdividing an edge. Both conditions are best possible. We conclude this paper with some open problems.

التوافقية

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الأندلس للعلوم الطبية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Counting Triangles in Massive Graphs with MapReduce

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً