A Fast, Scalable, Universal Approach For Distributed Data Aggregations


Published by: Niranda Perera

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Niranda Perera - Vibhatha Abeykoon - Chathura Widanage

Distributed, Parallel, and Cluster Computing; Information Retrieval


In the current era of Big Data, data engineering has become an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (also termed reduce in functional programming) are an integral functionality in these applications. They are traditionally aimed at generating meaningful information from large datasets, and today they are also used to engineer more effective features for complex AI models. Aggregations are usually carried out on top of data abstractions such as tables/arrays and are combined with other operations such as grouping of values. There are frameworks that excel in these domains individually. However, we believe that there is an essential requirement for a data analytics tool that can universally integrate with existing frameworks, and thereby increase the productivity and efficiency of the entire data analytics pipeline. Cylon endeavors to fill this void. In this paper, we present Cylon's fast and scalable aggregation operations implemented on top of a distributed in-memory table structure that universally integrates with existing frameworks.
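To make the idea of a grouped aggregation over a partitioned table concrete, here is a minimal sketch in plain Python. It is not Cylon's API: the function names (`local_aggregate`, `merge_partials`, `grouped_mean`) are hypothetical, and the "workers" are simulated as in-process lists of `(key, value)` rows. It only illustrates the common two-phase pattern in which each partition is reduced locally and the partial results are then combined.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Phase 1 (per partition): reduce rows to a (sum, count) pair per key."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return {k: (s, c) for k, (s, c) in partial.items()}


def merge_partials(partials):
    """Phase 2: combine the per-partition (sum, count) pairs key by key."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {k: (s, c) for k, (s, c) in merged.items()}


def grouped_mean(partitions):
    """Group-by mean over a table that is split into partitions."""
    merged = merge_partials(local_aggregate(p) for p in partitions)
    return {k: s / c for k, (s, c) in merged.items()}


# Two simulated partitions of one logical table (assumed example data).
partitions = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0), ("b", 6.0), ("c", 5.0)],
]
result = grouped_mean(partitions)
print(result)
```

Carrying `(sum, count)` rather than a per-partition mean is what keeps the merge step correct: sums and counts can be added across partitions, while means cannot.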


Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong 2016

In large-scale distributed file systems, efficient metadata operations are critical since most file operations have to interact with metadata servers first. In existing distributed hash table (DHT) based metadata management systems, the lookup service could be a performance bottleneck due to its significant CPU overhead. Our investigations showed that the lookup service could reduce system throughput by up to 70%, and increase system latency by a factor of up to 8 compared to ideal scenarios. In this paper, we present MetaFlow, a scalable metadata lookup service utilizing software-defined networking (SDN) techniques to distribute lookup workload over network components. MetaFlow tackles the lookup bottleneck problem by leveraging a B-tree, which is constructed over the physical topology, to manage flow tables for SDN-enabled switches. Therefore, metadata requests can be forwarded to appropriate servers using only switches. Extensive performance evaluations in both simulations and a testbed showed that MetaFlow increases system throughput by a factor of up to 3.2, and reduces system latency by a factor of up to 5 compared to DHT-based systems. We also deployed MetaFlow in a distributed file system, and demonstrated significant performance improvement.

Distributed, Parallel, and Cluster Computing

Designing a scalable framework for declarative automation on distributed systems

J. Lowell Wofford 2021

As distributed systems grow in scale and complexity, the need for flexible automation of systems management functions also grows. We outline a framework for building tools that provide distributed, scalable, declarative, modular, and continuous automation for distributed systems. We focus on four points of design: 1) a state-management approach that prescribes source-of-truth for configured and discovered system states; 2) a technique to solve the declarative unification problem for a class of automation problems, providing state convergence and modularity; 3) an eventual-consistency approach to state synchronization which provides automation at scale; 4) an event-driven architecture that provides always-on state enforcement. We describe the methodology, software architecture for the framework, and constraints for these techniques to apply to an automation problem. We give an overview of a reference application built on this framework that provides state-aware system provisioning and node lifecycle management, highlighting key advantages. We conclude with a discussion of current and future applications.

Distributed, Parallel, and Cluster Computing

RapidRAID: Pipelined Erasure Codes for Fast Data Archival in Distributed Storage Systems

Lluis Pamies-Juarez, Anwitaman Datta, Frederique Oggier 2012

To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However, the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage-efficient alternative to replication, particularly suited for archival in data centers, where old datasets (rarely accessed) can be erasure encoded, while replicas are maintained only for the latest data. Many recent works consider the design of new storage-centric erasure codes for improved repairability. In contrast, this paper addresses the migration from replication to encoding: traditionally, erasure coding is an atomic operation in that a single node with the whole object encodes and uploads all the encoded pieces. Although large datasets can be concurrently archived by distributing individual object encodings among different nodes, the network and computing capacity of individual nodes constrain the archival process due to such atomicity. We propose a new pipelined coding strategy that distributes the network and computing load of single-object encodings among different nodes, which also speeds up multiple-object archival. We further present RapidRAID codes, an explicit family of pipelined erasure codes which provides fast archival without compromising either data reliability or storage overheads. Finally, we provide a real implementation of RapidRAID codes and benchmark its performance using both a cluster of 50 nodes and a set of Amazon EC2 instances. Experiments show that RapidRAID codes reduce a single object's coding time by up to 90%, while when multiple objects are encoded concurrently, the reduction is up to 20%.

Distributed, Parallel, and Cluster Computing

EZLDA: Efficient and Scalable LDA on GPUs

Shilong Wang 2020

LDA is a statistical approach for topic modeling with a wide range of applications. However, there exist very few attempts to accelerate LDA on GPUs, which come with exceptional computing and memory throughput capabilities. To this end, we introduce EZLDA, which achieves efficient and scalable LDA training on GPUs with the following three contributions: First, EZLDA introduces a three-branch sampling method which takes advantage of the convergence heterogeneity of various tokens to reduce the redundant sampling task. Second, to enable a sparsity-aware format for both D and W on GPUs with fast sampling and updating, we introduce a hybrid format for W along with corresponding token partition to T and inverted index designs. Third, we design a hierarchical workload balancing solution to address the extremely skewed workload imbalance problem on GPU and scale EZLDA across multiple GPUs. Taken together, EZLDA achieves superior performance over the state-of-the-art attempts with lower memory consumption.

Distributed, Parallel, and Cluster Computing; Information Retrieval; Machine Learning

Scalable and Secure Aggregation in Distributed Networks

Sebastien Gambs, Rachid Guerraoui, Hamza Harkous 2011

We consider the problem of computing an aggregation function in a secure and scalable way. Whereas previous distributed solutions with similar security guarantees have a communication cost of $O(n^3)$, we present a distributed protocol that requires only a communication complexity of $O(n \log^3 n)$, which we prove is near-optimal. Our protocol ensures perfect security against a computationally-bounded adversary, tolerates $(1/2-\epsilon)n$ malicious nodes for any constant $1/2 > \epsilon > 0$ (not depending on $n$), and outputs the exact value of the aggregated function with high probability.

Distributed, Parallel, and Cluster Computing; Computational Complexity; Cryptography and Security
