ﻻ يوجد ملخص باللغة العربية
In the current era of Big Data, data engineering has transformed into an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (also termed reduce in functional programming) are an integral functionality in these applications. They are traditionally aimed at generating meaningful information on large data-sets, and today, they are being used for engineering more effective features for complex AI models. Aggregations are usually carried out on top of data abstractions such as tables/ arrays and are combined with other operations such as grouping of values. There are frameworks that excel in the said domains individually. But, we believe that there is an essential requirement for a data analytics tool that can universally integrate with existing frameworks, and thereby increase the productivity and efficiency of the entire data analytics pipeline. Cylon endeavors to fulfill this void. In this paper, we present Cylons fast and scalable aggregation operations implemented on top of a distributed in-memory table structure that universally integrates with existing frameworks.
In large-scale distributed file systems, efficient meta- data operations are critical since most file operations have to interact with metadata servers first. In existing distributed hash table (DHT) based metadata management systems, the lookup serv
As distributed systems grow in scale and complexity, the need for flexible automation of systems management functions also grows. We outline a framework for building tools that provide distributed, scalable, declarative, modular, and continuous autom
To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage efficient alternative to rep
LDA is a statistical approach for topic modeling with a wide range of applications. However, there exist very few attempts to accelerate LDA on GPUs which come with exceptional computing and memory throughput capabilities. To this end, we introduce E
We consider the problem of computing an aggregation function in a emph{secure} and emph{scalable} way. Whereas previous distributed solutions with similar security guarantees have a communication cost of $O(n^3)$, we present a distributed protocol th