Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, for instance due to trends, cyclic patterns, and seasonal patterns within the data streams. A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization tasks such as dynamic scaling, live migration of resources, and the tuning of configuration parameters at runtime, and thus potentially leads to a better Quality of Service. In this paper we conduct a comprehensive evaluation of different load prediction techniques for DSP jobs. We identify three use cases and formulate requirements for making load predictions specific to DSP jobs. Automatically optimized classical and Deep Learning methods are evaluated on nine different datasets from typical DSP domains, i.e. IoT, Web 2.0, and cluster monitoring. We compare model performance with respect to overall accuracy and training duration. Our results show that the Deep Learning methods provide the most accurate load predictions for the majority of the evaluated datasets.
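The abstract above concerns forecasting the rate at which events arrive at a DSP job. As a rough, self-contained illustration of that kind of task (not the paper's automatically optimized pipeline, its nine datasets, or its Deep Learning models), the Python sketch below fits a least-squares autoregression and a seasonal-naive baseline to a synthetic hourly arrival-rate series and compares their forecast error; the synthetic workload, the 24-hour seasonality, and the choice of models are all assumptions made purely for illustration.

    import numpy as np

    # Synthetic arrival-rate series with a daily cycle plus noise, standing in
    # for the event rates observed by a DSP job (illustrative assumption).
    rng = np.random.default_rng(0)
    period = 24
    hours = np.arange(14 * period)
    rate = 1000 + 300 * np.sin(2 * np.pi * hours / period) + rng.normal(0, 50, hours.size)
    train, test = rate[:-period], rate[-period:]

    # Classical baseline 1: seasonal naive -- repeat the last observed cycle.
    seasonal_naive = train[-period:]

    # Classical baseline 2: autoregression of order 24, fitted by least squares.
    p = period
    X = np.column_stack([train[i:len(train) - p + i] for i in range(p)])
    coef, *_ = np.linalg.lstsq(X, train[p:], rcond=None)

    history = list(train[-p:])
    ar_forecast = []
    for _ in range(period):  # roll the fitted model forward one full day
        nxt = float(np.dot(coef, history[-p:]))
        ar_forecast.append(nxt)
        history.append(nxt)

    def mae(pred, truth):
        return float(np.mean(np.abs(np.asarray(pred) - truth)))

    print("seasonal naive MAE:", round(mae(seasonal_naive, test), 1))
    print("AR(24) MAE:", round(mae(ar_forecast, test), 1))

In the setting the abstract describes, such point forecasts of the arrival rate would feed proactive decisions like scaling out ahead of a predicted load peak; the automatically optimized classical and Deep Learning models evaluated in the paper would take the place of the simple baselines sketched here.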
We study the problem of load balancing in distributed stream processing engines, which is exacerbated in the presence of skew. We introduce Partial Key Grouping (PKG), a new stream partitioning scheme that adapts the classical power of two choices to …
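As a concrete reading of the idea named in this abstract (each key is given two candidate workers, and every tuple is routed to the currently less loaded of the two), the sketch below is a minimal, assumption-laden Python illustration rather than the authors' implementation; the hash construction, the local load counters, and the class and method names are all hypothetical.

    import hashlib

    class PartialKeyGrouper:
        # Routes each tuple to one of two candidate workers derived from its key
        # (key splitting) and picks the less loaded candidate (power of two choices).

        def __init__(self, num_workers):
            self.num_workers = num_workers
            self.load = [0] * num_workers  # local load estimate: tuples sent so far

        def _candidate(self, key, seed):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).hexdigest()
            return int(digest, 16) % self.num_workers

        def route(self, key):
            a, b = self._candidate(key, 0), self._candidate(key, 1)
            target = a if self.load[a] <= self.load[b] else b
            self.load[target] += 1
            return target

    # Even a heavily skewed key distribution is spread over two workers:
    grouper = PartialKeyGrouper(num_workers=4)
    for key in ["hot"] * 8 + ["x", "y"]:
        print(key, "->", grouper.route(key))

Splitting a key over two workers means downstream aggregations must merge two partial states per key, the price paid for much better load balance when some keys are far more frequent than others.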
Stream applications are widely deployed in the cloud. While modern distributed streaming systems like Flink and Spark Streaming can schedule and execute them efficiently, streaming dataflows often change dynamically, which may cause computation …
Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. As such, making timely decisions based on these …
Fine-tuning distributed systems is considered a craft, relying on intuition and experience. This becomes even more challenging when the systems need to react in near real time, as streaming engines have to do to maintain pre-agreed service …
We introduce a new graph problem, the token dropping game, and we show how to solve it efficiently in a distributed setting. We use the token dropping game as a tool to design an efficient distributed algorithm for stable orientations and more generally …