Trevor: Automatic configuration and scaling of stream processing pipelines

Added by Eyal Cidon

Publication date: 2018

Fields: Informatics Engineering

Language: English

Authors: Manu Bansal, Eyal Cidon, Arjun Balasingam

Distributed Parallel and Cluster Computing

Operating a distributed data stream processing workload efficiently at scale is hard. The operator of the workload must parallelize and lay out the tasks of the workload with resources that match the requirements of the target data rate. The challenge is that neither the operator nor the programmer is typically aware of the scaling behavior of the workload as a function of resources. An operator manually searches for a safe operating point that can handle the predicted peak load and deploys with ample headroom for absorbing unpredictable spikes. Such empirical, static over-provisioning is wasteful of both compute and human resources. We show that precise performance models can be automatically learned for distributed stream processing systems, and that these models can predict the execution performance of a job even before deployment. Further, those models can be used to optimally schedule logically specified jobs onto available physical hardware. Finally, those models and the derived execution schedules can be refined online to dynamically adapt to unpredictable changes in the runtime environment or to auto-scale with variations in job load.
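To make the idea of sizing a job from a performance model concrete, here is a minimal sketch. It is not Trevor's actual model or scheduler: it assumes a linear pipeline in which each operator has a known per-instance capacity and a selectivity (output events per input event). The names `Operator`, `plan_parallelism`, and all the numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Operator:
    """Hypothetical per-operator model; values are assumed, not measured."""
    name: str
    capacity_per_instance: float  # events/s one instance can sustain
    selectivity: float            # output events per input event


def plan_parallelism(pipeline, source_rate, headroom=0.0):
    """Return {operator name: instance count} for a linear pipeline.

    The input rate is inflated by `headroom` (0.5 means +50%), then
    propagated through the operators using their selectivities.
    """
    plan = {}
    rate = source_rate * (1.0 + headroom)
    for op in pipeline:
        # Enough instances to absorb the rate arriving at this operator.
        plan[op.name] = max(1, math.ceil(rate / op.capacity_per_instance))
        # The next operator sees this operator's output rate.
        rate *= op.selectivity
    return plan


pipeline = [
    Operator("parse", 50_000.0, 1.0),
    Operator("filter", 200_000.0, 0.25),
    Operator("aggregate", 10_000.0, 0.1),
]

plan = plan_parallelism(pipeline, source_rate=120_000.0)
print(plan)
```

With a model of this kind, static over-provisioning becomes an explicit `headroom` parameter rather than a manually discovered operating point, and the plan can be recomputed whenever the observed rate or the fitted capacities change.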

Building Analytics Pipelines for Querying Big Streams and Data Histories with H-STREAM

Genoveva Vargas-Solar, Javier A. Espinosa-Oviedo (2021)

This paper introduces H-STREAM, an evaluation engine for big stream/data processing pipelines that offers stream processing operators as micro-services to support the analysis and visualisation of Big Data streams stemming from IoT (Internet of Things) environments. H-STREAM micro-services combine stream processing and data storage techniques, tuned according to the number of things producing streams, the pace at which they produce them, and the physical computing resources available for processing them online and delivering them to consumers. H-STREAM delivers stream processing and visualisation micro-services installed in a cloud environment. Micro-services can be composed to implement specific stream aggregation and analysis pipelines as queries. The paper presents an experimental validation using Microsoft Azure as a deployment environment for testing the capacity of H-STREAM to deal with velocity and volume challenges in (i) a neuroscience experiment and (ii) a social connectivity analysis scenario running on IoT farms.

Distributed Parallel and Cluster Computing

Evaluation of Load Prediction Techniques for Distributed Stream Processing

Kordian Gontarska, Morgan Geldenhuys, Dominik Scheinert (2021)

Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, which may be due to trends and to cyclic or seasonal patterns within the data streams. A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization tasks such as dynamic scaling, live migration of resources, and the tuning of configuration parameters at runtime, thus leading to a potentially better Quality of Service. In this paper we conduct a comprehensive evaluation of different load prediction techniques for DSP jobs. We identify three use-cases and formulate requirements for making load predictions specific to DSP jobs. Automatically optimized classical and Deep Learning methods are evaluated on nine different datasets from typical DSP domains, i.e. the IoT, Web 2.0, and cluster monitoring. We compare model performance with respect to overall accuracy and training duration. Our results show that the Deep Learning methods provide the most accurate load predictions for the majority of the evaluated datasets.
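As a rough illustration of why cyclic patterns matter for load prediction, the sketch below compares two simple baselines on a synthetic, perfectly periodic arrival-rate series. These baselines and the data are assumptions chosen for illustration; they are not the classical or Deep Learning methods, nor the datasets, evaluated in the paper.

```python
def last_value_forecast(history, horizon):
    """Predict that the most recent rate persists for the whole horizon."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, period):
    """Predict each future step with the value one period earlier."""
    start = len(history) - period
    return [history[start + (h % period)] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted rates."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic events/s series with a period of 4 steps (hypothetical values).
cycle = [100, 300, 800, 400]
series = cycle * 6
train, test = series[:20], series[20:]

mae_last = mean_absolute_error(test, last_value_forecast(train, len(test)))
mae_seasonal = mean_absolute_error(
    test, seasonal_naive_forecast(train, len(test), period=len(cycle))
)
print(mae_last, mae_seasonal)
```

On this idealised series the seasonal baseline is exact while the last-value baseline is not; real DSP workloads are noisier, which is where the accuracy and training-time comparison described in the abstract becomes relevant.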

Distributed Parallel and Cluster Computing Artificial Intelligence

Dependable IoT Data Stream Processing for Monitoring and Control of Urban Infrastructures

Morgan K. Geldenhuys, Jonathan Will, Benjamin J. J. Pfister (2021)

The Internet of Things describes a network of physical devices interacting and producing vast streams of sensor data. At present there are a number of general challenges which exist while developing solutions for use cases involving the monitoring and control of urban infrastructures. These include the need for a dependable method for extracting value from these high volume streams of time sensitive data which is adaptive to changing workloads. Low-latency access to the current state for live monitoring is a necessity as well as the ability to perform queries on historical data. At the same time, many design choices need to be made and the number of possible technology options available further adds to the complexity. In this paper we present a dependable IoT data processing platform for the monitoring and control of urban infrastructures. We define requirements in terms of dependability and then select a number of mature open-source technologies to match these requirements. We examine the disparate parts necessary for delivering a holistic overall architecture and describe the dataflows between each of these components. We likewise present generalizable methods for the enrichment and analysis of sensor data applicable across various application areas. We demonstrate the usefulness of this approach by providing an exemplary prototype platform executing on top of Kubernetes and evaluate the effectiveness of jobs processing sensor data in this environment.

Distributed Parallel and Cluster Computing

ACTS in Need: Automatic Configuration Tuning with Scalability Guarantees

Yuqing Zhu, Jianxun Liu, Mengying Guo (2017)

To support the variety of Big Data use cases, many Big Data related systems expose a large number of user-specifiable configuration parameters. As our experiments highlight, a MySQL deployment with well-tuned configuration parameters achieves a peak throughput 12 times that of one with the default settings. However, finding the best setting for tens or hundreds of configuration parameters is an impossible task for ordinary users. Worse still, many Big Data applications require the support of multiple systems co-deployed in the same cluster. As these co-deployed systems can interact to affect the overall performance, they must be tuned together. Automatic configuration tuning with scalability guarantees (ACTS) is needed to help system users. Solutions to ACTS must scale to various systems, workloads, deployments, parameters and resource limits. By proposing and implementing an ACTS solution, we demonstrate that ACTS can benefit users not only by improving system performance and resource utilization, but also by saving costs and enabling fairer benchmarking.

Distributed Parallel and Cluster Computing

Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning

Luis M. Vaquero, Felix Cuadrado (2018)

Fine-tuning distributed systems is considered a craft, relying on intuition and experience. This becomes even more challenging when the systems need to react in near real time, as streaming engines must do to maintain pre-agreed service quality metrics. In this article, we present an automated approach that builds on a combination of supervised and reinforcement learning methods to recommend the most appropriate lever configurations based on previous load. With this, streaming engines can be automatically tuned without requiring a human to determine the right way and proper time to deploy them. This opens the door to new configurations that are not being applied today because the complexity of managing these systems has surpassed the abilities of human experts. We show how reinforcement learning systems can find substantially better configurations in less time than their human counterparts and adapt to changing workloads.

Distributed Parallel and Cluster Computing Databases Machine Learning