Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts


Abstract

Distributed dataflow systems enable the use of clusters for scalable data analytics. However, selecting appropriate cluster resources for a processing job is often not straightforward. Performance models trained on historical executions of a concrete job are helpful in such situations, yet they are usually bound to a specific job execution context (e.g. node type, softwar
