No Arabic abstract
The problem of attaining energy efficiency in distributed systems is of importance, but a general, non-domain-specific theory of energy-minimal scheduling is far from developed. In this paper, we classify the problems of energy-minimal scheduling and present theoretical foundations of the same. We derive results concerning energy-minimal scheduling of independent jobs in a distributed system with functionally similar machines with different working and idle power ratings. The machines considered in our system can have identical as well as different speeds. If the jobs can be divided into arbitrary parts, we show that the minimum-energy schedule can be generated in linear time and give exact scheduling algorithms. For the cases where jobs are non-divisible, we prove that the scheduling problems are NP-hard and also give approximation algorithms for the same along with their bounds.
The $k$-truss, introduced by Cohen (2005), is a graph where every edge is incident to at least $k$ triangles. This is a relaxation of the clique. It has proved to be a useful tool in identifying cohesive subnetworks in a variety of real-world graphs. Despite its simplicity and its utility, the combinatorial and algorithmic aspects of trusses have not been thoroughly explored. We provide nearly-tight bounds on the edge counts of $k$-trusses. We also give two improved algorithms for finding trusses in large-scale graphs. First, we present a simplified and faster algorithm, based on approach discussed in Wang & Cheng (2012). Second, we present a theoretical algorithm based on fast matrix multiplication; this converts a triangle-generation algorithm of Bjorklund et al. (2014) into a dynamic data structure.
This deliverable reports the results of the power models, energy models and libraries for energy-efficient concurrent data structures and algorithms as available by project month 30 of Work Package 2 (WP2). It reports i) the latest results of Task 2.2-2.4 on providing programming abstractions and libraries for developing energy-efficient data structures and algorithms and ii) the improved results of Task 2.1 on investigating and modeling the trade-off between energy and performance of concurrent data structures and algorithms. The work has been conducted on two main EXCESS platforms: Intel platforms with recent Intel multicore CPUs and Movidius Myriad platforms.
Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks relative placement. We propose Themis, a new scheduling framework for ML training workloads. Its GPU allocation policy enforces that ML workloads complete in a finish-time fair manner, a new notion we introduce. To capture placement sensitivity and ensure efficiency, Themis uses a two-level scheduling architecture where ML workloads bid on available resources that are offered in an auction run by a central arbiter. Our auction design allocates GPUs to winning bids by trading off efficiency for fairness in the short term but ensuring finish-time fairness in the long term. Our evaluation on a production trace shows that Themis can improve fairness by more than 2.25X and is ~5% to 250% more cluster efficient in comparison to state-of-the-art schedulers.
The coflow scheduling problem has emerged as a popular abstraction in the last few years to study data communication problems within a data center. In this basic framework, each coflow has a set of communication demands and the goal is to schedule many coflows in a manner that minimizes the total weighted completion time. A coflow is said to complete when all its communication needs are met. This problem has been extremely well studied for the case of complete bipartite graphs that model a data center with full bisection bandwidth and several approximation algorithms and effective heuristics have been proposed recently. In this work, we study a slightly different model of coflow scheduling in general graphs (to capture traffic between data centers) and develop practical and efficient approximation algorithms for it. Our main result is a randomized 2 approximation algorithm for the single path and free path model, significantly improving prior work. In addition, we demonstrate via extensive experiments that the algorithm is practical, easy to implement and performs well in practice.
This paper presents improved approximation algorithms for the problem of multiprocessor scheduling under uncertainty, or SUU, in which the execution of each job may fail probabilistically. This problem is motivated by the increasing use of distributed computing to handle large, computationally intensive tasks. In the SUU problem we are given n unit-length jobs and m machines, a directed acyclic graph G of precedence constraints among jobs, and unrelated failure probabilities q_{ij} for each job j when executed on machine i for a single timestep. Our goal is to find a schedule that minimizes the expected makespan, which is the expected time at which all jobs complete. Lin and Rajaraman gave the first approximations for this NP-hard problem for the special cases of independent jobs, precedence constraints forming disjoint chains, and precedence constraints forming trees. In this paper, we present asymptotically better approximation algorithms. In particular, we give an O(loglog min(m,n))-approximation for independent jobs (improving on the previously best O(log n)-approximation). We also give an O(log(n+m) loglog min(m,n))-approximation algorithm for precedence constraints that form disjoint chains (improving on the previously best O(log(n)log(m)log(n+m)/loglog(n+m))-approximation by a (log n/loglog n)^2 factor when n = poly(m). Our algorithm for precedence constraints forming chains can also be used as a component for precedence constraints forming trees, yielding a similar improvement over the previously best algorithms for trees.