ﻻ يوجد ملخص باللغة العربية
Collective communication algorithms are an important component of distributed computation. Indeed, in the case of deep-learning, collective communication is the Amdahls bottleneck of data-parallel training. This paper introduces SCCL (for Synthesized Collective Communication Library), a systematic approach to synthesize collective communication algorithms that are explicitly tailored to a particular hardware topology. SCCL synthesizes algorithms along the Pareto-frontier spanning from latency-optimal to bandwidth-optimal implementations of a collective. The paper demonstrates how to encode SCCLs synthesis as a quantifier-free SMT formula which can be discharged to a theorem prover. We further demonstrate how to scale our synthesis by exploiting symmetries in topologies and collectives. We synthesize and introduce novel latency and bandwidth optimal algorithms not seen in the literature on two popular hardware topologies. We also show how SCCL efficiently lowers algorithms to implementations on two hardware architectures (NVIDIA and AMD) and demonstrate competitive performance with hand optimized collective communication libraries.
The performance of collective operations has been a critical issue since the advent of MPI. Many algorithms have been proposed for each MPI collective operation but none of them proved optimal in all situations. Different algorithms demonstrate super
Given a boolean predicate $Pi$ on labeled networks (e.g., proper coloring, leader election, etc.), a self-stabilizing algorithm for $Pi$ is a distributed algorithm that can start from any initial configuration of the network (i.e., every node has an
Observer design typically requires the observability of the underlying system, which may be hard to verify for nonlinear systems, while guaranteeing asymptotic convergence of errors, which may be insufficient in order to satisfy performance condition
Collective intelligence is the ability of a group to perform more effectively than any individual alone. Diversity among group members is a key condition for the emergence of collective intelligence, but maintaining diversity is challenging in the fa
Gradecast is a simple three-round algorithm presented by Feldman and Micali. The current work presents a very simple algorithm that utilized Gradecast to achieve Byzantine agreement. Two small variations of the presented algorithm lead to improved al