No Arabic abstract
This paper introduces a new resource allocation problem in distributed computing called distributed serving with mobile servers (DSMS). In DSMS, there are $k$ identical mobile servers residing at the processors of a network. At arbitrary points of time, any subset of processors can invoke one or more requests. To serve a request, one of the servers must move to the processor that invoked the request. Resource allocation is performed in a distributed manner since only the processor that invoked the request initially knows about it. All processors cooperate by passing messages to achieve correct resource allocation. They do this with the goal to minimize the communication cost. Routing servers in large-scale distributed systems requires a scalable location service. We introduce the distributed protocol GNN that solves the DSMS problem on overlay trees. We prove that GNN is starvation-free and correctly integrates locating the servers and synchronizing the concurrent access to servers despite asynchrony, even when the requests are invoked over time. Further, we analyze GNN for one-shot executions, i.e., all requests are invoked simultaneously. We prove that when running GNN on top of a special family of tree topologies---known as hierarchically well-separated trees (HSTs)---we obtain a randomized distributed protocol with an expected competitive ratio of $O(log n)$ on general network topologies with $n$ processors. From a technical point of view, our main result is that GNN optimally solves the DSMS problem on HSTs for one-shot executions, even if communication is asynchronous. Further, we present a lower bound of $Omega(max{k, log n/loglog n})$ on the competitive ratio for DSMS. The lower bound even holds when communication is synchronous and requests are invoked sequentially.
Cloud GPU servers have become the de facto way for deep learning practitioners to train complex models on large-scale datasets. However, it is challenging to determine the appropriate cluster configuration---e.g., server type and number---for different training workloads while balancing the trade-offs in training time, cost, and model accuracy. Adding to the complexity is the potential to reduce the monetary cost by using cheaper, but revocable, transient GPU servers. In this work, we analyze distributed training performance under diverse cluster configurations using CM-DARE, a cloud-based measurement and training framework. Our empirical datasets include measurements from three GPU types, six geographic regions, twenty convolutional neural networks, and thousands of Google Cloud servers. We also demonstrate the feasibility of predicting training speed and overhead using regression-based models. Finally, we discuss potential use cases of our performance modeling such as detecting and mitigating performance bottlenecks.
Concurrent data structures are the data sharing side of parallel programming. Data structures give the means to the program to store data, but also provide operations to the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes more crucial because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide a sufficient background and intuition to help the interested reader to navigate in the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity to the subject that will allow her to use truly concurrent methods.
Advances in mobile computing have paved the way for new types of distributed applications that can be executed solely by mobile devices on device-to-device (D2D) ecosystems (e.g., crowdsensing). Sophisticated applications, like cryptocurrencies, need distributed ledgers to function. Distributed ledgers, such as blockchains and directed acyclic graphs (DAGs), employ consensus protocols to add data in the form of blocks. However, such protocols are designed for resourceful devices that are interconnected via the Internet. Moreover, existing distributed ledgers are not deployable to D2D ecosystems since their storage needs are continuously increasing. In this work, we introduce and analyse Mneme, a DAG-based distributed ledger that can be maintained solely by mobile devices. Mneme utilizes two novel consensus protocols: Proof-of-Context (PoC) and Proof-of-Equivalence (PoE). PoC employs users context to add data on Mneme. PoE is executed periodically to summarize data and produce equivalent blocks that require less storage. We analyze Mnemes security and justify the ability of PoC and PoE to guarantee the characteristics of distributed ledgers: persistence and liveness. Furthermore, we analyze potential attacks from malicious users and prove that the probability of a successful attack is inversely proportional to the square of the number of mobile users who maintain Mneme.
We present a fully lock-free variant of the recent Montage system for persistent data structures. Our variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a sync operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free sync significantly reduces its latency. Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature -- dramatically faster than achieved with prior general-purpose systems, and generally within 50% of equivalent non-persistent structures placed in DRAM.
We show that the $(degree+1)$-list coloring problem can be solved deterministically in $O(D cdot log n cdotlog^2Delta)$ rounds in the CONGEST model, where $D$ is the diameter of the graph, $n$ the number of nodes, and $Delta$ the maximum degree. Using the recent polylogarithmic-time deterministic network decomposition algorithm by Rozhov{n} and Ghaffari [STOC 2020], this implies the first efficient (i.e., $polylog n$-time) deterministic CONGEST algorithm for the $(Delta+1)$-coloring and the $(mathit{degree}+1)$-list coloring problem. Previously the best known algorithm required $2^{O(sqrt{log n})}$ rounds and was not based on network decompositions. Our techniques also lead to deterministic $(mathit{degree}+1)$-list coloring algorithms for the congested clique and the massively parallel computation (MPC) model. For the congested clique, we obtain an algorithm with time complexity $O(logDeltacdotloglogDelta)$, for the MPC model, we obtain algorithms with round complexity $O(log^2Delta)$ for the linear-memory regime and $O(log^2Delta + log n)$ for the sublinear memory regime.