No Arabic abstract
In this work we refine the analysis of the distributed Laplacian solver recently established by Forster, Goranci, Liu, Peng, Sun, and Ye (FOCS 21), via the Ghaffari-Haeupler framework (SODA 16) of low-congestion shortcuts. Specifically, if $epsilon > 0$ represents the error of the solver, we derive two main results. First, for any $n$-node graph $G$ with hop-diameter $D$ and treewidth bounded by $k$, we show the existence of a Laplacian solver with round complexity $O(n^{o(1)}kD log(1/epsilon))$ in the CONGEST model. For graphs with bounded treewidth this circumvents the notorious $Omega(sqrt{n})$ lower bound for global problems in general graphs. Moreover, following a recent line of work in distributed algorithms, we consider a hybrid communication model which enhances CONGEST with very limited global power in the form of the recently introduced node-capacitated clique. In this model, we show the existence of a Laplacian solver with round complexity $O(n^{o(1)} log(1/epsilon))$. The unifying thread of these results is an application of accelerated distributed algorithms for a congested variant of the standard part-wise aggregation problem that we introduce. This primitive constitutes the primary building block for simulating local operations on low-congestion minors, and we believe that this framework could be more generally applicable.
Existing distributed machine learning (DML) systems focus on improving the computational efficiency of distributed learning, whereas communication aspects have received less attention. Many DML systems treat the network as a blackbox. Thus, DML algorithms performance is impeded by network bottlenecks, and DML systems end up sacrificing important algorithmic and system-level benefits. We present MLfabric, a communication library that manages all network transfers in a DML system, and holistically determines the communication pattern of a DML algorithm at any point in time. This allows MLfabric to carefully order transfers (i.e., gradient updates) to improve convergence, opportunistically aggregate updates in-network to improve efficiency, and proactively replicate some of them to support new notions of fault tolerance. We empirically find that MLfabric achieves up to 3X speed-up in training large deep learning models in realistic dynamic cluster settings.
Ground-state cooling of mechanical resonators is an important task in quantum optomechanics, because it is a necessary prerequisite for creation, manipulation, and application of macroscopic mechanical coherence. Here, we propose a transient-state scheme to accelerate ground-state cooling of a mechanical resonator in a three-mode loop-coupled optomechanical system via shortcuts to adiabaticity (STA). We consider four kinds of coupling protocols and calculate the evolution of the mean phonon number of the mechanical resonator in both the adiabatic and STA cases. We verify that the ground-state cooling of the mechanical resonator can be achieved with the STA method in a much shorter period. The STA method can also be generalized to accelerate other adiabatic processes in cavity optomechanics, and hence this work will open up a new realm of fast optomechanical manipulations.
Maximum weight matching is one of the most fundamental combinatorial optimization problems with a wide range of applications in data mining and bioinformatics. Developing distributed weighted matching algorithms is challenging due to the sequential nature of efficient algorithms for this problem. In this paper, we develop a simple distributed algorithm for the problem on general graphs with approximation guarantee of $2+varepsilon$ that (nearly) matches that of the sequential greedy algorithm. A key advantage of this algorithm is that it can be easily implemented in only two rounds of computation in modern parallel computation frameworks such as MapReduce. We also demonstrate the efficiency of our algorithm in practice on various graphs (some with half a trillion edges) by achieving objective values always close to what is achievable in the centralized setting.
Finding or monitoring subgraph instances that are isomorphic to a given pattern graph in a data graph is a fundamental query operation in many graph analytic applications, such as network motif mining and fraud detection. The state-of-the-art distributed methods are inefficient in communication. They have to shuffle partial matching results during the distributed multiway join. The partial matching results may be much larger than the data graph itself. To overcome the drawback, we develop the Batch-BENU framework (B-BENU) for distributed subgraph enumeration. B-BENU executes a group of local search tasks in parallel. Each task enumerates subgraphs around a vertex in the data graph, guided by a backtracking-based execution plan. B-BENU does not shuffle any partial matching result. Instead, it stores the data graph in a distributed database. Each task queries adjacency sets of the data graph on demand. To support dynamic data graphs, we propose the concept of incremental pattern graphs and turn continuous subgraph enumeration into enumerating incremental pattern graphs at each time step. We develop the Streaming-BENU framework (S-BENU) to enumerate their matches efficiently. We implement B-BENU and S-BENU with the local database cache and the task splitting techniques. The extensive experiments show that B-BENU and S-BENU can scale to big data graphs and complex pattern graphs. They outperform the state-of-the-art methods by up to one and two orders of magnitude, respectively.
We use queueing networks to present a new approach to solving Laplacian systems. This marks a significant departure from the existing techniques, mostly based on graph-theoretic constructions and sampling. Our distributed solver works for a large and important class of Laplacian systems that we call one-sink Laplacian systems. Specifically, our solver can produce solutions for systems of the form $Lx = b$ where exactly one of the coordinates of $b$ is negative. Our solver is a distributed algorithm that takes $widetilde{O}(t_{hit} d_{max})$ time (where $widetilde{O}$ hides $text{poly}log n$ factors) to produce an approximate solution where $t_{hit}$ is the worst-case hitting time of the random walk on the graph, which is $Theta(n)$ for a large set of important graphs, and $d_{max}$ is the generalized maximum degree of the graph. The class of one-sink Laplacians includes the important voltage computation problem and allows us to compute the effective resistance between nodes in a distributed setting. As a result, our Laplacian solver can be used to adapt the approach by Kelner and Mk{a}dry (2009) to give the first distributed algorithm to compute approximate random spanning trees efficiently.