أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Zirui Zhou

An Optimal Resource Allocator of Elastic Training for Deep Learning Jobs on Cloud

111 - Liang Hu , Jiangcheng Zhu , Zirui Zhou 2021

Cloud training platforms, such as Amazon Web Services and Huawei Cloud provide users with computational resources to train their deep learning jobs. Elastic training is a service embedded in cloud training platforms that dynamically scales up or down the resources allocated to a job. The core technique of an elastic training system is to best allocate limited resources among heterogeneous jobs in terms of shorter queueing delay and higher training efficiency. This paper presents an optimal resource allocator for elastic training system that leverages a mixed-integer programming (MIP) model to maximize the training progress of deep learning jobs. We take advantage of the real-world job data obtained from ModelArts, the deep learning training platform of Huawei Cloud and conduct simulation experiments to compare the optimal resource allocator with a greedy one as benchmark. Numerical results show that the proposed allocator can reduce queuing time by up to 32% and accelerate training efficiency by up to 24% relative to the greedy resource allocator, thereby greatly improving user experience with Huawei ModelArts and potentially enabling the realization of higher profits for the product. Also, the optimal resource allocator is fast in decision-making, taking merely 0.4 seconds on average.

أنظمة وتحكم النظم الموزعة والتوازية والحوسبة العنقودية أنظمة وتحكم

Decentralized Composite Optimization in Stochastic Networks: A Dual Averaging Approach with Linear Convergence

148 - Changxin Liu , Zirui Zhou , Jian Pei 2021

Decentralized optimization, particularly the class of decentralized composite convex optimization (DCCO) problems, has found many applications. Due to ubiquitous communication congestion and random dropouts in practice, it is highly desirable to desi gn decentralized algorithms that can handle stochastic communication networks. However, most existing algorithms for DCCO only work in time-invariant networks and cannot be extended to stochastic networks because they inherently need knowledge of network topology $textit{a priori}$. In this paper, we propose a new decentralized dual averaging (DDA) algorithm that can solve DCCO in stochastic networks. Under a rather mild condition on stochastic networks, we show that the proposed algorithm attains $textit{global linear convergence}$ if each local objective function is strongly convex. Our algorithm substantially improves the existing DDA-type algorithms as the latter were only known to converge $textit{sublinearly}$ prior to our work. The key to achieving the improved rate is the design of a novel dynamic averaging consensus protocol for DDA, which intuitively leads to more accurate local estimates of the global dual variable. To the best of our knowledge, this is the first linearly convergent DDA-type decentralized algorithm and also the first algorithm that attains global linear convergence for solving DCCO in stochastic networks. Numerical results are also presented to support our design and analysis.

التحسين والتحكم النظم الموزعة والتوازية والحوسبة العنقودية

Optimal Non-Convex Exact Recovery in Stochastic Block Model via Projected Power Method

173 - Peng Wang , Huikang Liu , Zirui Zhou 2021

In this paper, we study the problem of exact community recovery in the symmetric stochastic block model, where a graph of $n$ vertices is randomly generated by partitioning the vertices into $K ge 2$ equal-sized communities and then connecting each p air of vertices with probability that depends on their community memberships. Although the maximum-likelihood formulation of this problem is discrete and non-convex, we propose to tackle it directly using projected power iterations with an initialization that satisfies a partial recovery condition. Such an initialization can be obtained by a host of existing methods. We show that in the logarithmic degree regime of the considered problem, the proposed method can exactly recover the underlying communities at the information-theoretic limit. Moreover, with a qualified initialization, it runs in $mathcal{O}(nlog^2n/loglog n)$ time, which is competitive with existing state-of-the-art methods. We also present numerical results of the proposed method to support and complement our theoretical development.

التحسين والتحكم

Iteration-complexity of first-order augmented Lagrangian methods for convex conic programming

84 - Zhaosong Lu , Zirui Zhou 2018

In this paper we consider a class of convex conic programming. In particular, we propose an inexact augmented Lagrangian (I-AL) method for solving this problem, in which the augmented Lagrangian subproblems are solved approximately by a variant of Ne sterovs optimal first-order method. We show that the total number of first-order iterations of the proposed I-AL method for computing an $epsilon$-KKT solution is at most $mathcal{O}(epsilon^{-7/4})$. We also propose a modified I-AL method and show that it has an improved iteration-complexity $mathcal{O}(epsilon^{-1}logepsilon^{-1})$, which is so far the lowest complexity bound among all first-order I-AL type of methods for computing an $epsilon$-KKT solution. Our complexity analysis of the I-AL methods is mainly based on an analysis on inexact proximal point algorithm (PPA) and the link between the I-AL methods and inexact PPA. It is substantially different from the existing complexity analyses of the first-order I-AL methods in the literature, which typically regard the I-AL methods as an inexact dual gradient method. Compared to the mostly related I-AL methods cite{Lan16}, our modified I-AL method is more practically efficient and also applicable to a broader class of problems.

التحسين والتحكم التعقيد الحسابي التحليل العددي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد