MapReduce is a popular parallel computing paradigm for Big Data processing in clusters and data centers. For a given MapReduce workload, different job execution orders and slot configurations can yield significantly different performance with respect to makespan, total completion time, system utilization and other metrics. Quite a few algorithms exist for minimizing the makespan of multiple MapReduce jobs, but they are heuristic or suboptimal; the best known result is a 3-approximation obtained by applying Johnson's rule. In this paper, we propose an approach, the UAAS algorithm, that adapts a MapReduce workload to meet the conditions of the classical Johnson model, so that Johnson's rule can still be used to obtain an optimal solution. We explain how this adaptation to the Johnson model works and describe a few key features of the proposed method.
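As context for the Johnson model referenced above, the following is a minimal sketch of Johnson's rule for the classical two-machine flow shop, with the map and reduce phases playing the roles of the two stages; the job timings are hypothetical and this is the standard textbook rule, not the UAAS adaptation itself.

```python
# Minimal sketch of Johnson's rule for a two-stage (map/reduce) flow shop.
# 'jobs' maps a job id to hypothetical (map_time, reduce_time) pairs.

def johnson_order(jobs):
    """Return a job order minimizing makespan on a two-machine flow shop."""
    first = sorted((j for j, (a, b) in jobs.items() if a <= b),
                   key=lambda j: jobs[j][0])                  # short first stage, ascending
    second = sorted((j for j, (a, b) in jobs.items() if a > b),
                    key=lambda j: jobs[j][1], reverse=True)   # then descending second stage
    return first + second

def makespan(order, jobs):
    """Simulate the two stages to compute the resulting makespan."""
    t1 = t2 = 0
    for j in order:
        a, b = jobs[j]
        t1 += a                  # stage 1 (map) finishes
        t2 = max(t2, t1) + b     # stage 2 (reduce) waits for both stage 1 and itself
    return t2

jobs = {"J1": (3, 6), "J2": (5, 2), "J3": (1, 2)}
order = johnson_order(jobs)
print(order, makespan(order, jobs))   # ['J3', 'J1', 'J2'] 12
```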
We consider the classic problem of scheduling jobs with precedence constraints on a set of identical machines to minimize the makespan objective function. Understanding the exact approximability of the problem when the number of machines is a constant is a well-known question in scheduling theory. Indeed, an outstanding open problem from the classic book of Garey and Johnson asks whether this problem is NP-hard even in the case of 3 machines and unit-length jobs. In a recent breakthrough, Levey and Rothvoss gave a $(1+\epsilon)$-approximation algorithm, which runs in nearly quasi-polynomial time, for the case when jobs have unit lengths. However, the substantially more difficult case where jobs have arbitrary processing lengths has remained open. We make progress on this more general problem. We show that there exists a $(1+\epsilon)$-approximation algorithm (with a running time similar to that of Levey and Rothvoss) for the non-migratory setting: every job has to be scheduled entirely on a single machine, but within a machine the job need not be scheduled during consecutive time steps. Further, we also show that our algorithmic framework generalizes to another classic scenario where, along with the precedence constraints, the jobs also have communication delay constraints. Both of these fundamental problems are highly relevant to the practice of datacenter scheduling.
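To make the scheduling model concrete, the following is a minimal baseline sketch of Graham-style list scheduling for precedence-constrained jobs on identical machines; it only illustrates the setting and is not the $(1+\epsilon)$-approximation discussed above. The job lengths and precedence DAG are hypothetical.

```python
import heapq

def list_schedule(proc, preds, m):
    """Greedy list scheduling of precedence-constrained jobs on m identical
    machines; returns the makespan of the resulting (feasible) schedule."""
    succs = {j: [] for j in proc}
    indeg = {j: len(preds[j]) for j in proc}
    for j, ps in preds.items():
        for p in ps:
            succs[p].append(j)
    ready = [j for j in proc if indeg[j] == 0]
    machines = [0.0] * m          # time each machine becomes free
    finish = {}
    events = []                   # min-heap of (finish_time, job)
    while ready or events:
        while ready:              # assign every ready job to the earliest-free machine
            j = ready.pop()
            i = min(range(m), key=machines.__getitem__)
            start = max(machines[i], max((finish[p] for p in preds[j]), default=0.0))
            finish[j] = start + proc[j]
            machines[i] = finish[j]
            heapq.heappush(events, (finish[j], j))
        if events:                # release successors whose predecessors all finished
            _, j = heapq.heappop(events)
            for s in succs[j]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
    return max(finish.values())

proc = {"a": 2, "b": 3, "c": 1, "d": 4}
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(list_schedule(proc, preds, 2))   # 9 on 2 machines
```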
With the wealth of high-throughput sequencing data generated by recent large-scale consortia, predictive gene expression modelling has become an important tool for integrative analysis of transcriptomic and epigenetic data. However, sequencing data-sets are characteristically large, and previous modelling frameworks are typically inefficient and unable to leverage multi-core or distributed processing architectures. In this study, we detail an efficient and parallelised MapReduce implementation of gene expression modelling. We leverage the computational efficiency of this framework to provide an integrative analysis of over fifty histone modification data-sets across a variety of cancerous and non-cancerous cell-lines. Our results demonstrate that the genome-wide relationships between histone modifications and mRNA transcription are lineage-, tissue- and karyotype-invariant, and that models trained on matched epigenetic/transcriptomic data from non-cancerous cell-lines are able to predict cancerous expression with equivalent genome-wide fidelity.
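A minimal sketch of how such a map/reduce-style expression-modelling pipeline might be structured is given below, assuming per-gene feature extraction from histone-mark signals followed by a linear fit; the function names, marks and toy data are hypothetical and do not reproduce the authors' implementation.

```python
from multiprocessing import Pool
import numpy as np

def map_gene(args):
    """Map step: summarise each histone-mark signal around one gene (mean coverage)."""
    gene_id, signals = args          # signals: {mark_name: np.array of bin coverages}
    return gene_id, np.array([s.mean() for s in signals.values()])

def reduce_features(mapped):
    """Reduce step: collect the per-gene feature rows into one design matrix."""
    gene_ids, rows = zip(*mapped)
    return list(gene_ids), np.vstack(rows)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 1000 genes, 3 histone marks, 100 signal bins each.
    genes = {f"gene{i}": {m: rng.poisson(5, 100).astype(float)
                          for m in ("H3K4me3", "H3K27ac", "H3K36me3")}
             for i in range(1000)}
    with Pool() as pool:
        mapped = pool.map(map_gene, genes.items())   # parallel map phase
    ids, X = reduce_features(mapped)                 # reduce phase
    y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.1, len(ids))  # toy expression
    coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(ids)), X], y, rcond=None)
    print(coef)                                      # fitted intercept and mark weights
```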
A sequence set is a mathematical model used in many applications. As the number of sequences grows, a single sequence set is no longer appropriate for the rapidly increasing problem sizes. For example, more and more text processing applications split a single large text file into multiple files before processing. For such applications, the underlying mathematical model is multiple sequence sets (MSS). Although MSS are increasingly used, there is little research on how to process them efficiently. To process multiple sequence sets, sequences are first distributed to different sets, and then the sequences in each set are processed. Deriving effective algorithms for MSS processing is both interesting and challenging. In this paper, we define cost functions and a performance ratio for analyzing the quality of synthesis sequences, and on this basis we formulate the problem of Processing Multiple Sequence Sets (PMSS). We first propose two greedy algorithms for the PMSS problem, obtained by generalizing algorithms for a single sequence set. Then, based on an analysis of the characteristics of multiple sequence sets, we propose the Distribution and Deposition (DDA) algorithm and the DDA* algorithm. In the DDA algorithm, the sequences are first distributed to multiple sets according to their alphabet contents; the sequences in each set are then deposited by the deposition algorithm. DDA* differs from DDA in that it distributes sequences by clustering based on sequence profiles. Experiments show that DDA and DDA* always produce results with smaller costs than the other algorithms, and that DDA* outperforms DDA in most instances. Both algorithms are also efficient in time and space.
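The two-phase distribute-then-deposit structure described above can be illustrated with the following sketch; the dominant-character assignment and greedy-overlap deposition are stand-in rules chosen for illustration, not the published DDA/DDA* procedures.

```python
from itertools import permutations

def distribute(seqs, k):
    """Assign each sequence to one of k sets by its most frequent character,
    a crude stand-in for distribution by 'alphabet content'."""
    sets = [[] for _ in range(k)]
    for s in seqs:
        dominant = max(sorted(set(s)), key=s.count)
        sets[ord(dominant) % k].append(s)
    return sets

def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def deposit(seqs):
    """Greedy deposition: repeatedly merge the pair with the largest overlap,
    yielding one synthesis sequence (superstring) for the set."""
    seqs = list(seqs)
    while len(seqs) > 1:
        a, b = max(permutations(seqs, 2), key=lambda p: overlap(*p))
        seqs.remove(a); seqs.remove(b)
        seqs.append(a + b[overlap(a, b):])
    return seqs[0] if seqs else ""

sets = distribute(["ACGT", "CGTA", "TTAG", "GGAC"], 2)
print([deposit(s) for s in sets])   # e.g. ['TTAG', 'GGACGTA']
```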
We consider the distributed version of the Multiple Knapsack Problem (MKP), where $m$ items are to be distributed amongst $n$ processors, each with a knapsack. We propose different distributed approximation algorithms with a tradeoff between time and message complexities. The algorithms are based on the greedy approach of assigning the best item to the knapsack with the largest capacity. These algorithms obtain a solution with a bound of $\frac{1}{n+1}$ times the optimum solution, with either $\mathcal{O}\left(m \log n\right)$ time and $\mathcal{O}\left(mn\right)$ messages, or $\mathcal{O}\left(m\right)$ time and $\mathcal{O}\left(mn^{2}\right)$ messages.
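A centralized simulation of the greedy rule described above is sketched below; interpreting the "best item" as the remaining item of largest profit is an assumption made for illustration, and the distributed message-passing machinery of the paper is not modelled.

```python
import heapq

def greedy_mkp(items, capacities):
    """items: list of (profit, size); capacities: list of knapsack capacities.
    Repeatedly give the highest-profit remaining item to the knapsack with the
    largest remaining capacity (if it fits). Returns total profit and assignment."""
    knapsacks = [(-c, i) for i, c in enumerate(capacities)]   # max-heap by capacity
    heapq.heapify(knapsacks)
    assignment = {i: [] for i in range(len(capacities))}
    total = 0
    for profit, size in sorted(items, reverse=True):          # best (largest-profit) item first
        neg_cap, i = knapsacks[0]                              # knapsack with largest capacity
        if size <= -neg_cap:                                   # assign only if it fits
            heapq.heapreplace(knapsacks, (neg_cap + size, i))
            assignment[i].append((profit, size))
            total += profit
    return total, assignment

items = [(10, 4), (7, 3), (6, 5), (3, 1)]
capacities = [5, 6]
print(greedy_mkp(items, capacities))
```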
We consider the online makespan minimization problem on identical machines. Chen and Vestjens (ORL 1997) show that the largest processing time first (LPT) algorithm is 1.5-competitive. For the special case of two machines, Noga and Seiden (TCS 2001) introduce the SLEEPY algorithm that achieves a competitive ratio of $(5 - \sqrt{5})/2 \approx 1.382$, matching the lower bound by Chen and Vestjens (ORL 1997). Furthermore, Noga and Seiden note that in many applications one can kill a job and restart it later, and they leave open the question of whether algorithms with restart can obtain better competitive ratios. We resolve this long-standing open problem on the positive end. Our algorithm has a natural rule for killing a processing job: a newly-arrived job replaces the smallest processing job if 1) the new job is larger than other pending jobs, 2) the new job is much larger than the processing one, and 3) the processed portion is small relative to the size of the new job. With an appropriate choice of parameters, we show that our algorithm improves upon the 1.5 competitive ratio for the general case, and upon the 1.382 competitive ratio for the two-machine case.
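The killing rule can be sketched as a predicate over the three conditions listed above; the thresholds alpha and beta below are hypothetical placeholders standing in for the paper's tuned parameters.

```python
def should_restart(new_size, pending_sizes, proc_size, proc_elapsed,
                   alpha=0.5, beta=0.25):
    """Decide whether a newly-arrived job should kill the smallest currently
    processing job (to be restarted later), per the three conditions above.
    alpha and beta are illustrative, not the paper's actual constants."""
    larger_than_pending = all(new_size > p for p in pending_sizes)   # condition 1
    much_larger = proc_size < alpha * new_size                       # condition 2 (assumed form)
    little_done = proc_elapsed < beta * new_size                     # condition 3 (assumed form)
    return larger_than_pending and much_larger and little_done

# Example: a size-10 arrival preempts a size-4 job that has run for 1 time unit.
print(should_restart(new_size=10, pending_sizes=[3, 4], proc_size=4, proc_elapsed=1))  # True
```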