
Bounds on series-parallel slowdown

Added by András Salamon
Publication date: 2009
Language: English





We use activity networks (task graphs) to model parallel programs and consider series-parallel extensions of these networks. Our motivation is two-fold: the benefits of series-parallel activity networks and the modelling of programming constructs, such as those imposed by current parallel computing environments. Series-parallelisation adds precedence constraints to an activity network, usually increasing its makespan (execution time). The slowdown ratio describes how additional constraints affect the makespan. We disprove an existing conjecture positing a bound of two on the slowdown when workload is not considered. Where workload is known, we conjecture that 4/3 slowdown is always achievable, and prove our conjecture for small networks using max-plus algebra. We analyse a polynomial-time algorithm showing that achieving 4/3 slowdown is in exp-APX. Finally, we discuss the implications of our results.
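As a minimal sketch of the makespan and slowdown-ratio notions above (not the paper's own construction or algorithm): the code below computes the makespan of a small activity network as its critical-path length, then again after adding the one precedence constraint that turns the classic non-series-parallel "N" network into a series-parallel one. The task durations are made up. Note that this particular extension happens to give a ratio of 1.5; the 4/3 conjecture concerns the best achievable series-parallelisation when the workload is known, not an arbitrary one.

```python
# Sketch: makespan (critical-path time) of an activity network, before and
# after the extra precedence constraint added by series-parallelisation.
from functools import lru_cache

def makespan(durations, edges):
    """Critical-path execution time: the longest weighted path through the DAG."""
    succ = {v: [] for v in durations}
    for u, v in edges:
        succ[u].append(v)

    @lru_cache(maxsize=None)
    def finish(v):
        # Time from the start of v until the longest chain after it completes.
        return durations[v] + max((finish(w) for w in succ[v]), default=0)

    return max(finish(v) for v in durations)

# Hypothetical durations; the "N-shaped" network below is the smallest
# standard example of a task graph that is not series-parallel.
durations = {"a": 3, "b": 1, "c": 1, "d": 3}
n_graph = [("a", "c"), ("b", "c"), ("b", "d")]
series_parallel = n_graph + [("a", "d")]   # one added constraint makes it SP

m0, m1 = makespan(durations, n_graph), makespan(durations, series_parallel)
print(m0, m1, m1 / m0)   # 4 6 1.5 -> slowdown ratio for this particular extension
```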



Related research

Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving parallel I/O lower bounds for linear algebra kernels, and then utilize its insights to derive Cholesky and LU schedules, both communicating N^3/(P*sqrt(M)) elements per processor, where M is the local memory size. The empirical results match our theoretical analysis: our implementations communicate significantly less than Intel MKL, SLATE, and the asymptotically communication-optimal CANDMC and CAPITAL libraries. Our code outperforms these state-of-the-art libraries in almost all tested scenarios, with matrix sizes ranging from 2,048 to 262,144 on up to 512 CPU nodes of the Piz Daint supercomputer, decreasing the time-to-solution by up to three times. Our code is ScaLAPACK-compatible and available as an open-source library.
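To make the quoted per-processor communication volume concrete, here is a tiny back-of-the-envelope evaluation of N^3/(P*sqrt(M)); the values of N, P and M are illustrative only and are not the paper's experimental configuration.

```python
# Illustrative evaluation of the stated communication bound; numbers are made up.
from math import sqrt

def comm_elements(N, P, M):
    """Elements communicated per processor: N^3 / (P * sqrt(M))."""
    return N**3 / (P * sqrt(M))

N = 65_536     # matrix dimension (hypothetical)
P = 512        # number of processors (hypothetical)
M = 2**27      # local memory size in elements (hypothetical)
print(f"{comm_elements(N, P, M):.3e} elements per processor")
```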
Neil J. Gunther, 2011
A parallel program can be represented as a directed acyclic graph. An important performance bound is the time to execute the critical path through the graph. We show how this performance metric is related to Amdahl speedup and the degree of average parallelism. These bounds formally exclude superlinear performance.
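A minimal sketch of the classical bounds this abstract refers to, assuming the usual work/critical-path formulation: speedup on p processors is at most min(p, T1/Tinf), where T1 is the total work, Tinf is the critical-path time, and T1/Tinf is the average parallelism. The numeric values below are hypothetical.

```python
# Sketch: speedup on p processors is bounded by min(p, T1 / Tinf),
# so it can never exceed p (no superlinear speedup) nor the average parallelism.
def speedup_bound(total_work, critical_path, processors):
    average_parallelism = total_work / critical_path
    return min(processors, average_parallelism)

T1, Tinf = 1000.0, 50.0                      # hypothetical work and critical-path times
for p in (4, 16, 64):
    print(p, speedup_bound(T1, Tinf, p))     # bound saturates at T1/Tinf = 20
```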
The complexity of the maximum common connected subgraph problem in partial $k$-trees is still not fully understood. Polynomial-time solutions are known for degree-bounded outerplanar graphs, a subclass of the partial $2$-trees. On the other hand, the problem is known to be $\mathbf{NP}$-hard in vertex-labeled partial $11$-trees of bounded degree. We consider series-parallel graphs, i.e., partial $2$-trees. We show that the problem remains $\mathbf{NP}$-hard in biconnected series-parallel graphs with all but one vertex of degree $3$ or less. A positive complexity result is presented for a related problem of high practical relevance which asks for a maximum common connected subgraph that preserves blocks and bridges of the input graphs. We present a polynomial-time algorithm for this problem in series-parallel graphs, which utilizes a combination of BC- and SP-tree data structures to decompose both graphs.
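To make the series-parallel (partial $2$-tree) graph class concrete, here is a generic sketch of the standard two-terminal series and parallel composition operations; the representation (a source/sink pair plus an edge list, multigraphs allowed) is an illustrative choice and is not taken from the paper.

```python
# Sketch: building two-terminal series-parallel (multi)graphs by composition.
from itertools import count

_fresh = count()

def edge():
    """A single edge between two fresh terminals (source, sink, edge list)."""
    s, t = next(_fresh), next(_fresh)
    return (s, t, [(s, t)])

def series(g1, g2):
    """Identify the sink of g1 with the source of g2."""
    s1, t1, e1 = g1
    s2, t2, e2 = g2
    renamed = [(t1 if u == s2 else u, t1 if v == s2 else v) for u, v in e2]
    return (s1, t2, e1 + renamed)

def parallel(g1, g2):
    """Identify the two sources and the two sinks."""
    s1, t1, e1 = g1
    s2, t2, e2 = g2
    ren = {s2: s1, t2: t1}
    renamed = [(ren.get(u, u), ren.get(v, v)) for u, v in e2]
    return (s1, t1, e1 + renamed)

# Three parallel edges composed in series with one more edge: still series-parallel.
g = series(parallel(parallel(edge(), edge()), edge()), edge())
print(g)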
In this paper, we prove topology-dependent bounds on the number of rounds needed to compute Functional Aggregate Queries (FAQs) studied by Abo Khamis et al. [PODS 2016] in a synchronous distributed network under the model considered by Chattopadhyay et al. [FOCS 2014, SODA 2017]. Unlike the recent work on computing database queries in the Massively Parallel Computation model, in the model of Chattopadhyay et al., nodes can communicate only via private point-to-point channels and we are interested in bounds that work over an \emph{arbitrary} communication topology. This is the first work to consider more practically motivated problems in this distributed model. For the sake of exposition, we focus on two special problems in this paper: Boolean Conjunctive Query (BCQ) and computing variable/factor marginals in Probabilistic Graphical Models (PGMs). We obtain tight bounds on the number of rounds needed to compute such queries as long as the underlying hypergraph of the query is $O(1)$-degenerate and has $O(1)$-arity. In particular, the $O(1)$-degeneracy condition covers most well-studied queries that are efficiently computable in the centralized computation model, like queries with constant treewidth. These tight bounds depend on a new notion of `width' (namely internal-node-width) for Generalized Hypertree Decompositions (GHDs) of acyclic hypergraphs, which minimizes the number of internal nodes in a sub-class of GHDs. To the best of our knowledge, this width has not been studied explicitly in the theoretical database literature. Finally, we consider the problem of computing the product of a vector with a chain of matrices and prove tight bounds on its round complexity (over the finite field of two elements) using a novel min-entropy based argument.
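The $O(1)$-degeneracy condition can be illustrated with the standard peeling procedure for ordinary graphs (repeatedly remove a minimum-degree vertex and record the largest degree seen); this is a generic sketch, not the paper's hypergraph definition, and the example graph is made up.

```python
# Sketch: degeneracy of an ordinary graph via minimum-degree peeling.
def degeneracy(adj):
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a mutable copy
    best = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))   # current minimum-degree vertex
        best = max(best, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return best

# Two triangles sharing an edge (a small 2-tree): degeneracy 2.
graph = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(degeneracy(graph))
```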
There are distributed graph algorithms for finding maximal matchings and maximal independent sets in $O(\Delta + \log^* n)$ communication rounds; here $n$ is the number of nodes and $\Delta$ is the maximum degree. The lower bound by Linial (1987, 1992) shows that the dependency on $n$ is optimal: these problems cannot be solved in $o(\log^* n)$ rounds even if $\Delta = 2$. However, the dependency on $\Delta$ is a long-standing open question, and there is currently an exponential gap between the upper and lower bounds. We prove that the upper bounds are tight. We show that maximal matchings and maximal independent sets cannot be found in $o(\Delta + \log \log n / \log \log \log n)$ rounds with any randomized algorithm in the LOCAL model of distributed computing. As a corollary, it follows that there is no deterministic algorithm for maximal matchings or maximal independent sets that runs in $o(\Delta + \log n / \log \log n)$ rounds; this is an improvement over prior lower bounds also as a function of $n$.
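For reference, the two objects whose distributed round complexity is discussed above are easy to compute greedily in the sequential setting; the sketch below is a generic illustration of the problem definitions only, not of the LOCAL-model algorithms or lower-bound constructions. The example graph is hypothetical.

```python
# Sketch: sequential greedy maximal matching and maximal independent set.
def maximal_matching(edges):
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:   # keep edge if both ends are free
            matching.append((u, v))
            matched |= {u, v}
    return matching

def maximal_independent_set(adj):
    chosen, blocked = set(), set()
    for v in adj:
        if v not in blocked:                         # take v, block its neighbours
            chosen.add(v)
            blocked |= {v, *adj[v]}
    return chosen

edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
adj = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
print(maximal_matching(edges), maximal_independent_set(adj))
```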
