White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing

69 0 0.0 ( 0 )

Download Cite

Added by Roman Iakymchuk

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Roman Iakymchuk - Daichi Mukunoki - Artur Podobas

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In numerical computations, precision of floating-point computations is a key factor to determine the performance (speed and energy-efficiency) as well as the reliability (accuracy and reproducibility). However, precision generally plays a contrary role for both. Therefore, the ultimate concept for maximizing both at the same time is the minimal-precision computing through precision-tuning, which adjusts the optimal precision for each operation and data. Several studies have been already conducted for it so far (e.g. Precimoniuos and Verrou), but the scope of those studies is limited to the precision-tuning alone. Hence, we aim to propose a broader concept of the minimal-precision computing system with precision-tuning, involving both hardware and software stack. In 2019, we have started the Minimal-Precision Computing project to propose a more broad concept of the minimal-precision computing system with precision-tuning, involving both hardware and software stack. Specifically, our system combines (1) a precision-tuning method based on Discrete Stochastic Arithmetic (DSA), (2) arbitrary-precision arithmetic libraries, (3) fast and accurate numerical libraries, and (4) Field-Programmable Gate Array (FPGA) with High-Level Synthesis (HLS). In this white paper, we aim to provide an overview of various technologies related to minimal- and mixed-precision, to outline the future direction of the project, as well as to discuss current challenges together with our project members and guest speakers at the LSPANC 2020 workshop; https://www.r-ccs.riken.jp/labs/lpnctrt/lspanc2020jan/.

rate research

Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing

494 - Justin M. Wozniak , Timothy G. Armstrong , Ketan C. Maheshwari 2021

Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitations, interoperability challenges, parallel filesystem overheads due to the small file system accesses common in scripted approaches, and other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl, with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces to the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.

Distributed Parallel and Cluster Computing

Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms

465 - Alexey Lastovetsky , Ravi Reddy , Vladimir Rychkov 2011

Traditional heterogeneous parallel algorithms, designed for heterogeneous clusters of workstations, are based on the assumption that the absolute speed of the processors does not depend on the size of the computational task. This assumption proved inaccurate for modern and perspective highly heterogeneous HPC platforms. New class of algorithms based on the functional performance model (FPM), representing the speed of the processor by a function of problem size, has been recently proposed. These algorithms cannot be however employed in self-adaptable applications because of very high cost of construction of the functional performance model. The paper presents a new class of parallel algorithms for highly heterogeneous HPC platforms. Like traditional FPM-based algorithms, these algorithms assume that the speed of the processors is characterized by speed functions rather than speed constants. Unlike the traditional algorithms, they do not assume the speed functions to be given. Instead, they estimate the speed functions of the processors for different problem sizes during their execution. These algorithms do not construct the full speed function for each processor but rather build and use their partial estimates sufficient for optimal distribution of computations with a given accuracy. The low execution cost of distribution of computations between heterogeneous processors in these algorithms make them suitable for employment in self-adaptable applications. Experiments with parallel matrix multiplication applications based on this approach are performed on local and global heterogeneous computational clusters. The results show that the execution time of optimal matrix distribution between processors is significantly less, by orders of magnitude, than the total execution time of the optimized application.

Distributed Parallel and Cluster Computing

CMOS Quantum Computing: Toward A Quantum Computer System-on-Chip

76 - Reza Nikandish , Elena Blokhina , 2020

Quantum computing is experiencing the transition from a scientific to an engineering field with the promise to revolutionize an extensive range of applications demanding high-performance computing. Many implementation approaches have been pursued for quantum computing systems, where currently the main streams can be identified based on superconducting, photonic, trapped-ion, and semiconductor qubits. Semiconductor-based quantum computing, specifically using CMOS technologies, is promising as it provides potential for the integration of qubits with their control and readout circuits on a single chip. This paves the way for the realization of a large-scale quantum computing system for solving practical problems. In this paper, we present an overview and future perspective of CMOS quantum computing, exploring developed semiconductor qubit structures, quantum gates, as well as control and readout circuits, with a focus on the promises and challenges of CMOS implementation.

Quantum Physics Signal Processing

Perspective: Toward large-scale fault-tolerant universal photonic quantum computing

90 - Shuntaro Takeda , Akira Furusawa 2019

Photonic quantum computing is one of the leading approaches to universal quantum computation. However, large-scale implementation of photonic quantum computing has been hindered by its intrinsic difficulties, such as probabilistic entangling gates for photonic qubits and lack of scalable ways to build photonic circuits. Here we discuss how to overcome these limitations by taking advantage of two key ideas which have recently emerged. One is a hybrid qubit-continuous variable approach for realizing a deterministic universal gate set for photonic qubits. The other is time-domain multiplexing technique to perform arbitrarily large-scale quantum computing without changing the configuration of photonic circuits. These ideas together will enable scalable implementation of universal photonic quantum computers in which hardware-efficient error correcting codes can be incorporated. Furthermore, all-optical implementation of such systems can increase the operational bandwidth beyond THz in principle, utimately enabling large-scale fault-tolerant universal quantum computers with ultra-high operation frequency.

Quantum Physics

Hardness of minimal symmetry breaking in distributed computing

89 - Alkida Balliu , Juho Hirvonen , Dennis Olivetti 2018

A graph is weakly $2$-colored if the nodes are labeled with colors black and white such that each black node is adjacent to at least one white node and vice versa. In this work we study the distributed computational complexity of weak $2$-coloring in the standard LOCAL model of distributed computing, and how it is related to the distributed computational complexity of other graph problems. First, we show that weak $2$-coloring is a minimal distributed symmetry-breaking problem for regular even-degree trees and high-girth graphs: if there is any non-trivial locally checkable labeling problem that is solvable in $o(log^* n)$ rounds with a distributed graph algorithm in the middle of a regular even-degree tree, then weak $2$-coloring is also solvable in $o(log^* n)$ rounds there. Second, we prove a tight lower bound of $Omega(log^* n)$ for the distributed computational complexity of weak $2$-coloring in regular trees; previously only a lower bound of $Omega(log log^* n)$ was known. By minimality, the same lower bound holds for any non-trivial locally checkable problem inside regular even-degree trees.

Distributed Parallel and Cluster Computing Computational Complexity