Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

NEMO5: Achieving High-end Internode Communication for Performance Projection Beyond Moores Law

79 0 0.0 ( 0 )

Download Cite

Added by Daniel Lemus

Publication date 2015

fields Informatics Engineering Physics

and research's language is English

Authors Robert Andrawis - Jose David Bermeo - James Charles

Distributed Parallel and Cluster Computing Computational Physics

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Electronic performance predictions of modern nanotransistors require nonequilibrium Greens functions including incoherent scattering on phonons as well as inclusion of random alloy disorder and surface roughness effects. The solution of all these effects is numerically extremely expensive and has to be done on the worlds largest supercomputers due to the large memory requirement and the high performance demands on the communication network between the compute nodes. In this work, it is shown that NEMO5 covers all required physical effects and their combination. Furthermore, it is also shown that NEMO5s implementation of the algorithm scales very well up to about 178176CPUs with a sustained performance of about 857 TFLOPS. Therefore, NEMO5 is ready to simulate future nanotransistors.

rate research

MLIR: A Compiler Infrastructure for the End of Moores Law

289 - Chris Lattner , Mehdi Amini , Uday Bondhugula 2020

This work presents MLIR, a novel approach to building reusable and extensible compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building domain specific compilers, and aid in connecting existing compilers together. MLIR facilitates the design and implementation of code generators, translators and optimizers at different levels of abstraction and also across application domains, hardware targets and execution environments. The contribution of this work includes (1) discussion of MLIR as a research artifact, built for extension and evolution, and identifying the challenges and opportunities posed by this novel design point in design, semantics, optimization specification, system, and engineering. (2) evaluation of MLIR as a generalized infrastructure that reduces the cost of building compilers-describing diverse use-cases to show research and educational opportunities for future programming languages, compilers, execution environments, and computer architecture. The paper also presents the rationale for MLIR, its original design principles, structures and semantics.

Programming Languages Machine Learning

SparCML: High-Performance Sparse Communication for Machine Learning

157 - Cedric Renggli , Saleh Ashkboos , Mehdi Aghagolzadeh 2018

Applying machine learning techniques to the quickly growing data in science and industry requires highly-scalable algorithms. Large datasets are most commonly processed data parallel distributed across many nodes. Each nodes contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication and thus scalability bottleneck for most machine learning workloads. We observe that frequently, many gradient values are (close to) zero, leading to sparse of sparsifyable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations, by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML, extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly-scalable machine learning frameworks.

Distributed Parallel and Cluster Computing Machine Learning

Effective Scaling of Blockchain Beyond Consensus Innovations and Moores Law

80 - Yinqiu Liu , Kai Qian , Jianli Chen 2020

As an emerging technology, blockchain has achieved great success in numerous application scenarios, from intelligent healthcare to smart cities. However, a long-standing bottleneck hindering its further development is the massive resource consumption attributed to the distributed storage and computation methods. This makes blockchain suffer from insufficient performance and poor scalability. Here, we analyze the recent blockchain techniques and demonstrate that the potential of widely-adopted consensus-based scaling is seriously limited, especially in the current era when Moores law-based hardware scaling is about to end. We achieve this by developing an open-source benchmarking tool, called Prism, for investigating the key factors causing low resource efficiency and then discuss various topology and hardware innovations which could help to scale up blockchain. To the best of our knowledge, this is the first in-depth study that explores the next-generation scaling strategies by conducting large-scale and comprehensive benchmarking.

Cryptography and Security Distributed Parallel and Cluster Computing

Moores Law in CLEAR Light

55 - Shuai Sun , Vikram K. Narayana , Tarek El-Ghazawi 2016

The inability of Moores Law and other figure-of-merits (FOMs) to accurately explain the technology development of the semiconductor industry demands a holistic merit to guide the industry. Here we introduce a FOM termed CLEAR that accurately postdicts technology developments since the 1940s until today, and predicts photonics as a logical extension to keep-up the pace of information-handling machines. We show that CLEAR (Capability-to-Latency-Energy-Amount-Resistance) is multi-hierarchical applying to the device, interconnect, and system level. Being a holistic FOM, we show that empirical trends such as Moores Law and the Makimotos wave are special cases of the universal CLEAR merit. Looking ahead, photonic board- and chip-level technologies are able to continue the observed doubling rate of the CLEAR value every 12 months, while electronic technologies are unable to keep pace.

Emerging Technologies

Achieving near native runtime performance and cross-platform performance portability for random number generation through SYCL interoperability

174 - Vincent R. Pascuzzi , Mehdi Goli 2021

High-performance computing (HPC) is a major driver accelerating scientific research and discovery, from quantum simulations to medical therapeutics. The growing number of new HPC systems coming online are being furnished with various hardware components, engineered by competing industry entities, each having their own architectures and platforms to be supported. While the increasing availability of these resources is in many cases pivotal to successful science, even the largest collaborations lack the computational expertise required for maximal exploitation of current hardware capabilities. The need to maintain multiple platform-specific codebases further complicates matters, potentially adding a constraint on the number of machines that can be utilized. Fortunately, numerous programming models are under development that aim to facilitate software solutions for heterogeneous computing. In this paper, we leverage the SYCL programming model to demonstrate cross-platform performance portability across heterogeneous resources. We detail our NVIDIA and AMD random number generator extensions to the oneMKL open-source interfaces library. Performance portability is measured relative to platform-specific baseline applications executed on four major hardware platforms using two different compilers supporting SYCL. The utility of our extensions are exemplified in a real-world setting via a high-energy physics simulation application. We show the performance of implementations that capitalize on SYCL interoperability are at par with native implementations, attesting to the cross-platform performance portability of a SYCL-based approach to scientific codes.

Distributed Parallel and Cluster Computing Mathematical Software High Energy Physics - Experiment

comments

Fetching comments

Al-Andalus University for Medical Sciences

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

NEMO5: Achieving High-end Internode Communication for Performance Projection Beyond Moores Law

Ask ChatGPT about the research

No Arabic abstract

Read More