New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Real-time cortical simulations: energy and interconnect scaling on distributed systems

81 0 0.0 ( 0 )

Download Cite

Added by Elena Pastorelli

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Francesco Simula - Elena Pastorelli - Pier Stanislao Paolucci

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We profile the impact of computation and inter-processor communication on the energy consumption and on the scaling of cortical simulations approaching the real-time regime on distributed computing platforms. Also, the speed and energy consumption of processor architectures typical of standard HPC and embedded platforms are compared. We demonstrate the importance of the design of low-latency interconnect for speed and energy consumption. The cost of cortical simulations is quantified using the Joule per synaptic event metric on both architectures. Reaching efficient real-time on large scale cortical simulations is of increasing relevance for both future bio-inspired artificial intelligence applications and for understanding the cognitive functions of the brain, a scientific quest that will require to embed large scale simulations into highly complex virtual or real worlds. This work stands at the crossroads between the WaveScalES experiment in the Human Brain Project (HBP), which includes the objective of large scale thalamo-cortical simulations of brain states and their transitions, and the ExaNeSt and EuroExa projects, that investigate the design of an ARM-based, low-power High Performance Computing (HPC) architecture with a dedicated interconnect scalable to million of cores; simulation of deep sleep Slow Wave Activity (SWA) and Asynchronous aWake (AW) regimes expressed by thalamo-cortical models are among their benchmarks.

rate research

The Brain on Low Power Architectures - Efficient Simulation of Cortical Slow Waves and Asynchronous States

212 - Roberto Ammendola , Andrea Biagioni , Fabrizio Capuani 2018

Efficient brain simulation is a scientific grand challenge, a parallel/distributed coding challenge and a source of requirements and suggestions for future computing architectures. Indeed, the human brain includes about 10^15 synapses and 10^11 neurons activated at a mean rate of several Hz. Full brain simulation poses Exascale challenges even if simulated at the highest abstraction level. The WaveScalES experiment in the Human Brain Project (HBP) has the goal of matching experimental measures and simulations of slow waves during deep-sleep and anesthesia and the transition to other brain states. The focus is the development of dedicated large-scale parallel/distributed simulation technologies. The ExaNeSt project designs an ARM-based, low-power HPC architecture scalable to million of cores, developing a dedicated scalable interconnect system, and SWA/AW simulations are included among the driving benchmarks. At the joint between both projects is the INFN proprietary Distributed and Plastic Spiking Neural Networks (DPSNN) simulation engine. DPSNN can be configured to stress either the networking or the computation features available on the execution platforms. The simulation stresses the networking component when the neural net - composed by a relatively low number of neurons, each one projecting thousands of synapses - is distributed over a large number of hardware cores. When growing the number of neurons per core, the computation starts to be the dominating component for short range connections. This paper reports about preliminary performance results obtained on an ARM-based HPC prototype developed in the framework of the ExaNeSt project. Furthermore, a comparison is given of instantaneous power, total energy consumption, execution time and energetic cost per synaptic event of SWA/AW DPSNN simulations when executed on either ARM- or Intel-based server platforms.

Distributed Parallel and Cluster Computing Neural and Evolutionary Computing Neurons and Cognition

Gaussian and exponential lateral connectivity on distributed spiking neural network simulation

413 - Elena Pastorelli , Pier Stanislao Paolucci , Francesco Simula 2018

We measured the impact of long-range exponentially decaying intra-areal lateral connectivity on the scaling and memory occupation of a distributed spiking neural network simulator compared to that of short-range Gaussian decays. While previous studies adopted short-range connectivity, recent experimental neurosciences studies are pointing out the role of longer-range intra-areal connectivity with implications on neural simulation platforms. Two-dimensional grids of cortical columns composed by up to 11 M point-like spiking neurons with spike frequency adaption were connected by up to 30 G synapses using short- and long-range connectivity models. The MPI processes composing the distributed simulator were run on up to 1024 hardware cores, hosted on a 64 nodes server platform. The hardware platform was a cluster of IBM NX360 M5 16-core compute nodes, each one containing two Intel Xeon Haswell 8-core E5-2630 v3 processors, with a clock of 2.40 G Hz, interconnected through an InfiniBand network, equipped with 4x QDR switches.

Distributed Parallel and Cluster Computing Neural and Evolutionary Computing Neurons and Cognition

Combinatorial BLAS 2.0: Scaling combinatorial algorithms on distributed-memory systems

348 - Ariful Azad , Oguz Selvitopi , Md Taufique Hussain 2021

Combinatorial algorithms such as those that arise in graph analysis, modeling of discrete systems, bioinformatics, and chemistry, are often hard to parallelize. The Combinatorial BLAS library implements key computational primitives for rapid development of combinatorial algorithms in distributed-memory systems. During the decade since its first introduction, the Combinatorial BLAS library has evolved and expanded significantly. This paper details many of the key technical features of Combinatorial BLAS version 2.0, such as communication avoidance, hierarchical parallelism via in-node multithreading, accelerator support via GPU kernels, generalized semiring support, implementations of key data structures and functions, and scalable distributed I/O operations for human-readable files. Our paper also presents several rules of thumb for choosing the right data structures and functions in Combinatorial BLAS 2.0, under various common application scenarios.

Distributed Parallel and Cluster Computing Discrete Mathematics Performance

Evolutionary Optimisation of Real-Time Systems and Networks

265 - Leandro Soares Indrusiak , Robert I. Davis , Piotr Dziurzanski 2019

The design space of networked embedded systems is very large, posing challenges to the optimisation of such platforms when it comes to support applications with real-time guarantees. Recent research has shown that a number of inter-related optimisation problems have a critical influence over the schedulability of a system, i.e. whether all its application components can execute and communicate by their respective deadlines. Examples of such optimization problems include task allocation and scheduling, communication routing and arbitration, memory allocation, and voltage and frequency scaling. In this paper, we advocate the use of evolutionary approaches to address such optimization problems, aiming to evolve individuals of increased fitness over multiple generations of potential solutions. We refer to plentiful evidence that existing real-time schedulability tests can be used effectively to guide evolutionary optimisation, either by themselves or in combination with other metrics such as energy dissipation or hardware overheads. We then push that concept one step further and consider the possibility of using evolutionary techniques to evolve the schedulability tests themselves, aiming to support the verification and optimisation of systems which are too complex for state-of-the-art (manual) derivation of schedulability tests.

Performance Neural and Evolutionary Computing

Toward real-time data query systems in HEP

140 - Jim Pivarski , David Lange , Thanat Jatuphattharachat 2017

Exploratory data analysis tools must respond quickly to a users questions, so that the answer to one question (e.g. a visualized histogram or fit) can influence the next. In some SQL-based query systems used in industry, even very large (petabyte) datasets can be summarized on a human timescale (seconds), employing techniques such as columnar data representation, caching, indexing, and code generation/JIT-compilation. This article describes progress toward realizing such a system for High Energy Physics (HEP), focusing on the intermediate problems of optimizing data access and calculations for query sized payloads, such as a single histogram or group of histograms, rather than large reconstruction or data-skimming jobs. These techniques include direct extraction of ROOT TBranches into Numpy arrays and compilation of Python analysis functions (rather than SQL) to be executed very quickly. We will also discuss the problem of caching and actively delivering jobs to worker nodes that have the necessary input data preloaded in cache. All of these pieces of the larger solution are available as standalone GitHub repositories, and could be used in current analyses.

Distributed Parallel and Cluster Computing

comments

Fetching comments

Yarmouk Private University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Real-time cortical simulations: energy and interconnect scaling on distributed systems

Ask ChatGPT about the research

No Arabic abstract

Read More