This paper contributes to the exploration and analysis of space-improvements in concurrent programming languages, in particular in the functional process-calculus CHF. Space-improvements are defined as a generalization of the corresponding notion in deterministic pure functional languages. The main part of the paper is the O(n log n) algorithm SpOptN for offline space optimization of several parallel independent processes. Applications of this algorithm are: (i) affirming that particular classes of program transformations are space-improvements; (ii) supporting an interpreter-based method for refuting space-improvements; and (iii) serving as a stand-alone offline optimizer for space (or similar resources) of parallel processes.
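The abstract does not detail SpOptN itself, but the problem it solves in O(n log n) can be stated concretely: given the space profile of each independent process, find an interleaving that minimizes the peak total space. A minimal brute-force sketch of that problem follows, with a hypothetical delta-list encoding of profiles and an exponential exhaustive search; it is purely illustrative and not the paper's algorithm.

```python
from itertools import permutations

def peak_space(schedule, profiles):
    """Peak total space of one interleaving.

    `schedule` is a sequence of process ids; each occurrence of id p
    consumes the next space delta of process p. Deltas are signed
    integers (allocation > 0, deallocation < 0) -- a deliberately
    simplified stand-in for CHF's space measure.
    """
    pos = {p: 0 for p in profiles}
    total, peak = 0, 0
    for p in schedule:
        total += profiles[p][pos[p]]
        pos[p] += 1
        peak = max(peak, total)
    return peak

def naive_optimum(profiles):
    """Exhaustively try all interleavings (exponential; illustration only)."""
    steps = [p for p, deltas in profiles.items() for _ in deltas]
    return min(peak_space(s, profiles) for s in set(permutations(steps)))

# Two independent processes with spiky space profiles.
print(naive_optimum({"P": [3, -2, 1, -2], "Q": [2, -1, 2, -3]}))
```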
The Message Passing Interface (MPI) specification defines a portable message-passing API used to program parallel computers. MPI programs pose a number of correctness challenges: sent and expected values in communications may not match, resulting in incorrect computations and possibly crashes; and programs may deadlock, wasting resources. Existing tools are not completely satisfactory: model checking does not scale with the number of processes, and testing techniques waste resources and are highly dependent on the quality of the test set. As an alternative, we present a prototype for a type-based approach to programming and verifying MPI-like programs against protocols. Protocols are written in a dependent type language designed to capture the most common MPI primitives, incorporating, in addition, a form of primitive recursion and collective choice. Protocols are then translated into Why3, a deductive software verification tool. Source code, in turn, is written in WhyML, the language of the Why3 platform, and checked against the protocol. Programs that pass verification are guaranteed to be communication safe and free from deadlocks. We verified several parallel programs from textbooks using our approach and report on the outcome.
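As a hedged sketch of the kind of code such protocols govern, here is a ring exchange in Python with mpi4py (the paper itself targets WhyML, not Python): the per-rank sequence of sends and receives is exactly what a protocol would fix, letting a verifier rule out value mismatches and deadlocks statically.

```python
# Run with e.g.: mpiexec -n 3 python ring.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# A protocol of the kind described would fix, for every rank, the
# order, partners, and payload types of these operations.
if rank == 0:
    comm.send(42, dest=1, tag=0)                 # rank 0 starts the ring ...
    reply = comm.recv(source=size - 1, tag=0)    # ... and closes it
else:
    value = comm.recv(source=rank - 1, tag=0)    # every other rank forwards
    comm.send(value, dest=(rank + 1) % size, tag=0)
```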
Answer Set Programming (ASP) is a powerful logic-based programming language, which is enjoying increasing interest within the scientific community and (very recently) in industry. The evaluation of ASP programs is traditionally carried out in two steps. In the first step, an input program P undergoes the so-called instantiation (or grounding) process, which produces a program P′ semantically equivalent to P but not containing any variable; in the second step, P′ is evaluated using a backtracking search algorithm. It is well known that instantiation is important for the efficiency of the whole evaluation, might become a bottleneck in common situations, is crucial in several real-world applications, and is particularly relevant when huge input data has to be dealt with. At the time of this writing, the available instantiator modules are not able to satisfactorily exploit the latest hardware, featuring multi-core/multi-processor SMP (Symmetric MultiProcessing) technologies. This paper presents some parallel instantiation techniques, including load-balancing and granularity control heuristics, which allow for the effective exploitation of the processing power offered by modern SMP machines. This is confirmed by the extensive experimental analysis reported herein. To appear in Theory and Practice of Logic Programming (TPLP). KEYWORDS: Answer Set Programming, Instantiation, Parallelism, Heuristics
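To make the grounding step concrete, here is a toy sketch assuming a brute-force, substitution-based grounder whose work is split across worker processes; real instantiators use rule dependency analysis and semi-naive evaluation rather than plain enumeration, and the split below is only a crude stand-in for the paper's load-balancing heuristics.

```python
from itertools import product
from multiprocessing import Pool

# Toy instantiation of one rule, reach(X,Z) :- reach(X,Y), edge(Y,Z),
# by enumerating substitutions over a finite domain (hypothetical
# encoding; illustration only).
DOMAIN = ["a", "b", "c", "d"]

def ground_chunk(xs):
    """Ground all rule instances whose X-variable lies in `xs`."""
    return [f"reach({x},{z}) :- reach({x},{y}), edge({y},{z})."
            for x, y, z in product(xs, DOMAIN, DOMAIN)]

if __name__ == "__main__":
    # Split on one variable so each SMP core grounds an independent chunk.
    chunks = [[c] for c in DOMAIN]
    with Pool() as pool:
        ground = [r for part in pool.map(ground_chunk, chunks) for r in part]
    print(len(ground), "ground rules")
```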
We present a dataflow model for parallel Unix shell pipelines. To accurately capture the semantics of complex Unix pipelines, the dataflow model is order-aware: the order in which a node in the dataflow graph consumes inputs from different edges plays a central role in the semantics of the computation and therefore in the resulting parallelization. We use this model to capture the semantics of transformations that exploit data parallelism available in Unix shell computations and prove their correctness. We additionally formalize the translations from the Unix shell to the dataflow model and from the dataflow model back to a parallel shell script. We implement our model and transformations as the compiler and optimization passes of a system that parallelizes shell pipelines, and use it to evaluate the speedup achieved on 47 pipelines.
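A minimal illustration of why order-awareness matters for parallelization, under an assumed toy encoding of edges as lists of lines (not the paper's formal model): a data-parallel `cat e1 e2 | grep pat` may filter edges concurrently, but must concatenate the results in edge order to preserve the sequential semantics.

```python
from concurrent.futures import ProcessPoolExecutor

def grep_lines(lines, pat):
    # Stand-in for one `grep` instance running on a split of the input.
    return [l for l in lines if pat in l]

def parallel_cat_grep(edges, pat):
    """Data-parallel `cat e1 e2 ... | grep pat`, preserving edge order."""
    with ProcessPoolExecutor() as ex:
        parts = ex.map(grep_lines, edges, [pat] * len(edges))
        out = []
        for part in parts:    # map yields results in submission order,
            out.extend(part)  # so edge order -- and semantics -- is kept
    return out

if __name__ == "__main__":
    print(parallel_cat_grep([["foo 1", "bar"], ["foo 2"]], "foo"))
```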
We address the problem of analysing the complexity of concurrent programs written in the pi-calculus. We are interested in parallel complexity, or span, understood as the execution time in a model with maximal parallelism. A type system for parallel complexity has recently been proposed by Baillot and Ghyselen, but it is too imprecise for non-linear channels and cannot analyse some concurrent processes. Aiming for a more precise analysis, we design a type system that builds on the concepts of sized types and usages. The new variant of usages we define accounts for the various ways a channel is employed and relies on time annotations to track under which conditions processes can synchronize. We prove that a type derivation for a process provides an upper bound on its parallel complexity.
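To fix intuitions about span, here is a small sketch under a drastically simplified process syntax (no channels, hence no synchronization, which is precisely the part that sized types and usages must track): sequencing adds costs, while parallel composition takes the maximum.

```python
# Span (maximal-parallelism time) of a toy process tree.
# A hypothetical AST, far simpler than the pi-calculus.
from dataclasses import dataclass

@dataclass
class Tick:            # one unit of work
    pass

@dataclass
class Seq:             # P then Q
    left: object
    right: object

@dataclass
class Par:             # P | Q, executed with maximal parallelism
    left: object
    right: object

def span(p):
    if isinstance(p, Tick):
        return 1
    if isinstance(p, Seq):
        return span(p.left) + span(p.right)
    if isinstance(p, Par):
        return max(span(p.left), span(p.right))

# (tick; tick) | tick  -->  span 2, though total work is 3
print(span(Par(Seq(Tick(), Tick()), Tick())))
```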
Designing efficient cooling systems for integrated circuits (ICs) relies on a deep understanding of the electro-thermal properties of transistors. To shed light on this issue in currently fabricated FinFETs, a quantum mechanical solver capable of revealing atomically resolved electron and phonon transport phenomena from first principles is required. In this paper, we take a global, data-centric view of a state-of-the-art quantum transport simulator to optimize its execution on supercomputers. The approach yields coarse- and fine-grained data-movement characteristics, which are used for performance and communication modeling, communication avoidance, and data-layout transformations. The transformations are tuned for the Piz Daint and Summit supercomputers, where each platform requires different caching and fusion strategies to perform optimally. The presented results make ab initio device simulation enter a new era, in which nanostructures composed of over 10,000 atoms can be investigated at an unprecedented level of accuracy, paving the way for better heat management in next-generation ICs.
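As a miniature of what a data-layout transformation buys, the numpy toy below (illustrative only; the simulator's actual kernels and layouts are far more involved) times the same column reduction before and after a transposed copy that makes the traversal contiguous.

```python
import time
import numpy as np

n = 4000
a = np.random.rand(n, n)            # C order: rows contiguous, columns strided

def col_sums(x):
    # Traverses one strided column at a time -- a cache-unfriendly access pattern.
    return np.array([x[:, j].sum() for j in range(x.shape[1])])

t0 = time.perf_counter()
col_sums(a)
t1 = time.perf_counter()

b = np.ascontiguousarray(a.T)       # layout transformation: transposed copy
t2 = time.perf_counter()
np.array([b[j].sum() for j in range(n)])   # same sums, contiguous traversal
t3 = time.perf_counter()

print(f"strided: {t1 - t0:.3f}s  transformed: {t3 - t2:.3f}s")
```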