In this work we provide algorithmic solutions to five fundamental problems concerning the verification, synthesis and correction of concurrent systems that can be modeled by bounded p/t-nets. We express concurrency via partial orders and assume that behavioral specifications are given via monadic second order logic. A c-partial-order is a partial order whose Hasse diagram can be covered by c paths. For a finite set T of transitions, we let P(c,T,phi) denote the set of all T-labelled c-partial-orders satisfying phi. If N=(P,T) is a p/t-net we let P(N,c) denote the set of all c-partially-ordered runs of N. A (b, r)-bounded p/t-net is a b-bounded p/t-net in which each place appears repeated at most r times. We solve the following problems: 1. Verification: given an MSO formula phi and a bounded p/t-net N determine whether P(N,c)subseteq P(c,T,phi), whether P(c,T,phi)subseteq P(N,c), or whether P(N,c)cap P(c,T,phi)=emptyset. 2. Synthesis from MSO Specifications: given an MSO formula phi, synthesize a semantically minimal (b,r)-bounded p/t-net N satisfying P(c,T,phi)subseteq P(N, c). 3. Semantically Safest Subsystem: given an MSO formula phi defining a set of safe partial orders, and a b-bounded p/t-net N, possibly containing unsafe behaviors, synthesize the safest (b,r)-bounded p/t-net N whose behavior lies in between P(N,c)cap P(c,T,phi) and P(N,c). 4. Behavioral Repair: given two MSO formulas phi and psi, and a b-bounded p/t-net N, synthesize a semantically minimal (b,r)-bounded p/t net N whose behavior lies in between P(N,c) cap P(c,T,phi) and P(c,T,psi). 5. Synthesis from Contracts: given an MSO formula phi^yes specifying a set of good behaviors and an MSO formula phi^no specifying a set of bad behaviors, synthesize a semantically minimal (b,r)-bounded p/t-net N such that P(c,T,phi^yes) subseteq P(N,c) but P(c,T,phi^no ) cap P(N,c)=emptyset.
This paper shows an application of the theory of sorting networks to facilitate the synthesis of optimized general purpose sorting libraries. Standard sorting libraries are often based on combinations of the classic Quicksort algorithm with insertion sort applied as the base case for small fixed numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching and results in code which is equivalent to a sorting network. This enables the application of further program transformations based on sorting network optimizations, and eventually the synthesis of code from sorting networks. We show that if considering the number of comparisons and swaps then theory predicts no real advantage of this approach. However, significant speed-ups are obtained when taking advantage of instruction level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. We provide empirical evidence that using code synthesized from efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.
We propose two novel techniques for overcoming load-imbalance encountered when implementing so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario where two thread teams are created/activated during the factorization, with each team in charge of performing an independent task/branch of execution. The first technique promotes worker sharing (WS) between the two tasks, allowing the threads of the task that completes first to be reallocated for use by the costlier task. The second technique allows a fast task to alert the slower task of completion, enforcing the early termination (ET) of the second task, and a smooth transition of the factorization procedure into the next iteration. The two mechanisms are instantiated via a new malleable thread-level implementation of the Basic Linear Algebra Subprograms (BLAS), and their benefits are illustrated via an implementation of the LU factorization with partial pivoting enhanced with look-ahead. Concretely, our experimental results on a six core Intel-Xeon processor show the benefits of combining WS+ET, reporting competitive performance in comparison with a task-parallel runtime-based solution.
The interoperability of proof assistants and the integration of their libraries is a highly valued but elusive goal in the field of theorem proving. As a preparatory step, in previous work, we translated the libraries of multiple proof assistants, specifically the ones of Coq, HOL Light, IMPS, Isabelle, Mizar, and PVS into a universal format: OMDoc/MMT. Each translation presented tremendous theoretical, technical, and social challenges, some universal and some system-specific, some solvable and some still open. In this paper, we survey these challenges and compare and evaluate the solutions we chose. We believe similar library translations will be an essential part of any future system interoperability solution and our experiences will prove valuable to others undertaking such efforts.