No Arabic abstract
Programming models for building large-scale distributed applications assist the developer in reasoning about consistency and distribution. However, many of the programming models for weak consistency, which promise the largest scalability gains, have little in the way of evaluation to demonstrate the promised scalability. We present an experience report on the implementation and large-scale evaluation of one of these models, Lasp, originally presented at PPDP `15, which provides a declarative, functional programming style for distributed applications. We demonstrate the scalability of Lasps prototype runtime implementation up to 1024 nodes in the Amazon cloud computing environment. It achieves high scalability by uniquely combining hybrid gossip with a programming model based on convergent computation. We report on the engineering challenges of this implementation and its evaluation, specifically related to operating research prototypes in a production cloud environment.
The subgraph-centric programming model is a promising approach and has been applied in many state-of-the-art distributed graph computing frameworks. However, traditional graph partition algorithms have significant difficulties in processing large-scale power-law graphs. The major problem is the communication bottleneck found in many subgraph-centric frameworks. Detailed analysis indicates that the communication bottleneck is caused by the huge communication volume or the extreme message imbalance among partitioned subgraphs. The traditional partition algorithms do not consider both factors at the same time, especially on power-law graphs. In this paper, we propose a novel efficient and balanced vertex-cut graph partition algorithm (EBV) which grants appropriate weights to the overall communication cost and communication balance. We observe that the number of replicated vertices and the balance of edge and vertex assignment have a great influence on communication patterns of distributed subgraph-centric frameworks, which further affect the overall performance. Based on this insight, We design an evaluation function that quantifies the proportion of replicated vertices and the balance of edges and vertices assignments as important parameters. Besides, we sort the order of edge processing by the sum of end-vertices degrees from small to large. Experiments show that EBV reduces replication factor and communication by at least 21.8% and 23.7% respectively than other self-based partition algorithms. When deployed in the subgraph-centric framework, it reduces the running time on power-law graphs by an average of 16.8% compared with the state-of-the-art partition algorithm. Our results indicate that EBV has a great potential in improving the performance of subgraph-centric frameworks for the parallel large-scale power-law graph processing.
Loosely coupled programming is a powerful paradigm for rapidly creating higher-level applications from scientific programs on petascale systems, typically using scripting languages. This paradigm is a form of many-task computing (MTC) which focuses on the passing of data between programs as ordinary files rather than messages. While it has the significant benefits of decoupling producer and consumer and allowing existing application programs to be executed in parallel with no recoding, its typical implementation using shared file systems places a high performance burden on the overall system and on the user who will analyze and consume the downstream data. Previous efforts have achieved great speedups with loosely coupled programs, but have done so with careful manual tuning of all shared file system access. In this work, we evaluate a prototype collective IO model for file-based MTC. The model enables efficient and easy distribution of input data files to computing nodes and gathering of output results from them. It eliminates the need for such manual tuning and makes the programming of large-scale clusters using a loosely coupled model easier. Our approach, inspired by in-memory approaches to collective operations for parallel programming, builds on fast local file systems to provide high-speed local file caches for parallel scripts, uses a broadcast approach to handle distribution of common input data, and uses efficient scatter/gather and caching techniques for input and output. We describe the design of the prototype model, its implementation on the Blue Gene/P supercomputer, and present preliminary measurements of its performance on synthetic benchmarks and on a large-scale molecular dynamics application.
Quantum computing harnesses quantum laws of nature to enable new types of algorithms, not efficiently possible on traditional computers, that may lead to breakthroughs in crucial areas like materials science and chemistry. There is rapidly growing demand for a quantum workforce educated in the basics of quantum computing, in particular in quantum programming. However, there are few offerings for non-specialists and little information on best practices for training computer science and engineering students. In this report we describe our experience teaching an undergraduate course on quantum computing using a practical, software-driven approach. We centered our course around teaching quantum algorithms through hands-on programming, reducing the significance of traditional written assignments and relying instead on self-paced programming exercises (Quantum Katas), a variety of programming assignments, and a final project. We observed that the programming sections of the course helped students internalize theoretical material presented during the lectures. In the survey results, students indicated that the programming exercises and the final project contributed the most to their learning process. We describe the motivation for centering the course around quantum programming, discuss major artifacts used in this course, and present our lessons learned and best practices for a future improved course offering. We hope that our experience will help guide instructors who want to adopt a practical approach to teaching quantum computing and will enable more undergraduate programs to offer quantum programming as an elective.
Serverless computing has emerged as a promising alternative to infrastructure- (IaaS) and platform-as-a-service (PaaS)cloud platforms for applications with ample parallelism and intermittent activity. Serverless promises greater resource elasticity, significant cost savings, and simplified application deployment. All major cloud providers, including Amazon, Google, and Microsoft, have introduced serverless to their public cloud offerings. For serverless to reach its potential, there is a pressing need for programming frameworks that abstract the deployment complexity away from the user. This includes simplifying the process of writing applications for serverless environments, automating task and data partitioning, and handling scheduling and fault tolerance. We present Ripple, a programming framework designed to specifically take applications written for single-machine execution and allow them to take advantage of the task parallelism of serverless. Ripple exposes a simple interface that users can leverage to express the high-level dataflow of a wide spectrum of applications, including machine learning (ML) analytics, genomics, and proteomics. Ripple also automates resource provisioning, meeting user-defined QoS targets, and handles fault tolerance by eagerly detecting straggler tasks. We port Ripple over AWS Lambda and show that, across a set of diverse applications, it provides an expressive and generalizable programming framework that simplifies running data-parallel applications on serverless, and can improve performance by up to 80x compared to IaaS/PaaS clouds for similar costs.
Work package 2 (WP2) aims to develop libraries for energy-efficient inter-process communication and data sharing on the EXCESS platforms. The Deliverable D2.4 reports on the final prototype of programming abstractions for energy-efficient inter- process communication. Section 1 is the updated overview of the prototype of programming abstraction and devised power/energy models. The Section 2-6 contain the latest results of the four studies: i) GreenBST, a energy-efficient and concurrent search tree (cf. Section 2) ii) Customization methodology for implementation of streaming aggregation in embedded systems (cf. Section 3) iii) Energy Model on CPU for Lock-free Data-structures in Dynamic Environments (cf. Section 4.10) iv) A General and Validated Energy Complexity Model for Multithreaded Algorithms (cf. Section 5)