This paper presents a scalable path- and context-sensitive data-dependence analysis. The key is to address the aliasing-path-explosion problem via a sparse, demand-driven, and fused approach that piggybacks the computation of pointer information on the resolution of data dependence. Specifically, our approach decomposes the computational effort of disjunctive reasoning into 1) a context- and semi-path-sensitive analysis that concisely summarizes data dependence as symbolic and storeless value-flow graphs, and 2) a demand-driven phase that resolves transitive data dependence over the graphs. We have applied the approach to two clients, namely thin slicing and value-flow analysis. Using a suite of 16 programs ranging from 13 KLoC to 8 MLoC, we compare our techniques against a diverse group of state-of-the-art analyses, illustrating the significant precision and scalability advantages of our approach.
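To make the demand-driven phase concrete, here is a minimal sketch of a transitive data-dependence query answered as reachability over a value-flow graph. The node names and the boolean stand-in for path guards are invented for this illustration; the paper's graphs carry symbolic guards that a real analysis would solve rather than merely test.

```python
# Toy demand-driven dependence query over a value-flow graph.
from collections import defaultdict

class ValueFlowGraph:
    def __init__(self):
        self.edges = defaultdict(list)  # def site -> [(use site, guard)]

    def add_flow(self, src, dst, guard=True):
        self.edges[src].append((dst, guard))

    def depends_on(self, source, target):
        """Does `target` transitively depend on `source`? Plain reachability."""
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for succ, guard in self.edges[node]:
                if guard and succ not in seen:  # a real analysis solves the guard
                    seen.add(succ)
                    stack.append(succ)
        return False

g = ValueFlowGraph()
g.add_flow("x@L1", "p@L4")            # x's value flows into *p
g.add_flow("p@L4", "y@L9")            # ... and from *p into y
print(g.depends_on("x@L1", "y@L9"))   # True
```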
We present a unification of refinement and Hoare-style reasoning in a foundational mechanized higher-order distributed separation logic. This unification enables us to prove formally in the Coq proof assistant that concrete implementations of challenging distributed systems refine more abstract models and to combine refinement-style reasoning with Hoare-style program verification. We use our logic to prove correctness of concrete implementations of two-phase commit and single-decree Paxos by showing that they refine their abstract TLA+ specifications. We further use our notion of refinement to transfer fairness assumptions on program executions to model traces and then transfer liveness properties of fair model traces back to program executions, which enables us to prove liveness properties such as strong eventual consistency of a concrete implementation of a Conflict-Free Replicated Data Type and fair termination of a concurrent program.
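As a rough intuition for refinement-style reasoning (and emphatically not the paper's Coq development), the sketch below checks that every adjacent pair of states in a concrete trace, once abstracted, is either a stutter or a step of the abstract model. The two-phase-commit states and the abstraction function are invented for the sketch.

```python
# Toy trace-refinement check against an invented abstract model.
ABSTRACT_STEPS = {
    ("init", "prepared"),
    ("prepared", "committed"),
    ("prepared", "aborted"),
}

def refines(concrete_trace, abstraction):
    abstract = [abstraction(s) for s in concrete_trace]
    return all(a == b or (a, b) in ABSTRACT_STEPS
               for a, b in zip(abstract, abstract[1:]))

# Concrete states carry bookkeeping (pending messages) that abstraction erases.
trace = [("init", ["vote?"]), ("init", ["vote!"]),
         ("prepared", []), ("committed", [])]
print(refines(trace, abstraction=lambda s: s[0]))  # True
```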
Dynamic languages like Erlang, Clojure, JavaScript, and E have adopted data-race freedom by design. To enforce data-race freedom, these languages either deep copy objects during actor (thread) communication or proxy accesses back to the owning thread. We present Dala, a simple programming model that ensures data-race freedom while supporting efficient inter-thread communication. Dala is a dynamic, concurrent, capability-based language that relies on three core capabilities: immutable values can be shared freely; isolated mutable objects can be transferred between threads but not aliased; local objects can be aliased within their owning thread but not dereferenced by other threads. Objects with capabilities can co-exist with unsafe objects, which are unchecked and may suffer data races, without compromising the safety of safe objects. We present a formal model of Dala, prove data-race freedom, and state and prove a dynamic gradual guarantee. These theorems guarantee data-race freedom when using safe capabilities and show that the addition of capabilities is semantics-preserving modulo permission and cast errors.
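The capability checks can be pictured with a small dynamic model. The sketch below is our own Python approximation of two of Dala's capabilities, not the paper's formal semantics or API: a local object refuses dereferences from non-owning threads, and an isolated object can be transferred at most once, so no aliases survive the move.

```python
# Toy model of Dala-style 'local' and 'iso' capabilities (names invented).
import threading

class Local:
    """Aliasable within the owning thread; no dereference from other threads."""
    def __init__(self, value):
        self.owner = threading.get_ident()
        self._value = value

    def deref(self):
        if threading.get_ident() != self.owner:
            raise PermissionError("local object dereferenced off-thread")
        return self._value

class Iso:
    """Transferable between threads but never aliased: transfer consumes it."""
    def __init__(self, value):
        self._value, self._moved = value, False

    def transfer(self):
        if self._moved:
            raise PermissionError("isolated object already transferred")
        self._moved = True
        return self._value

obj = Local("owned")

def off_thread_access():
    try:
        obj.deref()
    except PermissionError as e:
        print("blocked:", e)

t = threading.Thread(target=off_thread_access)
t.start(); t.join()   # prints: blocked: local object dereferenced off-thread
print(obj.deref())    # the owner may still dereference: prints 'owned'
```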
Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then decides where to add noise and how much of it to add. This can be a non-trivial process -- if not done carefully, the algorithm might either violate differential privacy or have low utility. In this paper, we present DPGen, a program synthesizer that takes in non-private code (without any noise) and automatically synthesizes its differentially private version (with carefully calibrated noise). Under the hood, DPGen uses novel algorithms to automatically generate a sketch program with candidate locations for noise, and then optimizes the privacy proof and noise scales simultaneously on the sketch program. Moreover, DPGen can synthesize sophisticated mechanisms that adaptively process queries until a specified privacy budget is exhausted. When evaluated on standard benchmarks, DPGen is able to generate differentially private mechanisms that optimize simple utility functions within 120 seconds. It is also powerful enough to synthesize adaptive privacy mechanisms.
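For readers unfamiliar with the noise calibration DPGen automates, the following is the textbook Laplace mechanism (standard differential privacy, not DPGen's internals): a counting query changes by at most 1 when one individual's record changes, so adding Laplace noise of scale 1/epsilon makes the release epsilon-differentially private.

```python
# Textbook Laplace mechanism: noise scale = sensitivity / epsilon.
import math, random

def laplace_noise(scale):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(data, predicate, epsilon):
    # A counting query has sensitivity 1: one record changes it by at most 1.
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon)

print(private_count(range(100), lambda x: x % 2 == 0, epsilon=0.5))
```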
Fortran is the oldest high-level programming language that remains in use today and is one of the dominant languages used for compute-intensive scientific and engineering applications. However, Fortran has not kept up with the modern software development practices and tooling of the internet era. As a consequence, the Fortran developer experience has diminished. Specifically, the lack of a rich general-purpose library ecosystem, modern tools for building and packaging Fortran libraries and applications, and online learning resources has made it difficult for Fortran to attract and retain new users. To address this problem, an open source community formed on GitHub in 2019 and began to work on an initial set of core tools: a standard library, a build system and package manager, and a community-curated website for Fortran. In this paper we report on the progress to date and outline the next steps.
Program representation learning is a fundamental task in software engineering applications. With the availability of big code and the development of deep learning techniques, various program representation learning models have been proposed to understand the semantic properties of programs and applied to different software engineering tasks. However, no previous study has comprehensively assessed the generalizability of these deep models across tasks, so the pros and cons of the models remain unclear. In this experience paper, we try to bridge this gap by systematically evaluating the performance of eight program representation learning models on three common tasks, where six models are based on abstract syntax trees and two models are based on the plain text of source code. We explain the criteria for selecting the models and tasks, as well as the method for enabling end-to-end learning in each task. The performance evaluation shows that the models perform quite differently on each task and that the performance of the AST-based models is generally unstable across tasks. To further explain the results, we apply a prediction attribution technique to find which input elements the models capture and which are responsible for the predictions in each task. Based on the findings, we discuss some general principles for better capturing the information in source code, and hope to inspire researchers to improve program representation learning methods for software engineering tasks.
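As an illustration of what a prediction attribution probe can look like (the paper's exact technique may differ), the sketch below uses simple occlusion: mask one token at a time and record how much the model's score drops, so the tokens with the largest drops are the ones the prediction relied on. The toy scorer is invented.

```python
# Occlusion-based attribution over a token sequence.
def occlusion_attribution(tokens, score_fn, mask="<unk>"):
    base = score_fn(tokens)
    attributions = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask] + tokens[i + 1:]
        attributions.append(base - score_fn(occluded))
    return attributions

# Toy scorer: pretends identifiers named 'buffer' drive a bug prediction.
score = lambda toks: float(toks.count("buffer"))
print(occlusion_attribution(["if", "buffer", ">", "limit"], score))
# -> [0.0, 1.0, 0.0, 0.0]: the prediction hinges on 'buffer'.
```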
Class invariants -- consistency constraints preserved by every operation on objects of a given type -- are fundamental to building and understanding object-oriented programs. They should also be a key aid in verifying them, but instead turn out to raise major verification challenges which have prompted a significant literature with, until now, no widely accepted solution. The present work introduces a general proof rule meant to address invariant-related issues and allow verification tools to benefit from invariants. It first clarifies the notion of invariant and identifies three problems: callbacks, furtive access, and reference leak. As an example, the 2016 Ethereum DAO bug, in which $50 million was stolen, resulted from a callback invalidating an invariant. The discussion starts with a Simple Model and an associated proof rule, demonstrating its soundness. It then removes the three assumptions of the Simple Model one by one, each removal bringing up one of the three issues, and introduces the corresponding adaptation to the proof rule. The final version of the rule can tackle tricky examples, including challenge problems listed in the literature.
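The callback problem is easy to reproduce outside Ethereum. The Python sketch below mirrors the DAO bug's structure (an illustration, not the actual Solidity contract): `withdraw` pays out via a callback before updating the balance, so a re-entrant callback observes and exploits the broken invariant.

```python
# Re-entrant callback invalidating a class invariant, DAO-style.
class Wallet:
    def __init__(self, balance):
        self.balance = balance  # invariant: balance >= 0

    def withdraw(self, amount, send):
        if self.balance >= amount:
            send(amount)             # callback runs while invariant is broken
            self.balance -= amount   # the update happens too late

wallet = Wallet(100)

def attacker(amount, drained=[0]):
    drained[0] += amount
    if drained[0] < 300:             # re-enter before the balance is updated
        wallet.withdraw(amount, attacker)

wallet.withdraw(100, attacker)
print(wallet.balance)  # -200: the invariant balance >= 0 no longer holds
```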
While recent progress in quantum hardware opens the door to significant speedups in certain key areas (cryptography, biology, chemistry, optimization, machine learning, etc.), quantum algorithms are still hard to implement correctly, and the validation of such quantum programs is a challenge. Moreover, importing the testing and debugging practices in use in classical programming is extremely difficult in the quantum case, due to the destructive aspect of quantum measurement. As an alternative strategy, formal methods are poised to play a decisive role in the emerging field of quantum software. Recent works have initiated solutions for problems occurring at every stage of the development process: high-level program design, implementation, compilation, etc. We review the challenges this raises for an effective use of formal methods in quantum computing and the currently most promising research directions.
As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications say almost nothing about relative forward progress guarantees between workgroups. Although prior work has proposed a spectrum of plausible progress models for GPUs, cross-vendor specifications have yet to commit to any model. This work provides a collection of tools and experimental data to aid specification designers when considering forward progress guarantees in programming frameworks. As a foundation, we formalize a small parallel programming language that captures the essence of fine-grained synchronization. We then provide a means of formally specifying a progress model, and develop a termination oracle that decides whether a given program is guaranteed to eventually terminate with respect to a given progress model. Next, we formalize a constraint for concurrent programs that require relative forward progress to terminate. Using this constraint, we synthesize a large set of 483 progress litmus tests. Combined with the termination oracle, this allows us to determine the expected status of each litmus test -- i.e., whether it is guaranteed to eventually terminate -- under various progress models. We present a large experimental campaign running the litmus tests across 8 GPUs from 5 different vendors. Our results highlight that GPUs have significantly different termination behaviors under our test suite. Most notably, we find that Apple and ARM GPUs do not support the linear occupancy-bound model, an intuitive progress model defined by prior work and hypothesized to describe the workgroup schedulers of existing GPUs.
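To see why such litmus tests discriminate between progress models, consider the classic two-workgroup test where group 0 spins on a flag that group 1 sets. The Python sketch below is our own toy oracle for one invented scheduling model (single resource, non-preemptive, occupancy-bound), not the paper's formalization: termination depends entirely on which groups become resident first.

```python
# Toy termination oracle for a spin-on-flag litmus test. Groups become
# resident in `order`, at most `occupancy` at a time, and never preempt.
# Group 0 spins until a flag set by group 1; group 1 terminates on its own.
def terminates(order, occupancy):
    resident = set(order[:occupancy])
    if 1 in resident:
        return True   # group 1 runs, sets the flag, then group 0 can finish
    return False      # group 0 holds every slot and spins forever

print(terminates(order=[1, 0], occupancy=1))  # True
print(terminates(order=[0, 1], occupancy=1))  # False: group 1 is starved
print(terminates(order=[0, 1], occupancy=2))  # True: both groups co-resident
```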
Programming or scripting languages used in real-world systems are seldom designed with a formal semantics in mind from the outset. Therefore, developing well-founded analysis tools for these systems requires reverse-engineering a formal semantics as a first step. This can take months or years of effort. Can we (at least partially) automate this process? Though desirable, automatically reverse-engineering semantics rules from an implementation is very challenging, as found by Krishnamurthi et al. [2019]. In this paper, we highlight that scaling methods with the size of the language is very difficult due to state space explosion, so we propose to learn semantics incrementally. We give a formalisation of Krishnamurthi et al.'s desugaring learning framework in order to clarify the assumptions necessary for an incremental learning algorithm to be feasible. We show that this reformulation allows us to extend the search space and express rules that Krishnamurthi et al. described as challenging, while still retaining feasibility. We evaluate enumerative synthesis as a baseline algorithm, and demonstrate that, with our reformulation of the problem, it is possible to learn correct desugaring rules for the example source and core languages proposed by Krishnamurthi et al., in most cases identical to the intended rules. In addition, with user guidance, our system was able to synthesize rules for desugaring list comprehensions and try/catch/finally constructs.
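For a flavor of what a learned desugaring rule looks like, here is one hand-written instance in Python (the AST encoding as nested tuples is ours, not the paper's): a surface list comprehension is rewritten into core `map`/`filter` applications.

```python
# One desugaring rule: surface comprehensions -> core map/filter terms.
def desugar(expr):
    """Rewrite surface constructs into the core language, recursively."""
    if isinstance(expr, tuple) and expr[0] == "comprehension":
        _, body, var, src, cond = expr   # [body for var in src if cond]
        return ("map", ("lambda", var, desugar(body)),
                ("filter", ("lambda", var, desugar(cond)), desugar(src)))
    return expr

surface = ("comprehension", ("square", "x"), "x", "xs", ("even", "x"))
print(desugar(surface))
# ('map', ('lambda', 'x', ('square', 'x')),
#  ('filter', ('lambda', 'x', ('even', 'x')), 'xs'))
```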