No Arabic abstract
Simplicity is a typed, combinator-based, functional language without loops and recursion, designed to be used for crypto-currencies and blockchain applications. It aims to improve upon existing crypto-currency languages, such as Bitcoin Script and Ethereums EVM, while avoiding some of the problems they face. Simplicity comes with formal denotational semantics defined in Coq, a popular, general purpose software proof assistant. Simplicity also includes operational semantics that are defined with an abstract machine that we call the Bit Machine. The Bit Machine is used as a tool for measuring the computational space and time resources needed to evaluate Simplicity programs. Owing to its Turing incompleteness, Simplicity is amenable to static analysis that can be used to derive upper bounds on the computational resources needed, prior to execution. While Turing incomplete, Simplicity can express any finitary function, which we believe is enough to build useful smart contracts for blockchain applications.
User-defined syntax extensions are useful to implement an embedded domain specific language (EDSL) with good code-readability. They allow EDSL authors to define domain-natural notation, which is often different from the host language syntax. Nowadays, there are several research works of powerful user-defined syntax extensions. One promising approach uses user-defined operators. A user-defined operator is a function with user-defined syntax. It can be regarded as a syntax extension implemented without macros. An advantage of user-defined operators is that an operator can be statically typed. The compiler can find type errors in the definition of an operator before the operator is used. In addition, the compiler can resolve syntactic ambiguities by using static types. However, user-defined operators are difficult to implement language constructs involving static name binding. Name binding is association between names and values (or memory locations). Our inquiry is whether we can design a system for user-defined operators involving a new custom name binding. This paper proposes a module system for user-defined operators named a dsl class. A dsl class is similar to a normal class in Java but it contains operators instead of methods. We use operators for implementing custom name binding. For example, we use a nullary operator for emulating a variable name. An instance of a dsl class, called a dsl object, reifies an environment that expresses name binding. Programmers can control a scope of instance operators by specifying where the dsl object is active. We extend the host type system so that it can express the activation of a dsl object. In our system, a bound name is propagated through a type parameter to a dsl object. This enables us to implement user-defined language constructs involving static name binding. A contribution of this paper is that we reveal we can integrate a system for managing names and their scopes with a module and type system of an object-oriented language like Java. This allows us to implement a proposed system by adopting eager disambiguation based on expected types so that the compilation time will be acceptable. Eager disambiguation, which prunes out semantically invalid abstract parsing trees (ASTs) while a parser is running, is needed because the parser may generate a huge number of potentially valid ASTs for the same source code. We have implemented ProteaJ2, which is a programming language based on Java and it supports our proposal. We describe a parsing method that adopts eager disambiguation for fast parsing and discuss its time complexity. To show the practicality of our proposal, we have conducted two micro benchmarks to see the performance of our compiler. We also show several use cases of dsl classes for demonstrating dsl classes can express various language constructs. Our ultimate goal is to let programmers add any kind of new language construct to a host language. To do this, programmers should be able to define new syntax, name binding, and type system within the host language. This paper shows programmers can define the former two: their own syntax and name binding.
This paper describes the design and implementation of CRAQL (Composable Repository Analysis and Query Language), a new query language for source code. The growth of source code mining and its applications suggest the need for a query language that can fully utilize and correlate across the unique structure and metadata of parsed source code. CRAQL is built on an underlying abstraction analogous to the underpinnings of SQL, but aimed at parsed source code. Thus, while SQL queries inputs and outputs are sets of tuples, CRAQL queries inputs and outputs are sets of abstract syntax trees (ASTs). This abstraction makes CRAQL queries composable (the output of one query can become the input to another) and improves the power of the language by allowing for querying of the tree structure and metadata, as well as raw text. Furthermore, the abstraction enables tree-specific language optimizations and allows CRAQL to be easily applied to any language that is parsable into ASTs. These attributes, along with a familiar syntax similar to SQL, allow complex queries to be expressed in a compact, straightforward manner. Questions such as find the longest series of statements without any loops, find methods that are never called, find getters (0-parameter methods with a single statement that returns a member variable), or find the percentage of variables declared at the top of a block all translate to simple, understandable queries in CRAQL. In this paper we describe the language, its features and capabilities. We compare CRAQL to other languages for querying source code and find that it has potential advantages in clarity and compactness. We discuss the features and optimizations added to support searching parse tree collections effectively and efficiently. Finally, we summarize the application of the language to millions of Java source files, the details of which are in a companion paper.
We present a self-certifying compiler for the COGENT systems language. COGENT is a restricted, polymorphic, higher-order, and purely functional language with linear types and without the need for a trusted runtime or garbage collector. It compiles to efficient C code that is designed to interoperate with existing C functions. The language is suited for layered systems code with minimal sharing such as file systems or network protocol control code. For a well-typed COGENT program, the compiler produces C code, a high-level shallow embedding of its semantics in Isabelle/HOL, and a proof that the C code correctly implements this embedding. The aim is for proof engineers to reason about the full semantics of real-world systems code productively and equationally, while retaining the interoperability and leanness of C. We describe the formal verification stages of the compiler, which include automated formal refinement calculi, a switch from imperative update semantics to functional value semantics formally justified by the linear type system, and a number of standard compiler phases such as type checking and monomorphisation. The compiler certificate is a series of language-level meta proofs and per-program translation validation phases, combined into one coherent top-level theorem in Isabelle/HOL.
Instead of a monolithic programming language trying to cover all features of interest, some programming systems are designed by combining together simpler languages that cooperate to cover the same feature space. This can improve usability by making each part simpler than the whole, but there is a risk of abstraction leaks from one language to another that would break expectations of the users familiar with only one or some of the involved languages. We propose a formal specification for what it means for a given language in a multi-language system to be usable without leaks: it should embed into the multi-language in a fully abstract way, that is, its contextual equivalence should be unchanged in the larger system. To demonstrate our proposed design principle and formal specification criterion, we design a multi-language programming system that combines an ML-like statically typed functional language and another language with linear types and linear state. Our goal is to cover a good part of the expressiveness of languages that mix functional programming and linear state (ownership), at only a fraction of the complexity. We prove that the embedding of ML into the multi-language system is fully abstract: functional programmers should not fear abstraction leaks. We show examples of combined programs demonstrating in-place memory updates and safe resource handling, and an implementation extending OCaml with our linear language.
How does one compile derivatives of tensor programs, such that the resulting code is purely functional (hence easier to optimize and parallelize) and provably efficient relative to the original program? We show that naively differentiating tensor code---as done in popular systems like Tensorflow and PyTorch---can cause asymptotic slowdowns in pathological cases, violating the Cheap Gradients Principle. However, all existing automatic differentiation methods that guarantee this principle (for variable size data) do so by relying on += mutation through aliases/pointers---which complicates downstream optimization. We provide the first purely functional, provably efficient, adjoint/reverse-mode derivatives of array/tensor code by explicitly accounting for sparsity. We do this by focusing on the indicator function from Iversons APL. We also introduce a new Tensor SSA normal form and a new derivation of reverse-mode automatic differentiation based on the universal property of inner-products.