ترغب بنشر مسار تعليمي؟ اضغط هنا

CRAQL: A Composable Language for Querying Source Code

140   0   0.0 ( 0 )
 نشر من قبل Blake Johnson
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

This paper describes the design and implementation of CRAQL (Composable Repository Analysis and Query Language), a new query language for source code. The growth of source code mining and its applications suggest the need for a query language that can fully utilize and correlate across the unique structure and metadata of parsed source code. CRAQL is built on an underlying abstraction analogous to the underpinnings of SQL, but aimed at parsed source code. Thus, while SQL queries inputs and outputs are sets of tuples, CRAQL queries inputs and outputs are sets of abstract syntax trees (ASTs). This abstraction makes CRAQL queries composable (the output of one query can become the input to another) and improves the power of the language by allowing for querying of the tree structure and metadata, as well as raw text. Furthermore, the abstraction enables tree-specific language optimizations and allows CRAQL to be easily applied to any language that is parsable into ASTs. These attributes, along with a familiar syntax similar to SQL, allow complex queries to be expressed in a compact, straightforward manner. Questions such as find the longest series of statements without any loops, find methods that are never called, find getters (0-parameter methods with a single statement that returns a member variable), or find the percentage of variables declared at the top of a block all translate to simple, understandable queries in CRAQL. In this paper we describe the language, its features and capabilities. We compare CRAQL to other languages for querying source code and find that it has potential advantages in clarity and compactness. We discuss the features and optimizations added to support searching parse tree collections effectively and efficiently. Finally, we summarize the application of the language to millions of Java source files, the details of which are in a companion paper.

قيم البحث

اقرأ أيضاً

139 - Denis Barthou 2012
Quantum chromodynamics (QCD) is the theory of subnuclear physics, aiming at mod- eling the strong nuclear force, which is responsible for the interactions of nuclear particles. Lattice QCD (LQCD) is the corresponding discrete formulation, widely used for simula- tions. The computational demand for the LQCD is tremendous. It has played a role in the history of supercomputers, and has also helped defining their future. Designing efficient LQCD codes that scale well on large (probably hybrid) supercomputers requires to express many levels of parallelism, and then to explore different algorithmic solutions. While al- gorithmic exploration is the key for efficient parallel codes, the process is hampered by the necessary coding effort. We present in this paper a domain-specific language, QIRAL, for a high level expression of parallel algorithms in LQCD. Parallelism is expressed through the mathematical struc- ture of the sparse matrices defining the problem. We show that from these expressions and from algorithmic and preconditioning formulations, a parallel code can be automatically generated. This separates algorithms and mathematical formulations for LQCD (that be- long to the field of physics) from the effective orchestration of parallelism, mainly related to compilation and optimization for parallel architectures.
Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.
367 - Russell OConnor 2017
Simplicity is a typed, combinator-based, functional language without loops and recursion, designed to be used for crypto-currencies and blockchain applications. It aims to improve upon existing crypto-currency languages, such as Bitcoin Script and Et hereums EVM, while avoiding some of the problems they face. Simplicity comes with formal denotational semantics defined in Coq, a popular, general purpose software proof assistant. Simplicity also includes operational semantics that are defined with an abstract machine that we call the Bit Machine. The Bit Machine is used as a tool for measuring the computational space and time resources needed to evaluate Simplicity programs. Owing to its Turing incompleteness, Simplicity is amenable to static analysis that can be used to derive upper bounds on the computational resources needed, prior to execution. While Turing incomplete, Simplicity can express any finitary function, which we believe is enough to build useful smart contracts for blockchain applications.
We present a self-certifying compiler for the COGENT systems language. COGENT is a restricted, polymorphic, higher-order, and purely functional language with linear types and without the need for a trusted runtime or garbage collector. It compiles to efficient C code that is designed to interoperate with existing C functions. The language is suited for layered systems code with minimal sharing such as file systems or network protocol control code. For a well-typed COGENT program, the compiler produces C code, a high-level shallow embedding of its semantics in Isabelle/HOL, and a proof that the C code correctly implements this embedding. The aim is for proof engineers to reason about the full semantics of real-world systems code productively and equationally, while retaining the interoperability and leanness of C. We describe the formal verification stages of the compiler, which include automated formal refinement calculi, a switch from imperative update semantics to functional value semantics formally justified by the linear type system, and a number of standard compiler phases such as type checking and monomorphisation. The compiler certificate is a series of language-level meta proofs and per-program translation validation phases, combined into one coherent top-level theorem in Isabelle/HOL.
Instead of a monolithic programming language trying to cover all features of interest, some programming systems are designed by combining together simpler languages that cooperate to cover the same feature space. This can improve usability by making each part simpler than the whole, but there is a risk of abstraction leaks from one language to another that would break expectations of the users familiar with only one or some of the involved languages. We propose a formal specification for what it means for a given language in a multi-language system to be usable without leaks: it should embed into the multi-language in a fully abstract way, that is, its contextual equivalence should be unchanged in the larger system. To demonstrate our proposed design principle and formal specification criterion, we design a multi-language programming system that combines an ML-like statically typed functional language and another language with linear types and linear state. Our goal is to cover a good part of the expressiveness of languages that mix functional programming and linear state (ownership), at only a fraction of the complexity. We prove that the embedding of ML into the multi-language system is fully abstract: functional programmers should not fear abstraction leaks. We show examples of combined programs demonstrating in-place memory updates and safe resource handling, and an implementation extending OCaml with our linear language.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا