No Arabic abstract
R is a popular language and programming environment for data scientists. It is increasingly co-packaged with both relational and Hadoop-based data platforms and can often be the most dominant computational component in data analytics pipelines. Recent work has highlighted inefficiencies in executing R programs, both in terms of execution time and memory requirements, which in practice limit the size of data that can be analyzed by R. This paper presents ROSA, a static analysis framework to improve the performance and space efficiency of R programs. ROSA analyzes input programs to determine program properties such as reaching definitions, live variables, aliased variables, and types of variables. These inferred properties enable program transformations such as C++ code translation, strength reduction, vectorization, code motion, in addition to interpretive optimizations such as avoiding redundant object copies and performing in-place evaluations. An empirical evaluation shows substantial reductions by ROSA in execution time and memory consumption over both CRAN R and Microsoft R Open.
This document describes how to use the XML static analyzer in practice. It provides informal documentation for using the XML reasoning solver implementation. The solver allows automated verification of properties that are expressed as logical formulas over trees. A logical formula may for instance express structural constraints or navigation properties (like e.g. path existence and node selection) in finite trees. Logical formulas can be expressed using the syntax of XPath expressions, DTD, XML Schemas, and Relax NG definitions.
Recently, the iterative approach named linear tabling has received considerable attention because of its simplicity, ease of implementation, and good space efficiency. Linear tabling is a framework from which different methods can be derived based on the strategies used in handling looping subgoals. One decision concerns when answers are consumed and returned. This paper describes two strategies, namely, {it lazy} and {it eager} strategies, and compares them both qualitatively and quantitatively. The results indicate that, while the lazy strategy has good locality and is well suited for finding all solutions, the eager strategy is comparable in speed with the lazy strategy and is well suited for programs with cuts. Linear tabling relies on depth-first iterative deepening rather than suspension to compute fixpoints. Each cluster of inter-dependent subgoals as represented by a top-most looping subgoal is iteratively evaluated until no subgoal in it can produce any new answers. Naive re-evaluation of all looping subgoals, albeit simple, may be computationally unacceptable. In this paper, we also introduce semi-naive optimization, an effective technique employed in bottom-up evaluation of logic programs to avoid redundant joins of answers, into linear tabling. We give the conditions for the technique to be safe (i.e. sound and complete) and propose an optimization technique called {it early answer promotion} to enhance its effectiveness. Benchmarking in B-Prolog demonstrates that with this optimization linear tabling compares favorably well in speed with the state-of-the-art implementation of SLG.
Researchers have developed various techniques for static analysis of JavaScript to improve analysis precision. To develop such techniques, they first identify causes of the precision losses for unproven properties. While most of the existing work has diagnosed main causes of imprecision in static analysis by manual investigation, manually tracing the imprecision causes is challenging because it requires detailed knowledge of analyzer internals. Recently, several studies proposed to localize the analysis imprecision causes automatically, but these localization techniques work for only specific analysis techniques. In this paper, we present an automatic technique that can trace analysis imprecision causes of JavaScript applications starting from user-selected variables. Given a set of program variables, our technique stops an analysis when any of the variables gets imprecise analysis values. It then traces the imprecise analysis values using intermediate analysis results back to program points where the imprecision first started. Our technique shows the trace information with a new representation called tracing graphs, whose nodes and edges together represent traces from imprecise points to precise points. In order to detect major causes of analysis imprecision automatically, we present four node/edge patterns in tracing graphs for common imprecision causes. We formalized the technique of generating tracing graphs and identifying patterns, and implemented them on SAFE, a state-of-the-art JavaScript static analyzer with various analysis configurations, such as context-sensitivity, loop-sensitivity, and heap cloning. Our evaluation demonstrates that the technique can easily find 96 % of the major causes of the imprecision problems in 17 applications by only automatic detection in tracing graphs using the patterns, and selectively adopting various advanced techniques can eliminate the found causes of imprecision.
Datalog has become a popular language for writing static analyses. Because Datalog is very limited, some implementations of Datalog for static analysis have extended it with new language features. However, even with these features it is hard or impossible to express a large class of analyses because they use logical formulae to represent program state. FormuLog fills this gap by extending Datalog to represent, manipulate, and reason about logical formulae. We have used FormuLog to implement declarati
Satisfiability modulo theories (SMT) solving has become a critical part of many static analyses, including symbolic execution, refinement type checking, and model checking. We propose Formulog, a domain-specific language that makes it possible to write a range of SMT-based static analyses in a way that is both close to their formal specifications and amenable to high-level optimizations and efficient evaluation. Formulog extends the logic programming language Datalog with a first-order functional language and mechanisms for representing and reasoning about SMT formulas; a novel type system supports the construction of expressive formulas, while ensuring that neither normal evaluation nor SMT solving goes wrong. Our case studies demonstrate that a range of SMT-based analyses can naturally and concisely be encoded in Formulog, and that -- thanks to this encoding -- high-level Datalog-style optimizations can be automatically and advantageously applied to these analyses.