No Arabic abstract
Java is the go-to programming language choice for developing scalable enterprise cloud applications. In such systems, even a few percent CPU time savings can offer a significant competitive advantage and cost saving. Although performance tools abound in Java, those that focus on the data locality in the memory hierarchy are rare. In this paper, we present DJXPerf, a lightweight, object-centric memory profiler for Java, which associates memory-hierarchy performance metrics (e.g., cache/TLB misses) with Java objects. DJXPerf uses statistical sampling of hardware performance monitoring counters to attribute metrics to not only source code locations but also Java objects. DJXPerf presents Java object allocation contexts combined with their usage contexts and presents them ordered by the poor locality behaviors. DJXPerfs performance measurement, object attribution, and presentation techniques guide optimizing object allocation, layout, and access patterns. DJXPerf incurs only ~8% runtime overhead and ~5% memory overhead on average, requiring no modifications to hardware, OS, Java virtual machine, or application source code, which makes it attractive to use in production. Guided by DJXPerf, we study and optimize a number of Java and Scala programs, including well-known benchmarks and real-world applications, and demonstrate significant speedups.
Parallel applications are extremely challenging to achieve the optimal performance on the NUMA architecture, which necessitates the assistance of profiling tools. However, existing NUMA-profiling tools share some similar shortcomings, such as portability, effectiveness, and helpfulness issues. This paper proposes a novel profiling tool - NumaPerf - that overcomes these issues. NumaPerf aims to identify potential performance issues for any NUMA architecture, instead of only on the current hardware. To achieve this, NumaPerf focuses on memory sharing patterns between threads, instead of real remote accesses. NumaPerf further detects potential thread migrations and load imbalance issues that could significantly affect the performance but are omitted by existing profilers. NumaPerf also separates cache coherence issues that may require different fix strategies. Based on our extensive evaluation, NumaPerf is able to identify more performance issues than any existing tool, while fixing these bugs leads to up to 5.94x performance speedup.
Heterogeneous programming has started becoming the norm in order to achieve better performance by running portions of code on the most appropriate hardware resource. Currently, significant engineering efforts are undertaken in order to enable existing programming languages to perform heterogeneous execution mainly on GPUs. In this paper we describe Jacc, an experimental framework which allows developers to program GPGPUs directly from Java. By using the Jacc framework, developers have the ability to add GPGPU support into their applications with minimal code refactoring. To simplify the development of GPGPU applications we allow developers to model heterogeneous code using two key abstractions: textit{tasks}, which encapsulate all the information needed to execute code on a GPGPU; and textit{task graphs}, which capture the inter-task control-flow of the application. Using this information the Jacc runtime is able to automatically handle data movement and synchronization between the host and the GPGPU; eliminating the need for explicitly managing disparate memory spaces. In order to generate highly parallel GPGPU code, Jacc provides developers with the ability to decorate key aspects of their code using annotations. The compiler, in turn, exploits this information in order to automatically generate code without requiring additional code refactoring. Finally, we demonstrate the advantages of Jacc, both in terms of programmability and performance, by evaluating it against existing Java frameworks. Experimental results show an average performance speedup of 32x and a 4.4x code decrease across eight evaluated benchmarks on a NVIDIA Tesla K20m GPU.
We present FJ&$lambda$, a new core calculus that extends Featherweight Java (FJ) with interfaces, supporting multiple inheritance in a restricted form, $lambda$-expressions, and intersection types. Our main goal is to formalise how lambdas and intersection types are grafted on Java 8, by studying their properties in a formal setting. We show how intersection types play a significant role in several cases, in particular in the typecast of a $lambda$-expression and in the typing of conditional expressions. We also embody interface emph{default methods} in FJ&$lambda$, since they increase the dynamism of $lambda$-expressions, by allowing these methods to be called on $lambda$-expressions. The crucial point in Java 8 and in our calculus is that $lambda$-expressions can have various types according to the context requirements (target types): indeed, Java code does not compile when $lambda$-expressions come without target types. In particular, in the operational semantics we must record target types by decorating $lambda$-expressions, otherwise they would be lost in the runtime expressions. We prove the subject reduction property and progress for the resulting calculus, and we give a type inference algorithm that returns the type of a given program if it is well typed. The design of FJ&$lambda$ has been driven by the aim of making it a subset of Java 8, while preserving the elegance and compactness of FJ. Indeed, FJ&$lambda$ programs are typed and behave the same as Java programs.
A common approach to improve software quality is to use programming guidelines to avoid common kinds of errors. In this paper, we consider the problem of enforcing guidelines for Featherweight Java (FJ). We formalize guidelines as sets of finite or infinite execution traces and develop a region-based type and effect system for FJ that can enforce such guidelines. We build on the work by Erbatur, Hofmann and Zu{a}linescu, who presented a type system for verifying the finite event traces of terminating FJ programs. We refine this type system, separating region typing from FJ typing, and use ideas of Hofmann and Chen to extend it to capture also infinite traces produced by non-terminating programs. Our type and effect system can express properties of both finite and infinite traces and can compute information about the possible infinite traces of FJ programs. Specifically, the set of infinite traces of a method is constructed as the greatest fixed point of the operator which calculates the possible traces of method bodies. Our type inference algorithm is realized by working with the finitary abstraction of the system based on Buchi automata.
Program transformation has gained a wide interest since it is used for several purposes: altering semantics of a program, adding features to a program or performing optimizations. In this paper we focus on program transformations at the bytecode level. Because these transformations may introduce errors, our goal is to provide a formal way to verify the update and establish its correctness. The formal framework presented includes a definition of a formal semantics of updates which is the base of a static verification and a scheme based on Hoare triples and weakest precondition calculus to reason about behavioral aspects in bytecode transformation