No Arabic abstract
Heterogeneous programming has started becoming the norm in order to achieve better performance by running portions of code on the most appropriate hardware resource. Currently, significant engineering efforts are undertaken in order to enable existing programming languages to perform heterogeneous execution mainly on GPUs. In this paper we describe Jacc, an experimental framework which allows developers to program GPGPUs directly from Java. By using the Jacc framework, developers have the ability to add GPGPU support into their applications with minimal code refactoring. To simplify the development of GPGPU applications we allow developers to model heterogeneous code using two key abstractions: textit{tasks}, which encapsulate all the information needed to execute code on a GPGPU; and textit{task graphs}, which capture the inter-task control-flow of the application. Using this information the Jacc runtime is able to automatically handle data movement and synchronization between the host and the GPGPU; eliminating the need for explicitly managing disparate memory spaces. In order to generate highly parallel GPGPU code, Jacc provides developers with the ability to decorate key aspects of their code using annotations. The compiler, in turn, exploits this information in order to automatically generate code without requiring additional code refactoring. Finally, we demonstrate the advantages of Jacc, both in terms of programmability and performance, by evaluating it against existing Java frameworks. Experimental results show an average performance speedup of 32x and a 4.4x code decrease across eight evaluated benchmarks on a NVIDIA Tesla K20m GPU.
The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap we introduce C-For-Metal (CM), an explicit SIMD programming framework designed to deliver close-to-the-metal performance on Intel GPUs. The CM programming language and its vector/matrix types provide an intuitive interface to exploit the underlying hardware features, allowing fine-grained register management, SIMD size control and cross-lane data sharing. Experimental results show that CM applications from different domains outperform the best-known SIMT-based OpenCL implementations, achieving up to 2.7x speedup on the latest Intel GPU.
We present FJ&$lambda$, a new core calculus that extends Featherweight Java (FJ) with interfaces, supporting multiple inheritance in a restricted form, $lambda$-expressions, and intersection types. Our main goal is to formalise how lambdas and intersection types are grafted on Java 8, by studying their properties in a formal setting. We show how intersection types play a significant role in several cases, in particular in the typecast of a $lambda$-expression and in the typing of conditional expressions. We also embody interface emph{default methods} in FJ&$lambda$, since they increase the dynamism of $lambda$-expressions, by allowing these methods to be called on $lambda$-expressions. The crucial point in Java 8 and in our calculus is that $lambda$-expressions can have various types according to the context requirements (target types): indeed, Java code does not compile when $lambda$-expressions come without target types. In particular, in the operational semantics we must record target types by decorating $lambda$-expressions, otherwise they would be lost in the runtime expressions. We prove the subject reduction property and progress for the resulting calculus, and we give a type inference algorithm that returns the type of a given program if it is well typed. The design of FJ&$lambda$ has been driven by the aim of making it a subset of Java 8, while preserving the elegance and compactness of FJ. Indeed, FJ&$lambda$ programs are typed and behave the same as Java programs.
Automatic source code analysis in key areas of software engineering, such as code security, can benefit from Machine Learning (ML). However, many standard ML approaches require a numeric representation of data and cannot be applied directly to source code. Thus, to enable ML, we need to embed source code into numeric feature vectors while maintaining the semantics of the code as much as possible. code2vec is a recently released embedding approach that uses the proxy task of method name prediction to map Java methods to feature vectors. However, experimentation with code2vec shows that it learns to rely on variable names for prediction, causing it to be easily fooled by typos or adversarial attacks. Moreover, it is only able to embed individual Java methods and cannot embed an entire collection of methods such as those present in a typical Java class, making it difficult to perform predictions at the class level (e.g., for the identification of malicious Java classes). Both shortcomings are addressed in the research presented in this paper. We investigate the effect of obfuscating variable names during the training of a code2vec model to force it to rely on the structure of the code rather than specific names and consider a simple approach to creating class-level embeddings by aggregating sets of method embeddings. Our results, obtained on a challenging new collection of source-code classification problems, indicate that obfuscating variable names produces an embedding model that is both impervious to variable naming and more accurately reflects code semantics. The datasets, models, and code are shared for further ML research on source code.
A common approach to improve software quality is to use programming guidelines to avoid common kinds of errors. In this paper, we consider the problem of enforcing guidelines for Featherweight Java (FJ). We formalize guidelines as sets of finite or infinite execution traces and develop a region-based type and effect system for FJ that can enforce such guidelines. We build on the work by Erbatur, Hofmann and Zu{a}linescu, who presented a type system for verifying the finite event traces of terminating FJ programs. We refine this type system, separating region typing from FJ typing, and use ideas of Hofmann and Chen to extend it to capture also infinite traces produced by non-terminating programs. Our type and effect system can express properties of both finite and infinite traces and can compute information about the possible infinite traces of FJ programs. Specifically, the set of infinite traces of a method is constructed as the greatest fixed point of the operator which calculates the possible traces of method bodies. Our type inference algorithm is realized by working with the finitary abstraction of the system based on Buchi automata.
Java is the go-to programming language choice for developing scalable enterprise cloud applications. In such systems, even a few percent CPU time savings can offer a significant competitive advantage and cost saving. Although performance tools abound in Java, those that focus on the data locality in the memory hierarchy are rare. In this paper, we present DJXPerf, a lightweight, object-centric memory profiler for Java, which associates memory-hierarchy performance metrics (e.g., cache/TLB misses) with Java objects. DJXPerf uses statistical sampling of hardware performance monitoring counters to attribute metrics to not only source code locations but also Java objects. DJXPerf presents Java object allocation contexts combined with their usage contexts and presents them ordered by the poor locality behaviors. DJXPerfs performance measurement, object attribution, and presentation techniques guide optimizing object allocation, layout, and access patterns. DJXPerf incurs only ~8% runtime overhead and ~5% memory overhead on average, requiring no modifications to hardware, OS, Java virtual machine, or application source code, which makes it attractive to use in production. Guided by DJXPerf, we study and optimize a number of Java and Scala programs, including well-known benchmarks and real-world applications, and demonstrate significant speedups.