High-level synthesis (HLS) is a key component for the hardware acceleration of applications, especially thanks to the spread of reconfigurable devices in many domains, from data centers to edge devices. HLS reduces development times by allowing designers to raise the abstraction level and use automated methods for hardware generation. Since security concerns are becoming increasingly relevant for data-intensive applications, we investigate how to abstract security properties and use HLS to integrate them with the accelerator functionality. We use dynamic information flow tracking as a case study, showing how classic software-level abstractions can be used efficiently to hide implementation details from designers.
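To illustrate what a software-level abstraction for dynamic information flow tracking can look like, the following is a minimal sketch in Python. It is not the method of the abstract above: the `Tagged` class, its one-bit `tainted` tag, and the `release` check are hypothetical names introduced only for illustration. The idea shown is that every value carries a tag, operations propagate the tag to their results, and a sink refuses tainted data.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Tagged:
    """A value paired with a one-bit taint tag (hypothetical abstraction)."""
    value: int
    tainted: bool = False

    def _combine(self, other: Union["Tagged", int],
                 op: Callable[[int, int], int]) -> "Tagged":
        # Plain integers are treated as untainted constants.
        rhs = other if isinstance(other, Tagged) else Tagged(other)
        # Propagation rule: the result is tainted if either operand is.
        return Tagged(op(self.value, rhs.value), self.tainted or rhs.tainted)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    def __xor__(self, other):
        return self._combine(other, lambda a, b: a ^ b)


def release(v: Tagged) -> int:
    """Sink check: only untainted values may leave the computation."""
    if v.tainted:
        raise PermissionError("tainted value reached an output sink")
    return v.value
```

Under these assumptions, a value derived only from untainted inputs passes `release`, while anything computed from a tainted input is rejected. The tag logic stays inside the class, so code that uses `Tagged` values does not need to handle it explicitly.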
High-level Synthesis (HLS) has been widely adopted as it significantly improves hardware design productivity and enables efficient design space exploration (DSE). HLS tools can be used to deliver solutions for many different kinds of design problems, which are often better solved with different levels of abstraction. While existing HLS tools are built using compiler infrastructures largely based on a single-level abstraction (e.g., LLVM), we propose ScaleHLS, a next-generation HLS compilation flow, on top of a multi-level compiler infrastructure called MLIR, for the first time. By using an intermediate representation (IR) that can be better tuned to particular algorithms at different representation levels, we are able to build a new HLS tool that is more scalable and customizable towards various applications with intrinsic structural or functional hierarchies. ScaleHLS is able to represent and optimize HLS designs at multiple levels of abstraction and provides an HLS-dedicated transform and analysis library to solve the optimization problems at the suitable representation levels. On top of the library, we also build an automated DSE engine to explore the multi-dimensional design space efficiently. In addition, we develop an HLS C front-end and a C/C++ emission back-end to translate HLS designs into/from MLIR, enabling the end-to-end ScaleHLS flow. Experimental results show that, compared to baseline designs optimized only by Xilinx Vivado HLS, ScaleHLS substantially improves the quality of results -- up to 768.1x better on computation kernel level programs and up to 3825.0x better on neural network models.
Because of the small keyboards on mobile devices, most mobile apps support an automatic login feature for a better user experience. Users thus avoid the inconvenience of retyping their ID and password when an app runs in the foreground again. However, this auto-login function can be exploited to launch the so-called data-clone attack: once the locally stored data that auto-login depends on are cloned by attackers and placed into their own smartphones, attackers can bypass the limit on the number of login devices and log in to the victim's account stealthily. A natural countermeasure is to check the consistency of device-specific attributes. As long as the new device shows a device fingerprint different from that of the previous one, the app will disable the auto-login function and thus prevent data-clone attacks. In this paper, we develop VPDroid, a transparent Android OS-level virtualization platform tailored for security testing. With VPDroid, security analysts can customize different device artifacts, such as CPU model, Android ID, and phone number, in a virtual phone without user-level API hooking. VPDroid's isolation mechanism ensures that user-mode apps in the virtual phone cannot detect device-specific discrepancies. To assess Android apps' susceptibility to the data-clone attack, we use VPDroid to simulate data-clone attacks with 234 most-downloaded apps. Our experiments on five different virtual phone environments show that VPDroid's device attribute customization can deceive all tested apps that perform device-consistency checks, such as Twitter, WeChat, and PayPal. 19 vendors have confirmed our report as a zero-day vulnerability. Our findings paint a cautionary tale: enforcing a device-consistency check only on the client side still leaves apps vulnerable to an advanced data-clone attack.
This paper presents a high-level circuit obfuscation technique to prevent the theft of intellectual property (IP) of integrated circuits. In particular, our technique protects a class of circuits that rely on constant multiplications, such as filters and neural networks, where the constants themselves are the IP to be protected. By making use of decoy constants and a key-based scheme, a reverse-engineering adversary at an untrusted foundry is rendered incapable of discerning true constants from decoy constants. The time-multiplexed constant multiplication (TMCM) block of such circuits, which realizes the multiplication of an input variable by one constant at a time, is considered as our case study for obfuscation. Furthermore, two TMCM design architectures are taken into account: an implementation using a multiplier and a multiplierless shift-adds implementation. Optimization methods are also applied to reduce the hardware complexity of these architectures. The well-known satisfiability (SAT) and automatic test pattern generation (ATPG) attacks are used to determine the vulnerability of the obfuscated designs. It is observed that the proposed technique incurs small overheads in area, power, and delay that are comparable to the hardware complexity of prominent logic locking methods. Yet, the advantage of our approach is in the insight that constants -- instead of arbitrary circuit nodes -- become key-protected.
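The general idea of hiding true constants among decoys behind a key can be sketched behaviorally in Python. This is an illustrative assumption, not the scheme of the abstract above: the XOR-based indexing, the function names `build_obfuscated_table` and `tmcm`, and the example constants are all hypothetical. The sketch models a multiplier-based TMCM whose constant table holds both true and decoy constants, so that only the correct key maps each select value to its true constant.

```python
def build_obfuscated_table(true_constants, decoys, key):
    """Mix true and decoy constants into one table (hypothetical layout).

    The true constant for select value `sel` is stored at index `sel ^ key`;
    all remaining slots are filled with decoys.
    """
    size = len(true_constants) + len(decoys)
    if size == 0 or size & (size - 1) != 0:
        raise ValueError("table size must be a power of two")
    if not 0 <= key < size:
        raise ValueError("key must index into the table")
    table = [None] * size
    # XOR with a fixed key is a bijection on [0, size), so no slot is reused.
    for sel, constant in enumerate(true_constants):
        table[sel ^ key] = constant
    remaining = iter(decoys)
    for i in range(size):
        if table[i] is None:
            table[i] = next(remaining)
    return table


def tmcm(table, sel, key, x):
    """Multiply x by the constant that `sel` selects under the applied key."""
    return table[sel ^ key] * x
```

With the correct key, `tmcm` multiplies by the intended constant; with a wrong key it silently multiplies by some other table entry, which is the behavior an obfuscated design aims for. A real design would also have to withstand the SAT and ATPG attacks mentioned above, which this behavioral model does not address.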
GRAVITY is the four-beam, near-infrared, AO-assisted, fringe tracking, astrometric and imaging instrument for the Very Large Telescope Interferometer (VLTI). It requires the development of one of the most complex instrument software systems ever built for an ESO instrument. Apart from its many interfaces and interdependencies, one of the most challenging aspects is the overall performance and stability of this complex system. The three infrared detectors and the fast reflective memory network (RMN) recorder contribute a total data rate of up to 20 MiB/s, accumulating to a maximum of 250 GiB of data per night. The detectors, the two instrument Local Control Units (LCUs), and the five LCUs running applications under the TAC (Tools for Advanced Control) architecture are interconnected with fast Ethernet, RMN fibers, and dedicated fiber connections, as well as signals for time synchronization. Here we give a simplified overview of all subsystems of GRAVITY and their interfaces and discuss two examples of high-level applications during observations: the acquisition procedure and the gathering and merging of data into the final FITS file.
Inversion and PDE-constrained optimization problems often rely on solving the adjoint problem to calculate the gradient of the objective function. This requires storing large amounts of intermediate data, setting a limit to the largest problem that might be solved with a given amount of memory available. Checkpointing is an approach that can reduce the amount of memory required by redoing parts of the computation instead of storing intermediate results. The Revolve checkpointing algorithm offers an optimal schedule that trades computational cost for smaller memory footprints. Integrating Revolve into a modern Python HPC code and combining it with code generation is not straightforward. We present an API that makes checkpointing accessible from a DSL-based code generation environment along with some initial performance figures with a focus on seismic applications.
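The memory-for-recomputation trade described above can be sketched with a toy adjoint computation. This is a minimal illustration under stated assumptions: it uses simple uniform-interval checkpointing, not the Revolve schedule, and it is not the API the abstract presents. The scalar recurrence, the objective J = 0.5 * x_N^2, and all function names are hypothetical.

```python
import math

DT = 0.01  # step size of the toy forward recurrence (assumed)


def step(x):
    """One forward step: x_{k+1} = x_k + DT * sin(x_k)."""
    return x + DT * math.sin(x)


def step_adjoint(x, lam):
    """Adjoint of one step; needs the forward state x_k at that step."""
    return lam * (1.0 + DT * math.cos(x))


def gradient_store_all(x0, n_steps):
    """Reference: keep every forward state, then sweep backwards."""
    states = [x0]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    lam = states[-1]  # dJ/dx_N for J = 0.5 * x_N**2
    for k in reversed(range(n_steps)):
        lam = step_adjoint(states[k], lam)
    return lam  # dJ/dx_0


def gradient_checkpointed(x0, n_steps, interval):
    """Keep every `interval`-th state and recompute the rest per segment."""
    checkpoints = {}
    x = x0
    for k in range(n_steps):
        if k % interval == 0:
            checkpoints[k] = x
        x = step(x)
    lam = x  # dJ/dx_N
    peak_states = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + interval, n_steps)
        # Recompute states x_start .. x_{end-1} from the checkpoint.
        segment = [checkpoints[start]]
        for _ in range(end - start - 1):
            segment.append(step(segment[-1]))
        peak_states = max(peak_states, len(checkpoints) + len(segment))
        for xk in reversed(segment):
            lam = step_adjoint(xk, lam)
    return lam, peak_states
```

Both functions return the same gradient, but the checkpointed version holds roughly `n_steps / interval + interval` states at its peak instead of `n_steps + 1`, at the cost of about one extra forward sweep. Revolve improves on this uniform scheme by choosing checkpoint positions that are optimal for a given memory budget.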