DB4HLS: A Database of High-Level Synthesis Design Space Explorations

190 0 0.0 ( 0 )

Download Cite

Added by Lorenzo Ferretti

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Lorenzo Ferretti - Jihye Kwon - Giovanni Ansaloni

Hardware Architecture

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

High-Level Synthesis (HLS) frameworks allow to easily specify a large number of variants of the same hardware design by only acting on optimization directives. Nonetheless, the hardware synthesis of implementations for all possible combinations of directive values is impractical even for simple designs. Addressing this shortcoming, many HLS Design Space Exploration (DSE) strategies have been proposed to devise directive settings leading to high-quality implementations while limiting the number of synthesis runs. All these works require considerable efforts to validate the proposed strategies and/or to build the knowledge base employed to tune abstract models, as both tasks mandate the syntheses of large collections of implementations. Currently, such data gathering is performed ad-hoc, a) leading to a lack of standardization, hampering comparisons between DSE alternatives, and b) posing a very high burden to researchers willing to develop novel DSE strategies. Against this backdrop, we here introduce DB4HLS, a database of exhaustive HLS explorations comprising more than 100000 design points collected over 4 years of synthesis time. The open structure of DB4HLS allows the incremental integration of new DSEs, which can be easily defined with a dedicated domain-specific language. We think that of our database, available at https://www.db4hls.inf.usi.ch/, will be a valuable tool for the research community investigating automated strategies for the optimization of HLS-based hardware designs.

rate research

High Level Synthesis with a Dataflow Architectural Template

267 - Shaoyi Cheng , John Wawrzynek 2016

In this work, we present a new approach to high level synthesis (HLS), where high level functions are first mapped to an architectural template, before hardware synthesis is performed. As FPGA platforms are especially suitable for implementing streaming processing pipelines, we perform transformations on conventional high level programs where they are turned into multi-stage dataflow engines [1]. This target template naturally overlaps slow memory data accesses with computations and therefore has much better tolerance towards memory subsystem latency. Using a state-of-the-art HLS tool for the actual circuit generation, we observe up to 9x improvement in overall performance when the dataflow architectural template is used as an intermediate compilation target.

Hardware Architecture

On the Optimization of Behavioral Logic Locking for High-Level Synthesis

97 - Christian Pilato , Luca Collini , Luca Cassano 2021

The globalization of the electronics supply chain is requiring effective methods to thwart reverse engineering and IP theft. Logic locking is a promising solution but there are still several open concerns. Even when applied at high level of abstraction, logic locking leads to large overhead without guaranteeing that the obfuscation metric is actually maximized. We propose a framework to optimize the use of behavioral logic locking for a given security metric. We explore how to apply behavioral logic locking techniques during the HLS of IP cores. Operating on the chip behavior, our method is compatible with commercial HLS tools, complementing existing industrial design flows. We offer a framework where the designer can implement different meta-heuristics to explore the design space and select where to apply logic locking. Our method optimizes a given security metric better than complete obfuscation, allows us to 1) obtain better protection, 2) reduce the obfuscation cost.

Hardware Architecture Cryptography and Security

Latent Space Explorations of Singing Voice Synthesis using DDSP

74 - Juan Alonso , Cumhur Erkut 2021

Machine learning based singing voice models require large datasets and lengthy training times. In this work we present a lightweight architecture, based on the Differentiable Digital Signal Processing (DDSP) library, that is able to output song-like utterances conditioned only on pitch and amplitude, after twelve hours of training using small datasets of unprocessed audio. The results are promising, as both the melody and the singers voice are recognizable. In addition, we present two zero-configuration tools to train new models and experiment with them. Currently we are exploring the latent space representation, which is included in the DDSP library, but not in the original DDSP examples. Our results indicate that the latent space improves both the identification of the singer as well as the comprehension of the lyrics. Our code is available at https://github.com/juanalonso/DDSP-singing-experiments with links to the zero-configuration notebooks, and our sound examples are at https://juanalonso.github.io/DDSP-singing-experiments/ .

Sound Machine Learning Multimedia

Taming Process Variations in CNFET for Efficient Last Level Cache Design

133 - Dawen Xu , Zhuangyu Feng , Cheng Liu 2021

Carbon nanotube field-effect transistors (CNFET) emerge as a promising alternative to CMOS transistors for the much higher speed and energy efficiency, which makes the technology particularly suitable for building the energy-hungry last level cache (LLC). However, the process variations (PVs) in CNFET caused by the imperfect fabrication lead to large timing variation and the worst-case timing dramatically limits the LLC operation speed. Particularly, we observe that the CNFET-based cache latency distribution is closely related to the LLC layouts. For the two typical LLC layouts that have the CNT growth direction aligned to the cache way direction and cache set direction respectively, we proposed variation-aware set aligned (VASA) cache and variation-aware way aligned (VAWA) cache in combination with corresponding cache optimizations such as data shuffling and page mapping to enable low-latency cache for frequently used data. According to our experiments, the optimized LLC reduces the average access latency by 32% and 45% compared to the baseline designs on the two different CNFET layouts respectively while it improves the overall performance by 6% and 9% and reduces the energy consumption by 4% and 8% respectively. In addition, with both the architecture induced latency variation and PV incurred latency variation considered in a unified model, we extended the VAWA and VASA cache design for the CNFET-based NUCA and the proposed NUCA achieves both significant performance improvement and energy saving compared to the straightforward variation-aware NUCA.

Hardware Architecture Emerging Technologies

Third ArchEdge Workshop: Exploring the Design Space of Efficient Deep Neural Networks

115 - Fuxun Yu , Dimitrios Stamoulis , Di Wang 2020

This paper gives an overview of our ongoing work on the design space exploration of efficient deep neural networks (DNNs). Specifically, we cover two aspects: (1) static architecture design efficiency and (2) dynamic model execution efficiency. For static architecture design, different from existing end-to-end hardware modeling assumptions, we conduct full-stack profiling at the GPU core level to identify better accuracy-latency trade-offs for DNN designs. For dynamic model execution, different from prior work that tackles model redundancy at the DNN-channels level, we explore a new dimension of DNN feature map redundancy to be dynamically traversed at runtime. Last, we highlight several open questions that are poised to draw research attention in the next few years.

Hardware Architecture Machine Learning