
Standard Cell Library Design and Optimization Methodology for ASAP7 PDK

Added by Xiaoqing Xu
Publication date: 2018
Language: English





Standard cell libraries are the foundation of the entire back-end design and optimization flow in modern application-specific integrated circuit design. At the 7nm technology node and beyond, standard cell library design and optimization becomes increasingly difficult due to extremely complex design constraints, as captured in the ASAP7 process design kit (PDK). Notable complexities include discrete transistor sizing due to FinFETs, complicated lithography-driven design rules, and the restricted layout space of modern standard cell architectures. The design methodology presented in this paper enables efficient, high-quality standard cell library design and optimization with the ASAP7 PDK. The key techniques include exhaustive transistor sizing for cell timing optimization, transistor placement based on generalized Euler paths, and back-end design prototyping for library-level exploration.
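
To make the exhaustive-sizing idea concrete, the Python sketch below enumerates discrete fin counts for a single inverter and keeps the fastest combination that fits an area budget. It only illustrates the structure of the search: the delay and area formulas, the load value, and the fin-count range are invented placeholders, not the ASAP7 characterization models used in the paper.

from itertools import product

FIN_CHOICES = (1, 2, 3)   # discrete sizes allowed by FinFET layout rules (assumed)
C_LOAD = 5.0              # assumed output load, arbitrary units

def stage_delay(fins, load):
    # Toy RC model: drive resistance scales as 1/fins, so delay ~ load / fins.
    return load / fins

def cell_area(pmos_fins, nmos_fins):
    # Toy area proxy: proportional to the total number of fins.
    return pmos_fins + nmos_fins

def size_inverter(max_area):
    # Exhaustively enumerate (PMOS, NMOS) fin counts and keep the fastest
    # combination that still fits the area budget.
    best = None
    for p, n in product(FIN_CHOICES, repeat=2):
        if cell_area(p, n) > max_area:
            continue
        delay = max(stage_delay(p, C_LOAD), stage_delay(n, C_LOAD))
        if best is None or delay < best[0]:
            best = (delay, p, n)
    return best

delay, p_fins, n_fins = size_inverter(max_area=5)
print(f"best sizing: PMOS={p_fins} fins, NMOS={n_fins} fins, delay={delay:.2f}")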



Related research

As semiconductor feature sizes scale further into the sub-16nm technology node, triple patterning lithography (TPL) has been regarded as one of the most promising lithography candidates. The M1 and contact layers, which are usually drawn within standard cells, are among the most critical and complex parts of modern digital designs. A traditional design flow that ignores TPL in early stages may limit the potential to resolve all TPL conflicts. In this paper, we propose a coherent framework, comprising standard cell compliance and detailed placement, to enable TPL-friendly design. Considering TPL constraints during early design stages, such as standard cell compliance, improves layout decomposability. Using the pre-coloring solutions of standard cells, we present a TPL-aware detailed placement in which layout decomposition and placement are resolved simultaneously. Our experimental results show that, with negligible impact on critical path delay, our framework resolves conflicts much more easily than a traditional physical design flow followed by layout decomposition.
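
As a rough illustration of the compliance idea, the Python sketch below checks whether a small cell's conflict graph, whose edges connect features closer than the triple-patterning coloring distance, admits a three-mask assignment by brute force. The graph is a made-up example, and the check is a simplification of the paper's flow, which also resolves decomposition jointly with placement.

from itertools import product

def is_three_colorable(nodes, conflicts):
    # Try every assignment of three masks to the features; exponential, but
    # a single cell has only a handful of M1/contact shapes.
    for coloring in product(range(3), repeat=len(nodes)):
        colors = dict(zip(nodes, coloring))
        if all(colors[a] != colors[b] for a, b in conflicts):
            return True, colors
    return False, None

# Hypothetical conflict graph: edges are feature pairs below minimum same-mask spacing.
nodes = ["A", "B", "C", "D"]
conflicts = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]

decomposable, assignment = is_three_colorable(nodes, conflicts)
print("TPL-decomposable:", decomposable, assignment)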
Jhih-Rong Gao, Bei Yu, Ru Huang (2014)
Self-aligned double patterning (SADP) has become a promising technique for pushing the pattern resolution limit to the sub-22nm technology node. Although SADP provides good overlay controllability, it poses many challenges in the physical design stages for obtaining a conflict-free layout decomposition. In this paper, we study the impact of different standard cell layout decomposition strategies on placement. We propose an SADP-friendly standard cell configuration that provides pre-coloring results for standard cells. These configurations are carried into the placement stage to help ensure layout decomposability and avoid the extra effort of resolving conflicts in later stages.
Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques to more memory-centric techniques, thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.
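
The Python snippet below is a minimal, hypothetical illustration of what a classification step of this kind might look like: bucket a profiled function by simple memory-behavior metrics. The metrics, thresholds, and class labels are assumptions chosen for illustration and are not the actual DAMOV classification rules; see the repository linked above for the real methodology.

from dataclasses import dataclass

@dataclass
class FunctionProfile:
    name: str
    llc_mpki: float              # last-level cache misses per kilo-instruction
    arithmetic_intensity: float  # operations per byte moved from DRAM

def classify(p: FunctionProfile) -> str:
    # Illustrative thresholds only: bucket by the dominant data-movement behavior.
    if p.llc_mpki < 1.0:
        return "compute-bound: caches already capture the working set"
    if p.arithmetic_intensity < 0.25:
        return "bandwidth-bound: candidate for near-data processing"
    return "latency/locality-bound: may benefit from prefetching or larger caches"

profiles = [
    FunctionProfile("dense_gemm", llc_mpki=0.3, arithmetic_intensity=8.0),
    FunctionProfile("stream_copy", llc_mpki=45.0, arithmetic_intensity=0.05),
    FunctionProfile("pointer_chase", llc_mpki=20.0, arithmetic_intensity=0.6),
]
for p in profiles:
    print(p.name, "->", classify(p))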
Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives. Empowered by edge computing, AI workloads are migrating from centralized cloud architectures to distributed edge systems, introducing a new paradigm called edge AI. While edge AI promises to bring significant increases in autonomy and intelligence into everyday life through common edge devices, it also raises new challenges, especially for the development of its algorithms and the deployment of its services, which call for novel design methodologies tailored to these unique challenges. In this paper, we provide a comprehensive survey of the latest enabling design methodologies that span the entire edge AI development stack. We suggest that the key methodologies for effective edge AI development are single-layer specialization and cross-layer co-design. We discuss representative methodologies in each category in detail, including on-device training methods, specialized software design, dedicated hardware design, benchmarking and design automation, software/hardware co-design, software/compiler co-design, and compiler/hardware co-design. Moreover, we attempt to reveal hidden cross-layer design opportunities that can further boost the solution quality of future edge AI and provide insights into future directions and emerging areas that require increased research focus.
Hamid Tabani (2021)
Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD) impose unprecedented performance requirements on automotive systems. Graphics Processing Unit (GPU) based platforms have been deployed to meet these requirements, with NVIDIA Jetson TX2 and its high-performance successor, NVIDIA AGX Xavier, as relevant representatives. However, to what extent high-performance GPU configurations are appropriate for ADAS and AD workloads remains an open question. This paper addresses this concern and provides valuable insights by modeling two recent automotive NVIDIA GPU-based platforms, namely TX2 and AGX Xavier. In particular, our work assesses their microarchitectural parameters against relevant benchmarks, identifying GPU setups that deliver increased performance within a similar cost envelope, or that decrease hardware costs while preserving original performance levels. Overall, our analysis identifies opportunities for optimizing automotive GPUs to further increase system efficiency.
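
The Python sketch below shows the general shape of such an exploration: sweep a few microarchitectural parameters and keep the configurations that improve a toy performance model while staying within a similar cost envelope. The performance and cost formulas and every number in them are placeholders, not measurements or models of TX2 or AGX Xavier.

from itertools import product

BASELINE = {"sms": 8, "clock_ghz": 1.4}   # assumed reference configuration

def perf(sms, clock_ghz, parallel_fraction=0.85):
    # Toy Amdahl-style throughput model for a GPU-bound workload.
    serial = 1.0 - parallel_fraction
    return clock_ghz / (serial + parallel_fraction / sms)

def cost(sms, clock_ghz):
    # Toy cost proxy: area grows with SM count, power with SMs * frequency.
    return sms + 0.5 * sms * clock_ghz

base_perf, base_cost = perf(**BASELINE), cost(**BASELINE)

# Keep configurations that beat the baseline within ~10% of its cost.
for sms, clk in product((4, 6, 8, 12, 16), (1.0, 1.2, 1.4, 1.6)):
    p, c = perf(sms, clk), cost(sms, clk)
    if p > base_perf and c <= 1.1 * base_cost:
        print(f"SMs={sms:2d} clock={clk:.1f} GHz  perf x{p / base_perf:.2f}  cost x{c / base_cost:.2f}")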