Synchoros VLSI design style has been proposed as an alternative to standard cell-based design. Standard cells are replaced by synchoros large-grain VLSI design objects called SiLago blocks. This new design style enables end-to-end automation of large-scale designs by abutting the SiLago blocks, eliminating logic and physical synthesis for end users. A key problem in this automation process is the generation of the regional clock tree. The synchoros design style requires that the clock tree emerge by abutting its fragments. The clock tree fragments are absorbed into the SiLago blocks as a one-time engineering effort. The clock tree should not be ad hoc, but a structured and predictable design whose cost metrics are known. Here, we present a new clock tree design that is compatible with the synchoros design style. The proposed design has been verified with static timing analysis and compared against a functionally equivalent clock tree synthesised by commercial EDA tools. The scheme is scalable and, in principle, can generate arbitrarily complex designs. In this paper, we show as a proof of concept that a regional clock tree can be created by abutment. We demonstrate that, with the help of the generated clock tree, it is possible to generate valid VLSI designs from 0.5 to ~2 million gates. The generated designs do not need a separate regional clock tree synthesis. More critically, the synthesised design is correct by construction and requires no further verification. In contrast, the state-of-the-art hierarchical synthesis flow requires synthesis of the regional clock tree. Additionally, the conventional clock tree needs a verification step because it lacks predictability. The results also demonstrate that the capacitance, slew, and ability to balance skew of the clock tree created by abutment are comparable to those of the one generated by commercial EDA tools.
With continued feature size scaling, even state-of-the-art semiconductor manufacturing processes will often run into layouts with poor printability and yield. Identifying lithography hotspots is important at both the physical verification and early physical design stages. While detailed lithography simulations can be very accurate, they may be too computationally expensive for full-chip scale and for physical design inner loops. Meanwhile, hotspot detection methods based on pattern matching and machine learning can provide acceptable quality with fast turnaround time for full-chip scale physical verification and design. In this paper, we discuss some key issues and recent results on lithography hotspot detection and mitigation in nanometer VLSI.
Customization of processor architectures through Instruction Set Extensions (ISEs) is an effective way to meet the growing performance demands of embedded applications. A high-quality ISE generation approach needs to obtain results close to those achieved by experienced designers, particularly for complex applications that exhibit regularity: expert designers are able to manually exploit such regularity in the data flow graphs to generate high-quality ISEs. In this paper, we present ISEGEN, an approach that identifies high-quality ISEs by iterative improvement, following the basic principles of the well-known Kernighan-Lin (K-L) min-cut heuristic. Experimental results on a number of MediaBench, EEMBC and cryptographic applications show that our approach matches the quality of the optimal solution obtained by exhaustive search. We also show that our ISEGEN technique is on average 20x faster than a genetic formulation that generates equivalent solutions. Furthermore, the ISEs identified by our technique exhibit 35% more speedup than the genetic solution on a large cryptographic application (AES) by effectively exploiting its regular structure.
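The K-L principle mentioned above (toggle one node at a time, accept temporarily worse moves, lock each toggled node, and keep the best configuration seen in the pass) can be sketched as follows. This is a minimal illustration, not the ISEGEN algorithm itself: the merit function, the one-cycle ISE latency, the input-port budget, and the tiny data flow graph are all hypothetical assumptions.

```python
def merit(cut, sw_cycles, preds, max_inputs, penalty=10):
    """Hypothetical merit: cycles saved minus a penalty for too many inputs."""
    if not cut:
        return 0
    # Assumption: the whole cut executes as one single-cycle instruction.
    saved = sum(sw_cycles[n] for n in cut) - 1
    # Values produced outside the cut must enter through register ports.
    inputs = {p for n in cut for p in preds[n] if p not in cut}
    return saved - penalty * max(0, len(inputs) - max_inputs)


def kl_pass(cut, nodes, score):
    """One K-L style pass: toggle every node exactly once, remember the best."""
    current = set(cut)
    best_cut, best_score = set(cut), score(cut)
    unlocked = set(nodes)
    while unlocked:
        # Pick the unlocked node whose toggle scores best, even if that
        # makes the current solution worse (this lets us escape local optima).
        node = max(sorted(unlocked), key=lambda n: score(current ^ {n}))
        current ^= {node}          # move node into or out of the cut
        unlocked.remove(node)      # lock it for the rest of the pass
        if score(current) > best_score:
            best_cut, best_score = set(current), score(current)
    return best_cut


def kl_improve(nodes, score, max_passes=5):
    """Repeat passes until a pass no longer improves the score."""
    cut = set()
    for _ in range(max_passes):
        new_cut = kl_pass(cut, nodes, score)
        if score(new_cut) <= score(cut):
            break
        cut = new_cut
    return cut


# Hypothetical data flow graph: a -> b -> c is a chain, d is an unrelated
# node with many external operands (x, y, z, u, v, w are external values).
nodes = ["a", "b", "c", "d"]
preds = {"a": {"x", "y"}, "b": {"a", "z"}, "c": {"b"}, "d": {"u", "v", "w"}}
sw_cycles = {"a": 1, "b": 1, "c": 1, "d": 1}


def score(cut):
    return merit(cut, sw_cycles, preds, max_inputs=3)


best = kl_improve(nodes, score)
print(sorted(best), score(best))
```

In this toy graph the first toggles bring no gain at all, yet the pass continues and ends up grouping the chain `a`, `b`, `c`, while `d` is left out because adding it would exceed the assumed input budget.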
Linear minimum mean-square error (L-MMSE) equalization is among the most popular methods for data detection in massive multi-user multiple-input multiple-output (MU-MIMO) wireless systems. While L-MMSE equalization enables near-optimal spectral efficiency, accurate knowledge of the signal and noise powers is necessary. Furthermore, corresponding VLSI designs must solve linear systems of equations, which requires high arithmetic precision, exhibits stringent data dependencies, and results in high circuit complexity. This paper proposes the first VLSI design of the NOnParametric Equalizer (NOPE), which avoids knowledge of the transmit signal and noise powers, provably delivers the performance of L-MMSE equalization for massive MU-MIMO systems, and is resilient to numerous system and hardware impairments due to its parameter-free nature. Moreover, NOPE avoids computation of a matrix inverse and only requires hardware-friendly matrix-vector multiplications. To showcase the practical advantages of NOPE, we propose a parallel VLSI architecture and provide synthesis results in 28nm CMOS. We demonstrate that NOPE performs on par with existing data detectors for massive MU-MIMO that require accurate knowledge of the signal and noise powers.
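For reference, the textbook form of the L-MMSE equalizer makes the two drawbacks named above visible. The notation is an assumption and does not come from the abstract: a received vector $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, with channel matrix $\mathbf{H}$, i.i.d. transmit symbols of energy $E_s$, and i.i.d. noise of variance $N_0$.

```latex
\hat{\mathbf{x}}_{\text{L-MMSE}}
  = \left( \mathbf{H}^{\mathsf{H}} \mathbf{H}
      + \frac{N_0}{E_s}\, \mathbf{I} \right)^{-1}
    \mathbf{H}^{\mathsf{H}} \mathbf{y}
```

The regularization term $N_0/E_s$ is where knowledge of the signal and noise powers enters, and the matrix inverse corresponds to the linear system that conventional VLSI designs must solve.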
Artificial intelligence (AI) technologies have advanced dramatically in recent years, resulting in revolutionary changes in people's lives. Empowered by edge computing, AI workloads are migrating from centralized cloud architectures to distributed edge systems, introducing a new paradigm called edge AI. While edge AI promises to bring significant increases in autonomy and intelligence into everyday life through common edge devices, it also raises new challenges, especially for the development of its algorithms and the deployment of its services, which call for novel design methodologies tailored to these challenges. In this paper, we provide a comprehensive survey of the latest enabling design methodologies that span the entire edge AI development stack. We suggest that the key methodologies for effective edge AI development are single-layer specialization and cross-layer co-design. We discuss representative methodologies in each category in detail, including on-device training methods, specialized software design, dedicated hardware design, benchmarking and design automation, software/hardware co-design, software/compiler co-design, and compiler/hardware co-design. Moreover, we attempt to reveal hidden cross-layer design opportunities that can further boost the solution quality of future edge AI, and we provide insights into future directions and emerging areas that require increased research focus.
With the scaling of technology and higher requirements on performance and functionality, power dissipation is becoming one of the major design considerations in the development of network processors. In this paper, we use an assertion-based methodology for system-level power/performance analysis to study two dynamic voltage scaling (DVS) techniques, traffic-based DVS and execution-based DVS, in a network processor model. Using automatically generated distribution analyzers, we analyze the power and performance distributions and study their trade-offs for the two DVS policies with different parameter settings, such as threshold values and window sizes. We discuss the optimal configurations of the two DVS policies under different design requirements. Through a set of experiments, we show that the assertion-based trace analysis methodology is an efficient tool that can help a designer easily compare and study optimal architectural configurations in a large design space.
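To make the role of the two parameters concrete, the following is a minimal sketch of what a traffic-based DVS policy with a window size and threshold values could look like. It is not the policy evaluated in the paper: the operating points, the two-threshold rule, and all names are hypothetical assumptions.

```python
# Hypothetical operating points as (supply voltage in V, clock in MHz).
LEVELS = [(1.0, 400), (1.2, 500), (1.3, 600)]


def traffic_based_dvs(arrivals, window, high, low, start_level=1):
    """Return the voltage/frequency level chosen after each full window.

    arrivals    -- packets observed in each time slot
    window      -- number of slots summed before each decision
    high / low  -- assumed thresholds on the packets seen in one window
    """
    level = start_level
    trace = []
    for start in range(0, len(arrivals), window):
        load = sum(arrivals[start:start + window])
        if load > high and level < len(LEVELS) - 1:
            level += 1   # heavy traffic: step up one operating point
        elif load < low and level > 0:
            level -= 1   # light traffic: step down to save power
        trace.append(level)
    return trace


# Hypothetical traffic: idle, then a burst lasting two windows, then idle.
arrivals = [1, 1, 0, 1, 9, 8, 9, 9, 9, 9, 8, 9, 0, 1, 0, 0]

short_window = traffic_based_dvs(arrivals, window=4, high=20, low=5)
long_window = traffic_based_dvs(arrivals, window=8, high=20, low=5)
print(short_window)
print(long_window)
```

With the shorter window the policy follows the burst and steps back down afterwards, while the longer window averages the idle periods into the burst and stays at a high level. This is the kind of trade-off that different window sizes and thresholds expose.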