The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy efficiency of an application running on the CGRA. This paper presents an automated approach for generating specialized PE architectures for an application or an application domain. Frequent subgraphs mined from a set of applications are merged to form a PE architecture specialized to that application domain. For the image processing and machine learning domains, we generate specialized PEs that are up to 10.5x more energy efficient and consume 9.1x less area than a baseline PE.
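The abstract describes the flow at a high level: mine frequent subgraphs from the dataflow graphs of a set of applications, then merge those subgraphs into a specialized PE. The sketch below illustrates that idea under simplifying assumptions, not the paper's actual implementation: dataflow graphs are flattened to short operation chains, "merging" is taken to be the union of the operations the selected subgraphs require, and the helper names (enumerate_chains, mine_frequent_chains, merge_into_pe) are hypothetical.

```python
from collections import Counter

# Illustrative sketch only: treat each application's dataflow graph as a set of
# short operation chains, mine the most frequent chains across the domain, and
# "merge" them by taking the union of functional units a PE would need.

def enumerate_chains(dfg_edges, max_len=3):
    """Enumerate operation chains (simple paths) of up to max_len nodes.

    dfg_edges: list of (src_op, dst_op) pairs of operation names such as
    'mul' or 'add'. Flattening the dataflow graph to chains is an assumption
    made for this sketch, not the paper's representation.
    """
    succ = {}
    for s, d in dfg_edges:
        succ.setdefault(s, []).append(d)
    chains = []

    def walk(path):
        chains.append(tuple(path))
        if len(path) < max_len:
            for nxt in succ.get(path[-1], []):
                walk(path + [nxt])

    for src in {s for s, _ in dfg_edges}:
        walk([src])
    return chains

def mine_frequent_chains(dfgs, top_k=4, max_len=3):
    """Count chains across all application DFGs and keep the top_k most frequent."""
    counts = Counter()
    for edges in dfgs:
        counts.update(set(enumerate_chains(edges, max_len)))  # count once per app
    return [chain for chain, _ in counts.most_common(top_k)]

def merge_into_pe(frequent_chains):
    """Merge: the PE must provide every operation used by any selected chain."""
    return sorted({op for chain in frequent_chains for op in chain})

if __name__ == "__main__":
    # Two toy 'applications' standing in for an image-processing-like domain.
    conv_like = [("mul", "add"), ("add", "add"), ("add", "shift")]
    relu_like = [("mul", "add"), ("add", "max")]
    chains = mine_frequent_chains([conv_like, relu_like])
    print("frequent chains:", chains)
    print("specialized PE functional units:", merge_into_pe(chains))
```

In the paper's setting the merge step would also share routing and ports between overlapping subgraphs rather than simply taking a union of operations; the sketch only conveys the mine-then-merge structure of the approach.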