FPGAs have emerged as computing infrastructure for accelerating applications in datacenters, and high-level synthesis (HLS) tools have been proposed to ease FPGA programming. Even with HLS, irregular data-intensive applications require explicit optimizations. A common one is to instantiate multiple processing elements (PEs), each owning a private BRAM-based buffer, so that multiple data items can be processed per cycle. Data routing dynamically dispatches each data item to its designated PE. Compared with statically assigning data to PEs, it avoids replicating data across buffers and therefore saves BRAM. However, workload imbalance among PEs vastly diminishes performance when processing skewed datasets. In this paper, we propose a skew-oblivious data routing architecture that allocates secondary PEs and schedules them at run time to share the workload of overloaded PEs. In addition, we integrate the proposed architecture into a framework called Ditto to minimize the development effort for applications that require skew handling. We evaluate Ditto on five commonly used applications: histogram building, data partitioning, PageRank, heavy hitter detection, and HyperLogLog. The results demonstrate that the generated implementations are robust to skewed datasets and outperform state-of-the-art designs in both throughput and BRAM usage efficiency.
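To make the skew problem concrete, the following Python sketch models histogram building in two configurations. The first uses static key-based routing, where key k goes to primary PE k mod N. The second adds a pool of secondary PEs that take over part of an overloaded PE's items. Each secondary PE keeps its own private partial histogram, and the partial histograms are merged at the end. Everything here is an illustrative assumption rather than Ditto's actual design: the modulo routing, the greedy ceil(load / workers) policy for placing secondary PEs, the round-robin split within a group, and the timing model in which each PE retires one item per cycle. For simplicity, the sketch also computes per-PE loads up front instead of scheduling at run time.

```python
"""Toy model of skewed data routing with secondary PEs (illustrative sketch only)."""
from collections import Counter
from typing import List, Tuple


def pe_loads(keys: List[int], num_pes: int) -> List[int]:
    """Items each primary PE receives under assumed static routing: key k -> PE k % num_pes."""
    loads = [0] * num_pes
    for k in keys:
        loads[k % num_pes] += 1
    return loads


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def assign_secondaries(loads: List[int], num_secondary: int) -> List[int]:
    """Greedy stand-in for run-time scheduling (an assumption, not Ditto's policy):
    give each secondary PE to the group whose estimated finish time
    ceil(load / workers) is currently largest. Returns workers per group."""
    workers = [1] * len(loads)  # every group starts with its primary PE
    for _ in range(num_secondary):
        worst = max(range(len(loads)), key=lambda i: ceil_div(loads[i], workers[i]))
        workers[worst] += 1
    return workers


def build_histogram(keys: List[int], num_pes: int, num_secondary: int) -> Tuple[Counter, int]:
    """Build a histogram with private per-worker buffers; return (histogram, cycles).

    Assumed timing model: each worker retires one item per cycle, so the
    busiest worker determines the cycle count."""
    loads = pe_loads(keys, num_pes)
    workers = assign_secondaries(loads, num_secondary)
    # One private buffer (here a Counter) per worker, grouped by primary PE.
    buffers = [[Counter() for _ in range(w)] for w in workers]
    next_worker = [0] * num_pes  # round-robin pointer inside each group
    for k in keys:
        g = k % num_pes
        w = next_worker[g]
        buffers[g][w][k] += 1
        next_worker[g] = (w + 1) % workers[g]
    # Merge partial histograms; for a histogram, merging is just summation.
    hist: Counter = Counter()
    for group in buffers:
        for buf in group:
            hist.update(buf)
    # Round-robin gives each worker at most ceil(load / workers) items.
    cycles = max(ceil_div(load, w) for load, w in zip(loads, workers))
    return hist, cycles


if __name__ == "__main__":
    skewed = [0] * 90 + list(range(1, 11))  # key 0 dominates the input
    for s in (0, 4):
        _, cycles = build_histogram(skewed, num_pes=4, num_secondary=s)
        print(f"{s} secondary PEs: {cycles} cycles")
    # prints "0 secondary PEs: 92 cycles" then "4 secondary PEs: 19 cycles"
```

In this toy input, primary PE 0 receives 92 of the 100 items, so it alone bounds the runtime. Assigning all four secondary PEs to that group cuts the estimate to 19 cycles, and the merged histogram is the same as before. Summation is a valid merge only for histogram-like results; other applications would likely need their own merge step.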
Energy harvesting is an attractive way to power future IoT devices because it can eliminate the need for batteries or power cables. However, harvested energy is intrinsically unstable. While FPGAs have been widely adopted in various embedded systems, it
Data-intensive applications often involve analyzing large datasets that require large amounts of compute and storage resources. While dedicated compute and/or storage farms offer good task/data throughput, they suffer from low resource utilization p
Multi-tenancy for latency-critical applications leads to resource interference and unpredictable performance. Core reconfiguration opens up more opportunities for colocation, as it allows the hardware to adjust to the dynamic performance and power ne
Image bitmaps have been widely used in in-memory applications, which consume a lot of storage space and energy. Compared with legacy DRAM, non-volatile memories (NVMs) are suitable for bitmap storage due to their salient features in capacity and power s
Network-wide traffic analytics are often needed for various network monitoring tasks. These measurements are often performed by collecting samples at network switches, which are then sent to the controller for aggregation. However, performing such an