Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to reduce TLB size or increase TLB reach. However, such restrictions are unattractive because they forgo many of the benefits of traditional VM, such as demand paging and copy-on-write. We propose SPARTA, a divide-and-conquer approach to address translation. SPARTA splits address translation into accelerator-side and memory-side parts. The accelerator-side translation hardware consists of a tiny TLB covering only the accelerator's cache hierarchy (if any), while translation for main-memory accesses is performed by shared memory-side TLBs. Performing translation on the memory side allows SPARTA to overlap data fetch with translation and avoids replicating TLB entries for data shared among accelerators. To further improve the performance and efficiency of memory-side translation, SPARTA logically partitions the memory space, delegating translation to small and efficient per-partition translation hardware. Our evaluation on index-traversal accelerators shows that SPARTA virtually eliminates translation overhead, reducing it by over 30x on average (up to 47x) and improving performance by 57%. At the same time, SPARTA requires minimal accelerator-side translation hardware, reduces the total number of TLB entries in the system, gracefully scales with memory size, and preserves all key VM functionality.
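As a rough illustration of the split-translation idea in this abstract, the toy model below sketches how an accelerator-side micro-TLB might cover only cache-resident data while misses are translated by a per-partition memory-side TLB. It is a minimal sketch under assumed structure and names (class names, the partitioning function, and sizes are hypothetical), not the paper's implementation.

```python
# Toy model of SPARTA-style split address translation (illustrative only;
# all structure and names are assumptions, not the paper's design).

PAGE_SIZE = 4096
NUM_PARTITIONS = 4  # assumed number of memory-side partitions

class TinyTLB:
    """Accelerator-side TLB: covers only the accelerator's cache hierarchy."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = {}  # virtual page number -> physical page number

    def lookup(self, vpn):
        return self.entries.get(vpn)

    def fill(self, vpn, ppn):
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # crude FIFO eviction
        self.entries[vpn] = ppn

class MemorySideTLB:
    """Per-partition memory-side TLB, shared by all accelerators."""
    def __init__(self, page_table):
        self.page_table = page_table  # backing translation source

    def translate(self, vpn):
        # In hardware, this lookup would overlap with the fetch of the
        # data itself, hiding most of the translation latency.
        return self.page_table[vpn]

def access(vaddr, tiny_tlb, partitions, page_table):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = tiny_tlb.lookup(vpn)
    if ppn is None:
        # Miss in the accelerator-side TLB: route the request to the
        # memory-side TLB of the partition owning this page.
        partition = vpn % NUM_PARTITIONS  # assumed partitioning function
        ppn = partitions[partition].translate(vpn)
        tiny_tlb.fill(vpn, ppn)
    return ppn * PAGE_SIZE + offset

# Usage: translate an address through the two-level scheme.
page_table = {vpn: vpn + 100 for vpn in range(64)}
partitions = [MemorySideTLB(page_table) for _ in range(NUM_PARTITIONS)]
tlb = TinyTLB()
print(hex(access(0x2345, tlb, partitions, page_table)))
```

Because each partition owns a disjoint slice of pages, no translation entry ever needs to be duplicated across partitions, which is what lets the memory-side TLBs stay small while the system scales with memory size.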
Compressed sensing (CS) theory assures us that we can accurately reconstruct magnetic resonance images using fewer k-space measurements than the Nyquist sampling rate requires. In traditional CS-MRI inversion methods, the fact that the energy within…
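The abstract above is truncated, so the following is only a generic sketch of the compressed-sensing principle it invokes: recovering a sparse signal from fewer measurements than Nyquist via l1-regularized reconstruction (here with plain ISTA and a random sensing matrix). It is not the paper's CS-MRI method; all parameters are illustrative assumptions.

```python
# Generic compressed-sensing recovery sketch (not the paper's CS-MRI method):
# reconstruct a sparse signal x from m < n random measurements y = A @ x
# by iterative soft-thresholding (ISTA) on l1-regularized least squares.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 64, 5            # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
y = A @ x_true                                 # undersampled measurements

lam = 0.05
step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1/L gradient step size
x = np.zeros(n)
for _ in range(500):
    grad = A.T @ (A @ x - y)                   # gradient of 0.5*||Ax - y||^2
    z = x - step * grad
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold

print("relative recovery error:",
      np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```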
We consider the learning of algorithmic tasks by mere observation of input-output pairs. Rather than studying this as a black-box discrete regression problem with no assumption whatsoever on the input-output mapping, we concentrate on tasks that are…
Quantum computers are expected to bring advantages to several fields of research and industry. However, the computational cost of loading classical data into quantum computers can impose restrictions on possible quantum speedups. Known algorithms…
This paper proposes a hierarchical approximate-factor approach to analyzing high-dimensional, large-scale heterogeneous time series data using distributed computing. The new method employs a multiple-fold dimension reduction procedure using Principal…
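Since this abstract is also truncated, the sketch below shows only the generic building block it names: principal-component-based factor extraction from a panel of time series. The hierarchical, distributed procedure itself is not reconstructed here; dimensions, scaling, and noise level are illustrative assumptions.

```python
# Generic PCA-based factor extraction for a panel of time series
# (illustrative building block only, not the paper's hierarchical method).
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 200, 50, 3                       # time points, series, factors
F_true = rng.standard_normal((T, r))       # latent common factors
L_true = rng.standard_normal((N, r))       # factor loadings
X = F_true @ L_true.T + 0.5 * rng.standard_normal((T, N))

Xc = X - X.mean(axis=0)                    # center each series
# Leading principal components estimate the common factor space.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
F_hat = U[:, :r] * np.sqrt(T)              # estimated factors (standard scaling)
L_hat = Xc.T @ F_hat / T                   # estimated loadings

# Share of variance captured by the leading r components:
print((S[:r] ** 2).sum() / (S ** 2).sum())
```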
In data containing heterogeneous subpopulations, classification performance benefits from incorporating knowledge of the cluster structure into the classifier. Previous methods for such combined clustering and classification are either 1) classifier-sp…