Recent architectural approaches that address speculative side-channel attacks aim to prevent software from exposing the microarchitectural state changes of transient execution. The Delay-on-Miss technique is one such approach: it delays loads that miss in the L1 cache until they become non-speculative, so that no transient changes occur in the memory hierarchy. This costs performance, however, prompting the use of value prediction (VP) to recover some of the loss. But the problem cannot be solved by simply introducing a new kind of speculation (value prediction). Value-predicted loads have to be validated, and validation cannot commence until the load becomes non-speculative. Thus, value-predicted loads occupy the same amount of precious core resources (e.g., reorder buffer entries) as they do under Delay-on-Miss. The end result is that VP yields only marginal benefits over Delay-on-Miss. In this paper, our insight is that we can achieve the same goal as VP (increasing performance by providing the value of loads that miss) without incurring its negative side effect (delaying the release of precious resources) if we can safely, non-speculatively, recompute a value in isolation (without being seen from the outside), so that we do not expose any information by transferring such a value via the memory hierarchy. Value Recomputation, which trades computation for data transfer, was previously proposed in an entirely different context: to reduce energy-expensive data transfers in the memory hierarchy. In this paper, we demonstrate the potential of value recomputation in relation to the Delay-on-Miss approach of hiding speculation, discuss the trade-offs, and show that we can achieve the same level of security while reaching 93% of the unsecured baseline performance (5% higher than Delay-on-Miss) and exceeding (by 3%) what even an oracular (100% accuracy and coverage) value predictor could do.
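The Delay-on-Miss policy summarized in the abstract can be illustrated with a minimal Python sketch. This is a toy decision function under stated assumptions, not the paper's implementation: the `Load` class, the `delay_on_miss_action` function, the action names, and the 64-byte line size are all hypothetical, and the L1 cache is modeled simply as a set of resident line numbers.

```python
from dataclasses import dataclass


@dataclass
class Load:
    addr: int          # byte address of the load (hypothetical field)
    speculative: bool  # True while the load could still be squashed


def delay_on_miss_action(load: Load, l1_lines: set, line_bytes: int = 64) -> str:
    """Return what a toy Delay-on-Miss core does with one load.

    l1_lines holds the line numbers currently resident in L1.
    The 64-byte line size is an assumption made for illustration.
    """
    line = load.addr // line_bytes
    if line in l1_lines:
        # L1 hit: only misses are delayed, so the load is served.
        return "serve_from_l1"
    if load.speculative:
        # Miss under speculation: hold the load so that no request
        # reaches the memory hierarchy beyond L1.
        return "delay"
    # Non-speculative miss: the load may access the memory hierarchy.
    return "issue_miss"
```

For example, with only the line containing address `0x1000` resident in L1, a speculative load to `0x2000` is delayed, and the same load is issued as an ordinary miss once it is no longer speculative. The sketch does not model value prediction or value recomputation, whose mechanisms the abstract does not spell out.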
Traditional Von Neumann computing is falling apart in the era of exploding data volumes as the overhead of data transfer becomes forbidding. Instead, it is more energy-efficient to fuse compute capability with memory where the data reside. This is pa
It has always been difficult to balance the accuracy and performance of ISSs. RTL simulators or systems such as gem5 are used to execute programs in a cycle-accurate manner but are often prohibitively slow. In contrast, functional simulators such as
Systematic checkpointing of the machine state makes restart of execution from a safe state possible upon detection of an error. The time and energy overhead of checkpointing, however, grows with the frequency of checkpointing. Amortizing this overhea
We describe the technical effort used to process a voluminous, high-value human neuroimaging dataset on the Open Science Grid with opportunistic use of idle HPC resources to boost computing capacity more than 5-fold. With minimal software development
Nesterov's well-known scheme for accelerating gradient descent in convex optimization problems is adapted to accelerating stationary iterative solvers for linear systems. Compared with classical Krylov subspace acceleration methods, the proposed schem