ﻻ يوجد ملخص باللغة العربية
With the advent of increasingly complex hardware in real-time embedded systems (processors with performance enhancing features such as pipelines, cache hierarchy, multiple cores), many processors now have a set-associative L2 cache. Thus, there is a need for considering cache hierarchies when validating the temporal behavior of real-time systems, in particular when estimating tasks worst-case execution times (WCETs). To the best of our knowledge, there is only one approach for WCET estimation for systems with cache hierarchies [Mueller, 1997], which turns out to be unsafe for set-associative caches. In this paper, we highlight the conditions under which the approach described in [Mueller, 1997] is unsafe. A safe static instruction cache analysis method is then presented. Contrary to [Mueller, 1997] our method supports set-associative and fully associative caches. The proposed method is experimented on medium-size and large programs. We show that the method is most of the time tight. We further show that in all cases WCET estimations are much tighter when considering the cache hierarchy than when considering only the L1 cache. An evaluation of the analysis time is conducted, demonstrating that analysing the cache hierarchy has a reasonable computation time.
Memory-bound algorithms show complex performance and energy consumption behavior on multicore processors. We choose the lattice-Boltzmann method (LBM) on an Intel Sandy Bridge cluster as a prototype scenario to investigate if and how single-chip perf
Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained suggestions at the kernel level, if any. In this paper, we describe GPA, a
An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architectures. Such predictions are an in
In this paper, we study how certain conditions can affect the transformations on the states of the memory of a strict load-store Maurer ISA, when half of the data memory serves as the part of the operating unit.
We have developed a highly-tuned software library that accelerates the calculation of quadrupole terms in the Barnes-Hut tree code by use of a SIMD instruction set on the x86 architecture, Advanced Vector eXtensions 2 (AVX2). Our code is implemented