ترغب بنشر مسار تعليمي؟ اضغط هنا

We propose an approach to data memory prefetching which augments the standard prefetch buffer with selection criteria based on performance and usage pattern of a given instruction. This approach is built on top of a pattern matching based prefetcher, specifically one which can choose between a stream, a stride, or a stream followed by a stride. We track the most recently called instructions to make a decision on the quantity of data to prefetch next. The decision is based on the frequency with which these instructions are called and the hit/miss rate of the prefetcher. In our approach, we separate the amount of data to prefetch into three categories: a high degree, a standard degree and a low degree. We ran tests on different values for the high prefetch degree, standard prefetch degree and low prefetch degree to determine that the most optimal combination was 1, 4, 8 lines respectively. The 2 dimensional selection criteria improved the performance of the prefetcher by up to 9.5% over the first data prefetching championship winner. Unfortunately performance also fell by as much as 14%, but remained similar on average across all of the benchmarks we tested.
The Sphynx project was an exploratory study to discover what might be done to improve the heavy replication of in- structions in independent instruction caches for a massively parallel machine where a single program is executing across all of the cor es. While a machine with only many cores (fewer than 50) might not have any issues replicating the instructions for each core, as we approach the era where thousands of cores can be placed on one chip, the overhead of instruction replication may become unacceptably large. We believe that a large amount of sharing should be possible when the ma- chine is configured for all of the threads to issue from the same set of instructions. We propose a technique that allows sharing an instruction cache among a number of independent processor cores to allow for inter-thread sharing and reuse of instruction memory. While we do not have test cases to demonstrate the potential magnitude of performance gains that could be achieved, the potential for sharing reduces the die area required for instruction storage on chip.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا