
Modeling Shared Cache Performance of OpenMP Programs using Reuse Distance


Added by Gopinath Chennupati

Publication date 2019

Field: Informatics Engineering

Language: English

Authors: Atanu Barai, Gopinath Chennupati, Nandakishore Santhi, and Abdel-Hameed A. Badawy

Performance Software Engineering


Abstract

Performance modeling of parallel applications on multicore computers remains a challenge in computational co-design due to the complex design of multicore processors, including private and shared memory hierarchies. We present a Scalable Analytical Shared Memory Model to predict the performance of parallel applications that run on a multicore computer and share the same level of cache in the hierarchy. This model uses a computationally efficient, probabilistic method to predict the reuse distance profiles, where reuse distance is a hardware architecture-independent measure of the patterns of virtual memory accesses. It relies on a stochastic, static basic block-level analysis of reuse profiles measured from the memory traces of applications run sequentially on small instances rather than using a multi-threaded trace. The results indicate that the hit-rate predictions on the shared cache are accurate.
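To make the notion of reuse distance concrete, here is a minimal Python sketch that computes it for a toy address trace and derives a hit rate from it. This is an illustration only and not the paper's probabilistic model: the function names `reuse_distances` and `lru_hit_rate`, the toy trace, and the assumption of a fully associative LRU cache are all introduced here for the example.

```python
def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The reuse distance of an access is the number of distinct other
    addresses touched since the previous access to the same address.
    First-time accesses have no previous use and are reported as None.
    """
    stack = []       # LRU stack: most recently used address is last
    distances = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # Everything above `addr` in the stack was touched since its last use.
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(addr)
    return distances


def lru_hit_rate(trace, cache_lines):
    """Hit rate of a hypothetical fully associative LRU cache with `cache_lines` lines.

    Under that assumption, an access hits exactly when its reuse
    distance is smaller than the number of cache lines.
    """
    dists = reuse_distances(trace)
    if not dists:
        return 0.0
    hits = sum(1 for d in dists if d is not None and d < cache_lines)
    return hits / len(dists)


if __name__ == "__main__":
    toy_trace = ["a", "b", "c", "a", "b", "b"]   # made-up addresses
    print(reuse_distances(toy_trace))            # [None, None, None, 2, 2, 0]
    print(lru_hit_rate(toy_trace, 3))            # 0.5
```

Because the distances depend only on the access sequence, the same profile can be reused to estimate hit rates for several cache sizes without re-running the trace, which is what makes the measure independent of the hardware architecture.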


An Effective Early Multi-core System Shared Cache Design Method Based on Reuse-distance Analysis

Hsin-Yu Ho, Ren-Song Tsay, 2021

In this paper, we propose an effective and efficient multi-core shared-cache design optimization approach based on reuse-distance analysis of the data traces of target applications. Since data traces are independent of system hardware architectures, a designer can easily compute the best cache design at the early system design phase using our approach. We devise a very efficient yet accurate method to derive the aggregated reuse-distance histograms of concurrent applications for accurate cache performance analysis and optimization. Essentially, the actual shared-cache contention results of concurrent applications are embedded in the aggregated reuse-distance histograms, and therefore the approach is very effective. The experimental results show that the average error rate of the shared-cache miss-count estimations of our approach is less than 2.4%. Using a simple scanning search method, one can easily determine the true optimal cache configurations at the early system design phase.

Performance

An architecture-based dependability modeling framework using AADL

Ana-Elena Rugina, 2007

For efficiency reasons, software system designers wish to use an integrated set of methods and tools to describe specifications and designs, and also to perform analyses such as dependability, schedulability and performance. AADL (Architecture Analysis and Design Language) has proved to be efficient for software architecture modeling. In addition, AADL was designed to accommodate several types of analyses. This paper presents an iterative dependency-driven approach for dependability modeling using AADL. It is illustrated on a small example. This approach is part of a complete framework that allows the generation of dependability analysis and evaluation models from AADL models to support the analysis of software and system architectures in critical application domains.

Performance Software Engineering

Performance Modeling and Evaluation for Information-Driven Networks

Kui Wu, Yuming Jiang, Guoqiang Hu, 2008

Information-driven networks include a large category of networking systems, where network nodes are aware of the information delivered and thus can not only forward data packets but may also perform information processing. In many situations, the quality of service (QoS) in information-driven networks is provisioned through redundancy in information. Traditional performance models generally adopt evaluation measures suitable for packet-oriented service guarantees, such as packet delay, throughput, and packet loss rate. These performance measures, however, do not align well with the actual needs of information-driven networks. New performance measures and models for information-driven networks, despite their importance, have been largely missing, mainly because information processing is clearly application dependent and cannot be easily captured within a generic framework. To fill this gap, we present a new performance evaluation framework particularly tailored for information-driven networks, based on the recent development of stochastic network calculus. We analyze the QoS with respect to information delivery and study the scheduling problem with the new performance metrics. Our analytical framework can be used to calculate the network capacity in information delivery and, at the same time, to help transmission scheduling for a large body of systems where QoS is stochastically guaranteed through redundancy in information.

Performance Networking and Internet Architecture

Performance Modeling and Prediction for Dense Linear Algebra

Elmar Peise, 2017

This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executions of such algorithms entirely, and instead predict their performance through runtime estimates for the underlying compute kernels. For a variety of operations, these predictions make it possible to quickly select the fastest algorithm configurations from available alternatives. We consider two scenarios that cover a wide range of computations. To predict the performance of blocked algorithms, we design algorithm-independent performance models for kernel operations that are generated automatically once per platform. For various matrix operations, instantaneous predictions based on such models both accurately identify the fastest algorithm and select a near-optimal block size. For performance predictions of BLAS-based tensor contractions, we propose cache-aware micro-benchmarks that take advantage of the highly regular structure inherent to contraction algorithms. At merely a fraction of a contraction's runtime, predictions based on such micro-benchmarks identify the fastest combination of tensor traversal and compute kernel.

Performance

Analytical Performance Modeling of NoCs under Priority Arbitration and Bursty Traffic

Sumit K. Mandal, Raid Ayoub, Michael Kishinevsky, 2020

Networks-on-Chip (NoCs) used in commercial many-core processors typically incorporate priority arbitration. Moreover, they experience bursty traffic due to application workloads. However, most state-of-the-art NoC analytical performance analysis techniques assume fair arbitration and simple traffic models. To address these limitations, we propose an analytical modeling technique for priority-aware NoCs under bursty traffic. Experimental evaluations with synthetic and bursty traffic show that the proposed approach has less than 10% modeling error with respect to a cycle-accurate NoC simulator.

Performance
