ترغب بنشر مسار تعليمي؟ اضغط هنا

A Workload Analysis of NSFs Innovative HPC Resources Using XDMoD

356   0   0.0 ( 0 )
 نشر من قبل Matthew D. Jones
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Workload characterization is an integral part of performance analysis of high performance computing (HPC) systems. An understanding of workload properties sheds light on resource utilization and can be used to inform performance optimization both at the software and system configuration levels. It can provide information on how computational science usage modalities are changing that could potentially aid holistic capacity planning for the wider HPC ecosystem. Here, we report on the results of a detailed workload analysis of the portfolio of supercomputers comprising the NSF Innovative HPC program in order to characterize its past and current workload and look for trends to understand the nature of how the broad portfolio of computational science research is being supported and how it is changing over time. The workload analysis also sought to illustrate a wide variety of usage patterns and performance requirements for jobs running on these systems. File system performance, memory utilization and the types of parallelism employed by users (MPI, threads, etc) were also studied for all systems for which job level performance data was available.



قيم البحث

اقرأ أيضاً

Traditionally, on-demand, rigid, and malleable applications have been scheduled and executed on separate systems. The ever-growing workload demands and rapidly developing HPC infrastructure trigger the interest of converging these applications on a s ingle HPC system. Although allocating the hybrid workloads within one system could potentially improve system efficiency, it is difficult to balance the tradeoff between the responsiveness of on-demand requests, the incentive for malleable jobs, and the performance of rigid applications. In this study, we present several scheduling mechanisms to address the issues involved in co-scheduling on-demand, rigid, and malleable jobs on a single HPC system. We extensively evaluate and compare their performance under various configurations and workloads. Our experimental results show that our proposed mechanisms are capable of serving on-demand workloads with minimal delay, offering incentives for declaring malleability, and improving system performance.
Blue Waters is a Petascale-level supercomputer whose mission is to enable the national scientific and research community to solve grand challenge problems that are orders of magnitude more complex than can be carried out on other high performance com puting systems. Given the important and unique role that Blue Waters plays in the U.S. research portfolio, it is important to have a detailed understanding of its workload in order to guide performance optimization both at the software and system configuration level as well as inform architectural balance tradeoffs. Furthermore, understanding the computing requirements of the Blue Waters workload (memory access, IO, communication, etc.), which is comprised of some of the most computationally demanding scientific problems, will help drive changes in future computing architectures, especially at the leading edge. With this objective in mind, the project team carried out a detailed workload analysis of Blue Waters.
Improving datacenter operations is vital for the digital society. We posit that doing so requires our community to shift, from operational aspects taken in isolation to holistic analysis of datacenter resources, energy, and workloads. In turn, this s hift will require new analysis methods, and open-access, FAIR datasets with fine temporal and spatial granularity. We leverage in this work one of the (rare) public datasets providing fine-grained information on datacenter operations. Using it, we show strong evidence that fine-grained information reveals new operational aspects. We then propose a method for holistic analysis of datacenter operations, providing statistical characterization of node, energy, and workload aspects. We demonstrate the benefits of our holistic analysis method by applying it to the operations of a datacenter infrastructure with over 300 nodes. Our analysis reveals both generic and ML-specific aspects, and further details how the operational behavior of the datacenter changed during the 2020 COVID-19 pandemic. We make over 30 main observations, providing holistic insight into the long-term operation of a large-scale, public scientific infrastructure. We suggest such observations can help immediately with performance engineering tasks such as predicting future datacenter load, and also long-term with the design of datacenter infrastructure.
Software container solutions have revolutionized application development approaches by enabling lightweight platform abstractions within the so-called containers. Several solutions are being actively developed in attempts to bring the benefits of con tainers to high-performance computing systems with their stringent security demands on the one hand and fundamental resource sharing requirements on the other. In this paper, we discuss the benefits and short-comings of such solutions when deployed on real HPC systems and applied to production scientific applications.We highlight use cases that are either enabled by or significantly benefit from such solutions. We discuss the efforts by HPC system administrators and support staff to support users of these type of workloads on HPC systems not initially designed with these workloads in mind focusing on NCSAs Blue Waters system.
The Cloud infrastructure offers to end users a broad set of heterogenous computational resources using the pay-as-you-go model. These virtualized resources can be provisioned using different pricing models like the unreliable model where resources ar e provided at a fraction of the cost but with no guarantee for an uninterrupted processing. However, the enormous gamut of opportunities comes with a great caveat as resource management and scheduling decisions are increasingly complicated. Moreover, the presented uncertainty in optimally selecting resources has also a negatively impact on the quality of solutions delivered by scheduling algorithms. In this paper, we present a dynamic scheduling algorithm (i.e., the Uncertainty-Driven Scheduling - UDS algorithm) for the management of scientific workflows in Cloud. Our model minimizes both the makespan and the monetary cost by dynamically selecting reliable or unreliable virtualized resources. For covering the uncertainty in decision making, we adopt a Fuzzy Logic Controller (FLC) to derive the pricing model of the resources that will host every task. We evaluate the performance of the proposed algorithm using real workflow applications being tested under the assumption of different probabilities regarding the revocation of unreliable resources. Numerical results depict the performance of the proposed approach and a comparative assessment reveals the position of the paper in the relevant literature.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا