بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Use of checkpoint-restart for complex HEP software on traditional architectures and Intel MIC

290 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Peter Elmer

تاريخ النشر 2013

مجال البحث فيزياء الهندسة المعلوماتية

والبحث باللغة English

تأليف Kapil Arya - Gene Cooperman - Andrea Dotti

الفيزياء الحسابية النظم الموزعة والتوازية والحوسبة العنقودية التحليل العددي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Process checkpoint-restart is a technology with great potential for use in HEP workflows. Use cases include debugging, reducing the startup time of applications both in offline batch jobs and the High Level Trigger, permitting job preemption in environments where spare CPU cycles are being used opportunistically and efficient scheduling of a mix of multicore and single-threaded jobs. We report on tests of checkpoint-restart technology using CMS software, Geant4-MT (multi-threaded Geant4), and the DMTCP (Distributed Multithreaded Checkpointing) package. We analyze both single- and multi-threaded applications and test on both standard Intel x86 architectures and on Intel MIC. The tests with multi-threaded applications on Intel MIC are used to consider scalability and performance. These are considered an indicator of what the future may hold for many-core computing.

قيم البحث

516 - Jiajun Cao , Gregory Kerr , Kapil Arya 2013

InfiniBand is widely used for low-latency, high-throughput cluster computing. Saving the state of the InfiniBand network as part of distributed checkpointing has been a long-standing challenge for researchers. Because of a lack of a solution, typical MPI implementations have included custom checkpoint-restart services that tear down the network, checkpoint each node as if the node were a standalone computer, and then re-connect the network again. We present the first example of transparent, system-initiated checkpoint-restart that directly supports InfiniBand. The new approach is independent of any particular Linux kernel, thus simplifying the current practice of using a kernel-based module, such as BLCR. This direct approach results in checkpoints that are found to be faster than with the use of a checkpoint-restart service. The generality of this approach is shown not only by checkpointing an MPI computation, but also a native UPC computation (Berkeley Unified Parallel C), which does not use MPI. Scalability is shown by checkpointing 2,048 MPI processes across 128 nodes (with 16 cores per node). In addition, a cost-effective debugging approach is also enabled, in which a checkpoint image from an InfiniBand-based production cluster is copied to a local Ethernet-based cluster, where it can be restarted and an interactive debugger can be attached to it. This work is based on a plugin that extends the DMTCP (Distributed MultiThreaded CheckPointing) checkpoint-restart package.

أنظمة التشغيل النظم الموزعة والتوازية والحوسبة العنقودية

HEP Community White Paper on Software trigger and event reconstruction

81 - Johannes Albrecht 2018

Realizing the physics programs of the planned and upgraded high-energy physics (HEP) experiments over the next 10 years will require the HEP community to address a number of challenges in the area of software and computing. For this reason, the HEP s oftware community has engaged in a planning process over the past two years, with the objective of identifying and prioritizing the research and development required to enable the next generation of HEP detectors to fulfill their full physics potential. The aim is to produce a Community White Paper which will describe the community strategy and a roadmap for software and computing research and development in HEP for the 2020s. The topics of event reconstruction and software triggers were considered by a joint working group and are summarized together in this document.

الفيزياء الحسابية

HEP Community White Paper on Software trigger and event reconstruction: Executive Summary

76 - Johannes Albrecht 2018

الفيزياء الحسابية

A Roadmap for HEP Software and Computing R&D for the 2020s

144 - Johannes Albrecht , Antonio Augusto Alves Jr , Guilherme Amadio 2017

Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requ ires commensurate investment in the R&D of software to acquire, manage, process, and analyse the shear amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.

الفيزياء الحسابية فيزياء الطاقة العالية - التجربة

A Generic Checkpoint-Restart Mechanism for Virtual Machines

382 - Rohan Garg , Komal Sodha , Gene Cooperman 2012

It is common today to deploy complex software inside a virtual machine (VM). Snapshots provide rapid deployment, migration between hosts, dependability (fault tolerance), and security (insulating a guest VM from the host). Yet, for each virtual machi ne, the code for snapshots is laboriously developed on a per-VM basis. This work demonstrates a generic checkpoint-restart mechanism for virtual machines. The mechanism is based on a plugin on top of an unmodified user-space checkpoint-restart package, DMTCP. Checkpoint-restart is demonstrated for three virtual machines: Lguest, user-space QEMU, and KVM/QEMU. The plugins for Lguest and KVM/QEMU require just 200 lines of code. The Lguest kernel driver API is augmented by 40 lines of code. DMTCP checkpoints user-space QEMU without any new code. KVM/QEMU, user-space QEMU, and DMTCP need no modification. The design benefits from other DMTCP features and plugins. Experiments demonstrate checkpoint and restart in 0.2 seconds using forked checkpointing, mmap-based fast-restart, and incremental Btrfs-based snapshots.

أنظمة التشغيل

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الشام الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Use of checkpoint-restart for complex HEP software on traditional architectures and Intel MIC

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً