Application users have now been gaining experience for about a year with the standardized resource brokering services provided by the workload management package of the EU DataGrid project (WP1). Understanding, shaping and pushing the limits of the system has provided valuable feedback on both its design and implementation. A digest is given of the lessons and better practices that were learned and applied towards the second major release of the software.
In the first phase of the European DataGrid project, the workload management package (WP1) implemented a working prototype, providing users with an environment in which to define and submit jobs to the Grid, and able to find and use the "best" resources for these jobs. Application users have now been gaining experience for about a year with this first release of the workload management system. The experience acquired, the feedback received from users, and the need to plug in new components implementing new functionality triggered an update of the existing architecture. A description of this revised and complemented workload management system is given.
WorldGrid is an intercontinental testbed spanning Europe and the US, integrating architecturally different Grid implementations based on the Globus toolkit. It was developed in the context of the DataTAG and iVDGL projects, and successfully demonstrated during the WorldGrid demos at IST2002 (Copenhagen) and SC2002 (Baltimore). Two HEP experiments, ATLAS and CMS, successfully exploited the WorldGrid testbed for executing jobs simulating the response of their detectors to physics events produced by real collisions expected at the LHC accelerator starting from 2007. This data-intensive activity has been run for many years on local dedicated computing farms consisting of hundreds of nodes and terabytes of disk and tape storage. Within the WorldGrid testbed, for the first time HEP simulation jobs were submitted and run without distinction on US and European resources, despite their different underlying Grid implementations. The jobs produced data which could be retrieved and further analysed on the submitting machine, or simply stored on the remote resources and registered in a Replica Catalogue which made them available to the Grid for further processing. In this contribution we describe the job submission from Europe for both ATLAS and CMS applications, performed through the GENIUS portal operating on top of an EDG User Interface submitting to an EDG Resource Broker. We point out the chosen interoperability solutions which made US and European resources equivalent from the applications' point of view, the data management in the WorldGrid environment, and the CMS-specific production tools which were interfaced to the GENIUS portal.
Cloud service providers are distributing data centers geographically to minimize energy costs through intelligent workload distribution. With increasing data volumes in emerging cloud workloads, it is critical to factor in the network costs for transferring workloads across data centers. For geo-distributed data centers, many researchers have been exploring strategies for energy cost minimization and intelligent inter-data-center workload distribution separately. However, prior work does not comprehensively and simultaneously consider data center energy costs, data transfer costs, and data center queueing delay. In this paper, we propose a novel game theory-based workload management framework that takes a holistic approach to the cloud operating cost minimization problem by making intelligent scheduling decisions aware of data transfer costs and the data center queueing delay. Our framework performs intelligent workload management that considers heterogeneity in data center compute capability, cooling power, interference effects from task co-location in servers, time-of-use electricity pricing, renewable energy, net metering, peak demand pricing distribution, and network pricing. Our simulations show that the proposed game-theoretic technique can minimize the cloud operating cost more effectively than existing approaches.
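To make the cost terms named in the preceding abstract concrete, the following is a minimal sketch of how energy cost, data transfer cost, and queueing delay might be combined when choosing a data center. It is not the game-theoretic framework the abstract proposes: it is a plain greedy selection, and the `DataCenter` fields, the M/M/1 delay model, the function names, and all numbers are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """Hypothetical per-site parameters (all assumed, not from the paper)."""
    name: str
    energy_price: float     # $/kWh in the current time-of-use slot
    energy_per_task: float  # kWh consumed per task
    transfer_price: float   # $/GB to move task data to this site
    service_rate: float     # tasks/hour the site can complete


def operating_cost(dc, tasks_per_hour, gb_per_task):
    """Hourly cost = energy cost + data transfer cost."""
    energy = tasks_per_hour * dc.energy_per_task * dc.energy_price
    transfer = tasks_per_hour * gb_per_task * dc.transfer_price
    return energy + transfer


def queueing_delay(dc, tasks_per_hour):
    """Mean time in system under an assumed M/M/1 model: 1 / (mu - lambda)."""
    if tasks_per_hour >= dc.service_rate:
        return float("inf")  # overloaded site: the queue grows without bound
    return 1.0 / (dc.service_rate - tasks_per_hour)


def cheapest_feasible(dcs, tasks_per_hour, gb_per_task, max_delay_hours):
    """Greedy choice: cheapest site whose queueing delay meets the bound."""
    feasible = [dc for dc in dcs
                if queueing_delay(dc, tasks_per_hour) <= max_delay_hours]
    if not feasible:
        return None
    return min(feasible,
               key=lambda dc: operating_cost(dc, tasks_per_hour, gb_per_task))


# Illustrative, made-up sites: B is cheaper but has less spare capacity.
sites = [
    DataCenter("A", energy_price=0.10, energy_per_task=2.0,
               transfer_price=0.05, service_rate=120.0),
    DataCenter("B", energy_price=0.08, energy_per_task=2.0,
               transfer_price=0.02, service_rate=105.0),
]
tight = cheapest_feasible(sites, tasks_per_hour=100, gb_per_task=1.0,
                          max_delay_hours=0.1)
loose = cheapest_feasible(sites, tasks_per_hour=100, gb_per_task=1.0,
                          max_delay_hours=0.5)
```

With these made-up numbers, the cheaper site is ruled out under the tight delay bound and selected under the loose one, which illustrates why cost and queueing delay have to be considered together rather than separately.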
Traditionally, on-demand, rigid, and malleable applications have been scheduled and executed on separate systems. Ever-growing workload demands and rapidly developing HPC infrastructure have triggered interest in converging these applications on a single HPC system. Although allocating the hybrid workloads within one system could potentially improve system efficiency, it is difficult to balance the responsiveness of on-demand requests, the incentive for malleable jobs, and the performance of rigid applications. In this study, we present several scheduling mechanisms to address the issues involved in co-scheduling on-demand, rigid, and malleable jobs on a single HPC system. We extensively evaluate and compare their performance under various configurations and workloads. Our experimental results show that our proposed mechanisms are capable of serving on-demand workloads with minimal delay, offering incentives for declaring malleability, and improving system performance.
Selecting optimal resources for submitting jobs on a computational Grid or accessing data from a data grid is one of the most important tasks of any Grid middleware. Most modern Grid software takes on this responsibility and addresses the problem on a best-effort basis. Almost all decisions regarding scheduling and data access are made by the software automatically, giving users little or no control over the entire process. To solve this problem, a more interactive set of services and middleware is desired that provides users with more information about Grid weather and gives them more control over the decision-making process. This paper presents a set of services that have been developed to provide more interactive resource management capabilities within the Grid Analysis Environment (GAE) being developed collaboratively by Caltech, NUST and several other institutes. These include a steering service, a job monitoring service and an estimator service that have been designed and written using a common Grid-enabled Web Services framework named Clarens. The paper also presents a performance analysis of the developed services to show that they have indeed resulted in a more interactive and powerful system for user-centric Grid-enabled physics analysis.