We outline design and lines of development of autonomous tools for the computing Grid management, monitoring and optimization. The management is proposed to be based on the notion of utility. Grid optimization is considered to be application-oriented. A generic Grid simulator is proposed as an optimization tool for Grid structure and functionality.
Selecting optimal resources for submitting jobs on a computational Grid or accessing data from a data grid is one of the most important tasks of any Grid middleware. Most modern Grid software today satisfies this responsibility and gives a best-effort performance to solve this problem. Almost all decisions regarding scheduling and data access are made by the software automatically, giving users little or no control over the entire process. To solve this problem, a more interactive set of services and middleware is desired that provides users more information about Grid weather, and gives them more control over the decision making process. This paper presents a set of services that have been developed to provide more interactive resource management capabilities within the Grid Analysis Environment (GAE) being developed collaboratively by Caltech, NUST and several other institutes. These include a steering service, a job monitoring service and an estimator service that have been designed and written using a common Grid-enabled Web Services framework named Clarens. The paper also presents a performance analysis of the developed services to show that they have indeed resulted in a more interactive and powerful system for user-centric Grid-enabled physics analysis.
We describe some of the key aspects of the SAMGrid system, used by the D0 and CDF experiments at Fermilab. Having sustained success of the data handling part of SAMGrid, we have developed new services for job and information services. Our job management is rooted in CondorG and uses enhancements that are general applicability for HEP grids. Our information system is based on a uniform framework for configuration management based on XML data representation and processing.
WorldGRID is an intercontinental testbed spanning Europe and the US integrating architecturally different Grid implementations based on the Globus toolkit. The WorldGRID testbed has been successfully demonstrated during the WorldGRID demos at SuperComputing 2002 (Baltimore) and IST2002 (Copenhagen) where real HEP application jobs were transparently submitted from US and Europe using native mechanisms and run where resources were available, independently of their location. To monitor the behavior and performance of such testbed and spot problems as soon as they arise, DataTAG has developed the EDT-Monitor tool based on the Nagios package that allows for Virtual Organization centric views of the Grid through dynamic geographical maps. The tool has been used to spot several problems during the WorldGRID operations, such as malfunctioning Resource Brokers or Information Servers, sites not correctly configured, job dispatching problems, etc. In this paper we give an overview of the package, its features and scalability solutions and we report on the experience acquired and the benefit that a GRID operation center would gain from such a tool.
We describe R-GMA (Relational Grid Monitoring Architecture) which has been developed within the European DataGrid Project as a Grid Information and Monitoring System. Is is based on the GMA from GGF, which is a simple Consumer-Producer model. The special strength of this implementation comes from the power of the relational model. We offer a global view of the information as if each Virtual Organisation had one large relational database. We provide a number of different Producer types with different characteristics; for example some support streaming of information. We also provide combined Consumer/Producers, which are able to combine information and republish it. At the heart of the system is the mediator, which for any query is able to find and connect to the best Producers for the job. We have developed components to allow a measure of inter-working between MDS and R-GMA. We have used it both for information about the grid (primarily to find out about what services are available at any one time) and for application monitoring. R-GMA has been deployed in various testbeds; we describe some preliminary results and experiences of this deployment.
Maintaining the stability of the modern power grid is becoming increasingly difficult due to fluctuating power consumption, unstable power supply coming from renewable energies, and unpredictable accidents such as man-made and natural disasters. As the operation on the power grid must consider its impact on future stability, reinforcement learning (RL) has been employed to provide sequential decision-making in power grid management. However, existing methods have not considered the environmental constraints. As a result, the learned policy has risk of selecting actions that violate the constraints in emergencies, which will escalate the issue of overloaded power lines and lead to large-scale blackouts. In this work, we propose a novel method for this problem, which builds on top of the search-based planning algorithm. At the planning stage, the search space is limited to the action set produced by the policy. The selected action strictly follows the constraints by testing its outcome with the simulation function provided by the system. At the learning stage, to address the problem that gradients cannot be propagated to the policy, we introduce Evolutionary Strategies (ES) with black-box policy optimization to improve the policy directly, maximizing the returns of the long run. In NeurIPS 2020 Learning to Run Power Network (L2RPN) competition, our solution safely managed the power grid and ranked first in both tracks.