No Arabic abstract
In this presentation the experiences of the LHC experiments using grid computing were presented with a focus on experience with distributed analysis. After many years of development, preparation, exercises, and validation the LHC (Large Hadron Collider) experiments are in operations. The computing infrastructure has been heavily utilized in the first 6 months of data collection. The general experience of exploiting the grid infrastructure for organized processing and preparation is described, as well as the successes employing the infrastructure for distributed analysis. At the end the expected evolution and future plans are outlined.
The D0 experiment at Fermilabs Tevatron will record several petabytes of data over the next five years in pursuing the goals of understanding nature and searching for the origin of mass. Computing resources required to analyze these data far exceed capabilities of any one institution. Moreover, the widely scattered geographical distribution of D0 collaborators poses further serious difficulties for optimal use of human and computing resources. These difficulties will exacerbate in future high energy physics experiments, like the LHC. The computing grid has long been recognized as a solution to these problems. This technology is being made a more immediate reality to end users in D0 by developing a grid in the D0 Southern Analysis Region (D0SAR), D0SAR-Grid, using all available resources within it and a home-grown local task manager, McFarm. We will present the architecture in which the D0SAR-Grid is implemented, the use of technology and the functionality of the grid, and the experience from operating the grid in simulation, reprocessing and data analyses for a currently running HEP experiment.
A key feature of collaboration in science and software development is to have a {em log} of what and how is being done - for private use and reuse and for sharing selected parts with collaborators, which most often today are distributed geographically on an ever larger scale. Even better if this log is {em automatic}, created on the fly while a scientist or software developer is working in a habitual way, without the need for extra efforts. The {tt CAVES} and {tt CODESH} projects address this problem in a novel way, building on the concepts of {em virtual state} and {em virtual transition} to provide an automatic persistent logbook for sessions of data analysis or software development in a collaborating group. A repository of sessions can be configured dynamically to record and make available the knowledge accumulated in the course of a scientific or software endeavor. Access can be controlled to define logbooks of private sessions and sessions shared within or between collaborating groups.
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. In order to analyze this data, the user can chose out of a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, ROOT offers packages for complex data modeling and fitting, as well as multivariate classification based on machine learning techniques. A central piece in these analysis tools are the histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.
The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project is one of the central software players in high energy physics since decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see research and development in the context of data analysis software, for making best use of HL-LHCs physics potential. This work shows what these areas could be, why the ROOT team believes investing in them is needed, which gains are expected, and where related work is ongoing. It can serve as an indication for future research proposals and cooperations.
VISPA is a novel development environment for high energy physics analyses, based on a combination of graphical and textual steering. The primary aim of VISPA is to support physicists in prototyping, performing, and verifying a data analysis of any complexity. We present example screenshots, and describe the underlying software concepts.