The CAVES Project - Exploring Virtual Data Concepts for Data Analysis


Abstract in English

The Collaborative Analysis Versioning Environment System (CAVES) project concentrates on the interactions between users performing data and/or computing intensive analyses on large data sets, as encountered in many contemporary scientific disciplines. In modern science increasingly larger groups of researchers collaborate on a given topic over extended periods of time. The logging and sharing of knowledge about how analyses are performed or how results are obtained is important throughout the lifetime of a project. Here is where virtual data concepts play a major role. The ability to seamlessly log, exchange and reproduce results and the methods, algorithms and computer programs used in obtaining them enhances in a qualitative way the level of collaboration in a group or between groups in larger organizations. The CAVES project takes a pragmatic approach in assessing the needs of a community of scientists by building series of prototypes with increasing sophistication. In extending the functionality of existing data analysis packages with virtual data capabilities these prototypes provide an easy and habitual entry point for researchers to explore virtual data concepts in real life applications and to provide valuable feedback for refining the system design. The architecture is modular based on Web, Grid and other services which can be plugged in as desired. As a proof of principle we build a first system by extending the very popular data analysis framework ROOT, widely used in high energy physics and other fields, making it virtual data enabled.

Download