CaosDB - Research Data Management for Complex, Changing, and Automated Research Workflows

Published by: Daniel Hornung

Publication date: 2018

Research field: Information engineering

Language: English

Authors: Timm Fitschen, Alexander Schlemmer, Daniel Hornung

Databases, Artificial Intelligence

Abstract

Here we present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data. Its primary purpose is the management of data from biomedical sciences, both from simulations and experiments, during the complete research data lifecycle. An RDMS for this domain faces particular challenges: research data arise in huge amounts, from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around the scientists' workflows and practices and thus support changes in workflow and data structure. Nevertheless, it should encourage and support the development and observation of standards and furthermore facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of the CaosDB Server, its data model, and its easy-to-learn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it.

Serge Abiteboul, Marcelo Arenas, Pablo Barcelo (2017)

In April 2016, a community of researchers working in the area of Principles of Data Management (PDM) joined in a workshop at the Dagstuhl Castle in Germany. The workshop was organized jointly by the Executive Committee of the ACM Symposium on Principles of Database Systems (PODS) and the Council of the International Conference on Database Theory (ICDT). The mission of this workshop was to identify and explore some of the most important research directions that have high relevance to society and to Computer Science today, and where the PDM community has the potential to make significant contributions. This report describes the family of research directions that the workshop focused on from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term.

Databases

Energy Efficiency: The New Holy Grail of Data Management Systems Research

Stavros Harizopoulos (2009)

Energy costs are quickly rising in large-scale data centers and are soon projected to overtake the cost of hardware. As a result, data center operators have recently started turning to more energy-friendly hardware. Despite the growing body of research in power management techniques, there has been little work to date on energy efficiency from a data management software perspective. In this paper, we argue that hardware-only approaches are only part of the solution, and that data management software will be key in optimizing for energy efficiency. We discuss the problems arising from growing energy use in data centers and the trends that point to an increasing set of opportunities for software-level optimizations. Using two simple experiments, we illustrate the potential of such optimizations, and, motivated by these examples, we discuss general approaches for reducing energy waste. Lastly, we point out existing places within database systems that are promising for energy-efficiency optimizations and urge the data management systems community to shift focus from performance-oriented research to energy-efficient computing.

Databases, Performance

Data management to support reproducible research

B. A. Wandell, A. Rokem, L. M. Perry (2015)

We describe the current state and future plans for a set of tools for scientific data management (SDM) designed to support scientific transparency and reproducible research. SDM has been in active use at our MRI Center for more than two years. We designed the system to be used from the beginning of a research project, which contrasts with conventional end-state databases that accept data as a project concludes. A number of benefits accrue from using scientific data management tools early and throughout the project, including data integrity as well as reuse of the data and of computational methods.

Quantitative Methods

Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development

Rafael Ferreira da Silva, Henri Casanova, Kyle Chard (2021)

Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale HPC platforms. Workflows will play a crucial role in the data-oriented and post-Moore's computing landscape as they democratize the application of cutting-edge research techniques, computationally intensive methods, and use of new computing platforms. As workflows continue to be adopted by scientific projects and user communities, they are becoming more complex. Workflows are increasingly composed of tasks that perform computations such as short machine learning inference, multi-node simulations, and long-running machine learning model training, among others, and thus increasingly rely on heterogeneous architectures that include CPUs but also GPUs and accelerators. The workflow management system (WMS) technology landscape is currently segmented and presents significant barriers to entry due to the hundreds of seemingly comparable, yet incompatible, systems that exist. Another fundamental problem is that there are conflicting theoretical bases and abstractions for a WMS. Systems that use the same underlying abstractions can likely be translated between, which is not the case for systems that use different abstractions. More information: https://workflowsri.org/summits/technical

Distributed, Parallel, and Cluster Computing

Structuring research methods and data with the Research Object model: genomics workflows as a case study

Kristina M. Hettne, Harish Dharuri, Jun Zhao (2013)

One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows.

Genomics, Digital Libraries