Methods to Evaluate Lifecycle Models for Research Data Management


Added by Tobias Weber

Publication date 2019

Fields: Informatics Engineering

Language: English

Authors: Tobias Weber, Dieter Kranzlmuller

Digital Libraries


Abstract

Lifecycle models for research data are often abstract and simple. This comes with the danger of oversimplifying the complex concepts of research data management. The analysis of 90 different lifecycle models led to two approaches to assess the quality of these models. While terminological issues make direct comparisons of models hard, an empirical evaluation seems possible.


Integrating Research Data Management into Geographical Information Systems

Christian T. Jacobs, Alexandros Avdis, Simon L. Mouradian (2015)

Ocean modelling requires the production of high-fidelity computational meshes upon which to solve the equations of motion. The production of such meshes by hand is often infeasible, considering the complexity of the bathymetry and coastlines. The use of Geographical Information Systems (GIS) is therefore a key component to discretising the region of interest and producing a mesh appropriate to resolve the dynamics. However, all data associated with the production of a mesh must be provided in order to contribute to the overall recomputability of the subsequent simulation. This work presents the integration of research data management in QMesh, a tool for generating meshes using GIS. The tool uses the PyRDM library to provide a quick and easy way for scientists to publish meshes, and all data required to regenerate them, to persistent online repositories. These repositories are assigned unique identifiers to enable proper citation of the meshes in journal articles.

Digital Libraries Computational Engineering

Data management to support reproducible research

B. A. Wandell, A. Rokem, L. M. Perry (2015)

We describe the current state and future plans for a set of tools for scientific data management (SDM) designed to support scientific transparency and reproducible research. SDM has been in active use at our MRI Center for more than two years. We designed the system to be used from the beginning of a research project, which contrasts with conventional end-state databases that accept data as a project concludes. A number of benefits accrue from using scientific data management tools early and throughout the project, including data integrity as well as reuse of the data and of computational methods.

Quantitative Methods

Mandated data archiving greatly improves access to research data

Timothy H. Vines, Rose L. Andrew, Dan G. Bock (2013)

The data underlying scientific papers should be accessible to researchers both now and in the future, but how best can we ensure that these data are available? Here we examine the effectiveness of four approaches to data archiving: no stated archiving policy, recommending (but not requiring) archiving, and t

Digital Libraries Physics and Society Quantitative Methods

Generating Synthetic Text Data to Evaluate Causal Inference Methods

Zach Wood-Doughty, Ilya Shpitser, Mark Dredze (2021)

Drawing causal conclusions from observational data requires making assumptions about the true data-generating process. Causal inference research typically considers low-dimensional data, such as categorical or numerical fields in structured medical records. High-dimensional and unstructured data such as natural language complicate the evaluation of causal inference methods; such evaluations rely on synthetic datasets with known causal effects. Models for natural language generation have been widely studied and perform well empirically. However, existing methods are not immediately applicable to producing synthetic datasets for causal evaluations, as they do not allow for quantifying a causal effect on the text itself. In this work, we develop a framework for adapting existing generation models to produce synthetic text datasets with known causal effects. We use this framework to perform an empirical comparison of four recently proposed methods for estimating causal effects from text data. We release our code and synthetic datasets.

Computation and Language

Multi-level computational methods for interdisciplinary research in the HathiTrust Digital Library

Jaimie Murdock, Colin Allen, Katy Borner (2017)

We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other research areas and ultimately support a system for semi-automatic identification of argument structures. We provide a case study for the application of the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. We show how a combination of classification systems and mixed-membership models trained over large digital libraries can inform resource discovery in this domain. Through a novel approach of drill-down topic modeling (simultaneously reducing both the size of the corpus and the unit of analysis), we are able to reduce a large collection of full-text volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust digital library, and the pages bear the kind of close reading needed to generate original interpretations that is the heart of scholarly work in the humanities. Zooming back out, we provide a way to place the books onto a map of science originally constructed from very different data and for different purposes. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted.

Digital Libraries Computation and Language Information Retrieval