بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

ObjTables: structured spreadsheets that promote data quality, reuse, and integration

184 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jonathan Karr

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية علم الأحياء

والبحث باللغة English

تأليف Jonathan R. Karr - Wolfram Liebermeister - Arthur P. Goldberg

قواعد البيانات الأساليب الكمية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

A central challenge in science is to understand how systems behaviors emerge from complex networks. This often requires aggregating, reusing, and integrating heterogeneous information. Supplementary spreadsheets to articles are a key data source. Spreadsheets are popular because they are easy to read and write. However, spreadsheets are often difficult to reanalyze because they capture data ad hoc without schemas that define the objects, relationships, and attributes that they represent. To help researchers reuse and compose spreadsheets, we developed ObjTables, a toolkit that makes spreadsheets human- and machine-readable by combining spreadsheets with schemas and an object-relational mapping system. ObjTables includes a format for schemas; markup for indicating the class and attribute represented by each spreadsheet and column; numerous data types for scientific information; and high-level software for using schemas to read, write, validate, compare, merge, revision, and analyze spreadsheets. By making spreadsheets easier to reuse, ObjTables could enable unprecedented secondary meta-analyses. By making it easy to build new formats and associated software for new types of data, ObjTables can also accelerate emerging scientific fields.

قيم البحث

138 - Angelos-Christos Anadiotis , Oana Balalau , Catarina Conceicao 2020

Digital data is a gold mine for modern journalism. However, datasets which interest journalists are extremely heterogeneous, ranging from highly structured (relational databases), semi-structured (JSON, XML, HTML), graphs (e.g., RDF), and text. Journ alists (and other classes of users lacking advanced IT expertise, such as most non-governmental-organizations, or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows, especially for dynamically varying sets of data sources. We describe a complete approach for integrating dynamic sets of heterogeneous datasets along the lines described above: the challenges we faced to make such graphs useful, allow their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.

قواعد البيانات الذكاء الاصطناعي

Graph integration of structured, semistructured and unstructured data for data journalism

115 - Oana Balalau 2020

Nowadays, journalism is facilitated by the existence of large amounts of digital data sources, including many Open Data ones. Such data sources are extremely heterogeneous, ranging from highly struc-tured (relational databases), semi-structured (JSON , XML, HTML), graphs (e.g., RDF), and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental-organizations, or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to de ne and deploy custom extract-transform-load work ows. These are di cult to set up not only for arbitrary heterogeneous inputs , but also given that users may want to add (or remove) datasets to (from) the corpus. We describe a complete approach for integrating dynamic sets of heterogeneous data sources along the lines described above: the challenges we faced to make such graphs useful, allow their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.

قواعد البيانات الذكاء الاصطناعي أجهزة الكمبيوتر والمجتمع

Using a Big Data Database to Identify Pathogens in Protein Data Space

415 - Ashley Mae Conard , Stephanie Dodson , Jeremy Kepner 2015

Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors / false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate goal of rapid pathogen identification in patient samples. Our approach uses the abilities of a big data databases to hold large sparse associative array representations of genetic data to extract statistical patterns about the data that can be used in a variety of ways to improve identification algorithms.

قواعد البيانات الأساليب الكمية

XLSearch: A Search Engine for Spreadsheets

352 - Michael Kohlhase , Corneliu Prodescu , Christian Liguda 2014

Spreadsheets are end-user programs and domain models that are heavily employed in administration, financial forecasting, education, and science because of their intuitive, flexible, and direct approach to computation. As a result, institutions are sw amped by millions of spreadsheets that are becoming increasingly difficult to manage, access, and control. This note presents the XLSearch system, a novel search engine for spreadsheets. It indexes spreadsheet formulae and efficiently answers formula queries via unification (a complex query language that allows metavariables in both the query as well as the index). But a web-based search engine is only one application of the underlying technology: Spreadsheet formula export to web standards like MathML combined with formula indexing can be used to find similar spreadsheets or common formula errors.

قواعد البيانات

Data provenance, curation and quality in metrology

175 - James Cheney , Adriane Chapman , Joy Davidson 2021

Data metrology -- the assessment of the quality of data -- particularly in scientific and industrial settings, has emerged as an important requirement for the UK National Physical Laboratory (NPL) and other national metrology institutes. Data provena nce and data curation are key components for emerging understanding of data metrology. However, to date provenance research has had limited visibility to or uptake in metrology. In this work, we summarize a scoping study carried out with NPL staff and industrial participants to understand their current and future needs for provenance, curation and data quality. We then survey provenance technology and standards that are relevant to metrology. We analyse the gaps between requirements and the current state of the art.

قواعد البيانات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الأندلس للعلوم الطبية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

ObjTables: structured spreadsheets that promote data quality, reuse, and integration

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً