Open Data Portal Germany (OPAL) Project Results

Added by Adrian Wilke
Publication date: 2021
Language: English





In the Open Data Portal Germany (OPAL) project, a data refinement pipeline comprising the following steps has been developed: requirements analysis, data acquisition, analysis, conversion, integration, and selection. The pipeline has produced 800,000 datasets in DCAT format.
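
As a rough illustration of what a single DCAT dataset record can look like, the sketch below builds one description with Python's rdflib and serializes it as Turtle. The URIs, titles, and distribution details are placeholders invented for this example and are not taken from the OPAL output.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

# Build a minimal DCAT description for one hypothetical dataset.
g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

dataset = URIRef("https://example.org/dataset/1")           # placeholder URI
distribution = URIRef("https://example.org/dataset/1/csv")  # placeholder URI

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example dataset", lang="en")))
g.add((dataset, DCTERMS.description, Literal("Illustrative DCAT record.", lang="en")))
g.add((dataset, DCAT.distribution, distribution))

g.add((distribution, RDF.type, DCAT.Distribution))
g.add((distribution, DCAT.mediaType, Literal("text/csv")))
g.add((distribution, DCAT.downloadURL, URIRef("https://example.org/data.csv")))

# Print the record in Turtle syntax.
print(g.serialize(format="turtle"))
```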




Related research

192 - Bin Chen 2010
The Chem2Bio2RDF portal is a Linked Open Data (LOD) portal for systems chemical biology that aims to facilitate drug discovery. It converts around 25 different datasets on genes, compounds, drugs, pathways, side effects, diseases, and MEDLINE/PubMed documents into RDF triples and links them to other LOD bubbles, such as Bio2RDF, LODD and DBpedia. The portal is based on the D2R server and provides a SPARQL endpoint, but adds a few unique features such as an RDF faceted browser, a user-friendly SPARQL query generator, a MEDLINE/PubMed cross-validation service, and a Cytoscape visualization plugin. Three use cases demonstrate the functionality and usability of this portal.
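
To illustrate how such a SPARQL endpoint is typically queried from code, here is a small sketch using the SPARQLWrapper library. The endpoint URL is a placeholder, and the query is a generic triple listing rather than anything specific to the Chem2Bio2RDF vocabulary.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint URL -- substitute the portal's actual SPARQL endpoint.
endpoint = SPARQLWrapper("https://example.org/chem2bio2rdf/sparql")

# Generic query returning a handful of triples; real use would select
# compound/gene/drug resources via the portal's own classes and predicates.
endpoint.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```
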
Linked Open Data (LOD) is the publicly available RDF data on the Web. Each LOD entity is identified by a URI and accessible via HTTP. LOD encodes global-scale knowledge potentially available to any human, as well as to artificial intelligence that may want to use it as background knowledge for its tasks. LOD has emerged as the backbone of applications in diverse fields such as Natural Language Processing, Information Retrieval, Computer Vision, Speech Recognition, and many more. Nevertheless, regardless of the specific tasks that LOD-based tools aim to address, reusing such knowledge may be challenging for diverse reasons, e.g. semantic heterogeneity, provenance, and data quality. As aptly stated by Heath et al., Linked Data might be outdated, imprecise, or simply wrong, so the problem of linked data validity needs to be investigated. This work reports a collaborative effort by nine teams of students, guided by an equal number of senior researchers, attending the International Semantic Web Research School (ISWS 2018), to address this investigation from different perspectives and with different approaches.
362 - A. Amorim, J. Lima, C. Oliveira 2003
Conditions Data in high energy physics experiments is frequently understood as all the data needed for reconstruction besides the event data itself. This includes all sorts of slowly evolving data, like detector alignment, calibration and robustness, and data from the detector control system. Every Conditions Data Object is associated with a time interval of validity and a version, and quite often it is useful to tag collections of Conditions Data Objects together. These issues have already been investigated, and a data model has been proposed and used for different implementations based on commercial DBMSs, both at CERN and for the BaBar experiment. The special case of the complex ATLAS trigger, which requires online access to calibration and alignment data, poses new challenges that have to be met with a flexible and customizable solution more in line with Open Source components. Motivated by the ATLAS challenges, we have developed an alternative implementation based on an Open Source RDBMS. Several issues were investigated and will be described in this paper: the best way to map the conditions data model onto the relational database concept, considering what are foreseen as the most frequent queries; the clustering model best suited to address the scalability problem; and the extensive tests that were performed. The very promising results from these tests are attracting attention from the HEP community and driving further developments.
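
A minimal sketch of the data model described above, assuming a Conditions Data Object that carries a payload, an interval of validity, a version, and optional tags. The class and field names are invented for illustration and do not come from the ATLAS or BaBar implementations.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ConditionsObject:
    """One conditions payload, valid over a time interval and carrying a version."""
    payload: dict            # e.g. alignment or calibration constants
    valid_from: int          # start of the interval of validity (run number or timestamp)
    valid_until: int         # end of the interval of validity
    version: int             # several versions may cover the same interval
    tags: frozenset = field(default_factory=frozenset)  # named collections ("tags")

def lookup(objects, time, tag=None):
    """Return the highest-version object valid at `time`, optionally restricted to a tag."""
    candidates = [
        o for o in objects
        if o.valid_from <= time < o.valid_until and (tag is None or tag in o.tags)
    ]
    return max(candidates, key=lambda o: o.version) if candidates else None
```
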
Here, we compare radiative accelerations (g_rad) derived from the new Opacity Project (OP) data with those computed from OPAL and from previous OP data. For the case where we have full data from OPAL, the differences in the Rosseland mean opacities between OPAL and the new OP data are within 12%, and less than 30% between the new OP and previous OP data (OP1 at CDS). The radiative accelerations g_rad differ by up to 17% when compared to OPAL and by up to 38% when compared to OP1. The comparison with OP1 over a larger (rho, T) space gives a difference of up to 40% for g_rad(C), and the differences increase for heavier elements, reaching 60% for Si and 65% for S and Fe. We also constructed four representative stellar models in order to compare the new OP accelerations with prior published results that used OPAL data. The Rosseland means overall agree to better than 10% for all of our cases. For the accelerations, the comparisons with published values yield larger differences in general. The published OPAL accelerations for carbon are even larger relative to OP than our direct comparisons would indicate. Potential reasons for this puzzling behavior are discussed. In light of the significant differences in the inferred acceleration rates, theoretical errors should be taken into account when comparing models with observations. The implications for stellar evolution are briefly discussed. The sensitivity of g_rad to the atomic physics may provide a useful test of different opacity sources.
102 - Jie Song, Yeye He 2021
Complex data pipelines are increasingly common in diverse applications such as BI reporting and ML modeling. These pipelines often recur regularly (e.g., daily or weekly), as BI reports need to be refreshed and ML models need to be retrained. However, it is widely reported that in complex production pipelines, upstream data feeds can change in unexpected ways, causing downstream applications to break silently in ways that are expensive to resolve. Data validation has thus become an important topic, as evidenced by notable recent efforts from Google and Amazon, where the objective is to catch data quality issues early as they arise in the pipelines. Our experience on production data suggests, however, that on string-valued data these existing approaches yield high false-positive rates and frequently require human intervention. In this work, we develop a corpus-driven approach to auto-validate machine-generated data by inferring suitable data-validation patterns that accurately describe the underlying data domain, which minimizes false positives while maximizing the data quality issues caught. Evaluations using production data from real data lakes suggest that Auto-Validate is substantially more effective than existing methods. Part of this technology ships as an Auto-Tag feature in Microsoft Azure Purview.
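
The paper's corpus-driven method is considerably more sophisticated, but the underlying idea of inferring a pattern from observed string values and flagging later values that do not match it can be sketched as follows. The character-class generalization used here is a simplifying assumption for illustration, not the paper's actual pattern language.

```python
import re

def generalize(value):
    """Map a string to a coarse pattern: digits -> \\d, letters -> [A-Za-z], rest escaped."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(r"\d")
        elif ch.isalpha():
            out.append("[A-Za-z]")
        else:
            out.append(re.escape(ch))
    return "^" + "".join(out) + "$"

def infer_pattern(training_values):
    """Accept a pattern only if a single generalization describes all training values."""
    patterns = {generalize(v) for v in training_values}
    return patterns.pop() if len(patterns) == 1 else None

# Usage sketch: learn from an earlier feed, then validate a new feed.
pattern = infer_pattern(["2021-03-01", "2021-03-02", "2021-03-03"])
if pattern:
    for value in ["2021-03-04", "03/04/2021"]:
        status = "ok" if re.match(pattern, value) else "violates inferred pattern"
        print(value, "->", status)
```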