Project Pipeline: Preservation, Persistence, and Performance

Posted by Christopher Rauch
Publication date: 2021
Research field: Informatics Engineering
Paper language: English

Preservation pipelines demonstrate extended value when digitized content is also computation-ready. Expanding this to historical controlled vocabularies published in analog format requires additional steps if they are to be fully leveraged for research. This paper reports on work addressing this challenge. We describe a pipeline and project progress addressing three key goals: 1) transforming the 1910 Library of Congress Subject Headings (LCSH) to the Simple Knowledge Organization System (SKOS) linked data standard, 2) implementing persistent identifiers (PIDs) and launching our prototype ARK resolver, and 3) importing the 1910 LCSH into the Helping Interdisciplinary Vocabulary Engineering (HIVE) System to support automatic metadata generation and scholarly analysis of the historical record. The discussion considers the implications of our work in the broader context of preservation, and the conclusion summarizes our work and identifies next steps.
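
As a rough illustration of goals 1 and 2, the sketch below uses the rdflib library to mint a SKOS concept for a digitized heading and attach an ARK-style persistent identifier. The heading labels, the test NAAN 99999, and the resolver base URL are placeholders, not the project's actual identifiers or conversion code.

```python
# Minimal sketch: one digitized 1910 LCSH heading becomes a SKOS concept
# whose URI is an ARK resolved through a (hypothetical) resolver base URL.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS

RESOLVER = "https://n2t.example.org/"              # hypothetical ARK resolver base
SCHEME = URIRef(RESOLVER + "ark:/99999/lcsh1910")  # placeholder concept scheme ARK

def heading_to_skos(graph, ark_suffix, pref_label, broader_suffix=None):
    """Mint a SKOS concept for one subject heading and add it to the graph."""
    concept = URIRef(RESOLVER + "ark:/99999/" + ark_suffix)
    graph.add((concept, RDF.type, SKOS.Concept))
    graph.add((concept, SKOS.prefLabel, Literal(pref_label, lang="en")))
    graph.add((concept, SKOS.inScheme, SCHEME))
    if broader_suffix:
        graph.add((concept, SKOS.broader,
                   URIRef(RESOLVER + "ark:/99999/" + broader_suffix)))
    return concept

g = Graph()
g.bind("skos", SKOS)
heading_to_skos(g, "lcsh1910-0001", "Abbeys")
heading_to_skos(g, "lcsh1910-0002", "Abbeys--Great Britain",
                broader_suffix="lcsh1910-0001")
print(g.serialize(format="turtle"))
```

The resulting Turtle is the kind of artifact a SKOS-aware tool such as HIVE could ingest; in the actual project the ARK strings would come from the project's own minting and resolver infrastructure rather than the placeholders shown here.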

Read also

Scholarly resources, just like any other resources on the web, are subject to reference rot as they frequently disappear or significantly change over time. Digital Object Identifiers (DOIs) are commonplace to persistently identify scholarly resources and have become the de facto standard for citing them. We investigate the notion of persistence of DOIs by analyzing their resolution on the web. We derive confidence in the persistence of these identifiers in part from the assumption that dereferencing a DOI will consistently return the same response, regardless of which HTTP request method we use or from which network environment we send the requests. Our experiments show, however, that persistence, according to our interpretation, is not warranted. We find that scholarly content providers respond differently to varying request methods and network environments and even change their response to requests against the same DOI. In this paper we present the results of our quantitative analysis that is aimed at informing the scholarly communication community about this disconcerting lack of consistency.
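
The consistency test described above can be illustrated with a short script that dereferences the same DOI using different HTTP request methods and compares the status code and final landing URL. The DOI and user agent string below are placeholders; this is a sketch of the general idea, not the study's actual measurement setup.

```python
# Sketch: resolve one DOI with GET and HEAD and compare the responses.
# A real experiment would repeat this for many DOIs and from several
# network environments, as the study describes.
import requests

def resolve(doi, method):
    url = "https://doi.org/" + doi
    resp = requests.request(method, url, allow_redirects=True, timeout=30,
                            headers={"User-Agent": "doi-consistency-check/0.1"})
    return resp.status_code, resp.url   # final URL after following redirects

doi = "10.1000/xyz123"   # placeholder DOI
for method in ("GET", "HEAD"):
    status, landing = resolve(doi, method)
    print(f"{method}: HTTP {status} -> {landing}")
```

If the two methods return different status codes or land on different URLs, the identifier is not behaving consistently in the sense the authors probe.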
174 - Xiangyang Ju 2021
The Exa.TrkX project has applied geometric learning concepts such as metric learning and graph neural networks to HEP particle tracking. The Exa.TrkX tracking pipeline clusters detector measurements to form track candidates and filters them. The pipeline, originally developed using the TrackML dataset (a simulation of an LHC-like tracking detector), has been demonstrated on various detectors, including the DUNE LArTPC and the CMS High-Granularity Calorimeter. This paper documents new developments needed to study the physics and computing performance of the Exa.TrkX pipeline on the full TrackML dataset, a first step towards validating the pipeline using ATLAS and CMS data. The pipeline achieves tracking efficiency and purity similar to production tracking algorithms. Crucially for future HEP applications, the pipeline benefits significantly from GPU acceleration, and its computational requirements scale close to linearly with the number of particles in the event.
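
As a rough sketch of the graph-based filtering idea, the code below scores candidate edges of a hit graph with a small graph neural network built on PyTorch Geometric. The data are random stand-ins for detector hits, and the architecture, feature set, and threshold are illustrative assumptions, not the Exa.TrkX pipeline itself.

```python
# Illustrative GNN edge classifier over a hit graph: embed hits with two
# graph convolutions, then score each candidate edge from its endpoints.
import torch
from torch import nn
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

class EdgeClassifier(nn.Module):
    def __init__(self, in_dim=3, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.score = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        src, dst = edge_index                      # endpoints of candidate edges
        return self.score(torch.cat([h[src], h[dst]], dim=1)).squeeze(-1)

# Toy event: 100 "hits" with three spatial features and 300 candidate edges.
hits = torch.randn(100, 3)
edge_index = torch.randint(0, 100, (2, 300))
event = Data(x=hits, edge_index=edge_index)

model = EdgeClassifier()
edge_scores = torch.sigmoid(model(event.x, event.edge_index))
kept = edge_scores > 0.5                           # edges passed to track building
print(int(kept.sum()), "of", event.edge_index.size(1), "candidate edges kept")
```

In a real pipeline the surviving edges would feed the track-building step, and the classifier would be trained on labeled edges rather than applied untrained as here.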
160 - Ye Sun, Giacomo Livan, Athen Ma 2021
Interdisciplinary research is fundamental when it comes to tackling complex problems in our highly interlinked world, and is on the rise globally. Yet, it is unclear why, in an increasingly competitive academic environment, one should pursue an interdisciplinary career given its recent negative press. Several studies have indeed shown that interdisciplinary research often achieves lower impact compared to more specialized work, and is less likely to attract funding. We seek to reconcile such evidence by analyzing a dataset of 44,419 research grants awarded between 2006 and 2018 by the seven national research councils in the UK. We compared the research performance of researchers with an interdisciplinary funding track record with that of those who have a specialized profile. We found that the former dominate the network of academic collaborations, both in terms of centrality and knowledge brokerage, but such a competitive advantage does not immediately translate into impact. Indeed, by means of a matched-pair experimental design, we found that researchers who move between disciplines on average achieve lower impact in their publications than subject specialists in the short run, but eventually outperform them in funding performance, both in terms of volume and value. Our results suggest that launching an interdisciplinary career may require more time and persistence to overcome extra challenges, but can pave the way for a more successful endeavour.
74 - S. K. Ghosh 2020
Performance of the Level-2 pipeline, which translates the UVIT data created by ISRO's ground segment processing systems (Level-1) into astronomer-ready scientific data products, is described. This pipeline has evolved significantly from experience during the in-orbit mission. With time, the detector modules of UVIT developed certain defects which led to occasional corruption of imaging and timing data. This article describes the improvements and mitigation plans incorporated in the pipeline, reports on its efficacy, and quantifies its performance.
Collaborative work on unstructured or semi-structured documents, such as in literature corpora or source code, often involves agreed-upon templates containing metadata. These templates are not consistent across users and over time. Rule-based parsing of these templates is expensive to maintain and tends to fail as new documents are added. Statistical techniques based on frequent occurrences have the potential to automatically identify a large fraction of the templates, thus reducing the burden on the programmers. We investigate the case of the Project Gutenberg corpus, where most documents are in ASCII format with preambles and epilogues that are often copied and pasted or manually typed. We show that a statistical approach can solve most cases, though some documents require knowledge of English. We also survey various technical solutions that make our approach applicable to large data sets.
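
The frequency-based idea sketched above translates into a short script: normalize the opening lines of each document and treat lines that recur across a large fraction of files as candidate template boilerplate. The corpus path, preamble window, and threshold below are illustrative assumptions, not the authors' parameters.

```python
# Sketch of frequency-based template detection: lines that appear near the
# top of many documents are likely preamble boilerplate rather than content.
from collections import Counter
from pathlib import Path

def preamble_lines(path, max_lines=300):
    """Yield normalized lines from the start of one document."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            yield " ".join(line.split()).lower()

def find_template_lines(corpus_dir, min_fraction=0.5):
    """Return lines occurring in at least min_fraction of the documents."""
    files = list(Path(corpus_dir).glob("*.txt"))
    counts = Counter()
    for path in files:
        counts.update(set(preamble_lines(path)))   # count each line once per file
    threshold = min_fraction * len(files)
    return {line for line, n in counts.items() if line and n >= threshold}

if __name__ == "__main__":
    boilerplate = find_template_lines("corpus/")    # hypothetical corpus directory
    print(len(boilerplate), "candidate template lines found")
```

Lines flagged this way approximate the template; the residual cases the authors mention, such as manually typed variants, would still need language-aware handling.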