ترغب بنشر مسار تعليمي؟ اضغط هنا

The Memento Tracer Framework: Balancing Quality and Scalability for Web Archiving

66   0   0.0 ( 0 )
 نشر من قبل Martin Klein
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Web archiving frameworks are commonly assessed by the quality of their archival records and by their ability to operate at scale. The ubiquity of dynamic web content poses a significant challenge for crawler-based solutions such as the Internet Archive that are optimized for scale. Human driven services such as the Webrecorder tool provide high-quality archival captures but are not optimized to operate at scale. We introduce the Memento Tracer framework that aims to balance archival quality and scalability. We outline its concept and architecture and evaluate its archival quality and operation at scale. Our findings indicate quality is on par or better compared against established archiving frameworks and operation at scale comes with a manageable overhead.



قيم البحث

اقرأ أيضاً

Maintenance of multiple, distributed up-to-date copies of collections of changing Web resources is important in many application contexts and is often achieved using ad hoc or proprietary synchronization solutions. ResourceSync is a resource synchron ization framework that integrates with the Web architecture and leverages XML sitemaps. We define a model for the ResourceSync framework as a basis for understanding its properties. We then describe experiments in which simulations of a variety of synchronization scenarios illustrate the effects of model configuration on consistency, latency, and data transfer efficiency. These results provide insight into which congurations are appropriate for various application scenarios.
The data underlying scientific papers should be accessible to researchers both now and in the future, but how best can we ensure that these data are available? Here we examine the effectiveness of four approaches to data archiving: no stated archivin g policy, recommending (but not requiring) archiving, and t
The appeal of metric evaluation of research impact has attracted considerable interest in recent times. Although the public at large and administrative bodies are much interested in the idea, scientists and other researchers are much more cautious, i nsisting that metrics are but an auxiliary instrument to the qualitative peer-based judgement. The goal of this article is to propose availing of such a well positioned construct as domain taxonomy as a tool for directly assessing the scope and quality of research. We first show how taxonomies can be used to analyse the scope and perspectives of a set of research projects or papers. Then we proceed to define a research team or researchers rank by those nodes in the hierarchy that have been created or significantly transformed by the results of the researcher. An experimental test of the approach in the data analysis domain is described. Although the concept of taxonomy seems rather simplistic to describe all the richness of a research domain, its changes and use can be made transparent and subject to open discussions.
181 - Boliang He 2014
AstroCloud is a cyber-Infrastructure for Astronomy Research initiated by Chinese Virtual Observatory (China-VO) under funding support from NDRC (National Development and Reform commission) and CAS (Chinese Academy of Sciences){url{http://astrocloud.c hina-vo.org}}citep{O8-5_Cui_adassxxiv}. To archive the astronomical data in China, we present the implementation of the astronomical data archiving system (ADAS). Data archiving and quality control are the infrastructure for the AstroCloud. Throughout the data of the entire life cycle, data archiving system standardized data, transferring data, logging observational data, archiving ambient data, And storing these data and metadata in database. Quality control covers the whole process and all aspects of data archiving.
Web archiving services play an increasingly important role in todays information ecosystem, by ensuring the continuing availability of information, or by deliberately caching content that might get deleted or removed. Among these, the Wayback Machine has been proactively archiving, since 200
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا