We report here on the results of two studies, using two and four monthly web crawls respectively from the Common Crawl (CC) initiative between 2014 and 2017, whose initial goal was to provide empirical evidence for the changing patterns of use of so-called persistent identifiers. This paper focusses on the tooling needed for dealing with CC data, and the problems we found with it. The first study is based on over $10^{12}$ URIs from over $5 \times 10^9$ pages crawled in April 2014 and April 2017; the second study adds a further $3 \times 10^9$ pages from the April 2015 and April 2016 crawls. We conclude with suggestions on specific actions needed to enable studies based on CC to give reliable longitudinal information.
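As a rough illustration of the kind of tally such a study involves, the following minimal sketch counts URIs whose host is a well-known identifier resolver. The host list `PID_HOSTS` and the helper names `pid_host` and `count_pid_hosts` are assumptions made for this example only; the abstract does not say which identifier schemes or tools the studies actually used.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumed resolver hosts, chosen for illustration; not the studies' actual list.
PID_HOSTS = {"doi.org", "dx.doi.org", "hdl.handle.net", "purl.org"}


def pid_host(uri):
    """Return the resolver host if `uri` points at an assumed PID resolver, else None."""
    try:
        host = urlsplit(uri).hostname  # already lower-cased by urllib
    except ValueError:
        # Crawled pages contain malformed URIs (e.g. unbalanced IPv6 brackets).
        return None
    if host is None:
        return None
    if host.startswith("www."):
        host = host[4:]
    return host if host in PID_HOSTS else None


def count_pid_hosts(uris):
    """Tally URIs per resolver host, skipping anything that is not a PID-style URI."""
    counts = Counter()
    for uri in uris:
        host = pid_host(uri)
        if host is not None:
            counts[host] += 1
    return counts
```

At the scale quoted above, a tally like this would have to run as a streaming or distributed job over the crawl files rather than over an in-memory list; the sketch only shows the per-URI classification step.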
Event collections are frequently built by crawling the live web on the basis of seed URIs nominated by human experts. Focused web crawling is a technique where the crawler is guided by reference content pertaining to the event. Given the dynamic natu
We quantify the extent to which references to papers in scholarly literature use persistent HTTP URIs that leverage the Digital Object Identifier infrastructure. We find a significant number of references that do not, speculate why authors would use
We give the first dimension-efficient algorithms for learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x} \mapsto \max(0, \mathbf{w} \cdot \mathbf{x})$ with $\mathbf{w} \in \mathbb{S}^{n-1}$. Our algorithm works in the challeng
The centroid energy of the Fe K$\alpha$ line has been used to identify the progenitors of supernova remnants (SNRs). These investigations generally considered the energy of the centroid derived from the spectrum of the entire remnant. Here we use {\it
In this paper, we review points to note when using Web map images provided by Web map services, from the viewpoint of the Copyright Act. The Copyright Act aims to contribute to the creation of culture by protecting the rights of authors and others, and promoting fa