Patent data represent a significant source of information on innovation and the evolution of technology through networks of citations, co-invention and co-assignment of new patents. A major obstacle to extracting useful information from this data is the problem of name disambiguation: linking alternate spellings of individuals or institutions to a single identifier to uniquely determine the parties involved in the creation of a technology. In this paper, we describe a new algorithm that uses high-resolution geolocation to disambiguate both inventors and assignees on more than 3.6 million patents found in the European Patent Office (EPO), under the Patent Cooperation Treaty (PCT), and in the US Patent and Trademark Office (USPTO). We show that our algorithm has both high precision and high recall in comparison to a manual disambiguation of EPO assignee names in Boston and Paris, and that it performs well on a benchmark of USPTO inventor names that can be linked to a high-resolution address (but poorly for inventors who never provided a high-quality address). The most significant benefit of this work is the high-quality assignee disambiguation with worldwide coverage, coupled with an inventor disambiguation that is competitive with other state-of-the-art approaches. To our knowledge, this is the broadest and most accurate simultaneous disambiguation and cross-linking of the inventor and assignee names for a significant fraction of patents in these three major patent collections.
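As a rough illustration of how geolocation can help name disambiguation, the sketch below links two records only when their normalized names are similar and their coordinates are close. It is not the algorithm described in the abstract: the record fields (`name`, `lat`, `lon`), the function names, and both thresholds are assumptions chosen for the example.

```python
import math
import re
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def same_entity(rec_a, rec_b, min_name_sim=0.85, max_km=50.0):
    """Hypothetical rule: similar spelling AND nearby location (thresholds are assumed)."""
    name_sim = SequenceMatcher(None, normalize(rec_a["name"]), normalize(rec_b["name"])).ratio()
    distance = haversine_km(rec_a["lat"], rec_a["lon"], rec_b["lat"], rec_b["lon"])
    return name_sim >= min_name_sim and distance <= max_km

# Illustrative records (made up for the example)
boston = {"name": "ACME Corp.", "lat": 42.36, "lon": -71.06}
cambridge = {"name": "acme corp", "lat": 42.37, "lon": -71.11}
paris = {"name": "Acme Corp", "lat": 48.86, "lon": 2.35}
```

With these records, `same_entity(boston, cambridge)` links the two spellings, while `same_entity(boston, paris)` does not, because the location test fails even though the names agree.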
To quantify the mechanism of complex network growth, we focus on the network of citations of scientific papers and use a combination of theoretical and experimental tools to uncover the microscopic details of this growth. Namely, we develop a stochastic model of citation dynamics based on a copying/redirection/triadic closure mechanism. In a complementary and coherent way, the model accounts both for the statistics of references of scientific papers and for their citation dynamics. Originating in empirical measurements, the model is cast in such a way that it can be verified quantitatively in every aspect. Such verification is performed by measuring the citation dynamics of Physics papers. The measurements reveal nonlinear citation dynamics, the nonlinearity being intricately related to network topology. The nonlinearity has far-reaching consequences, including non-stationary citation distributions, diverging citation trajectories of similar papers, and runaway or "immortal" papers with infinite citation lifetime. Thus, our most important finding is nonlinearity in complex network growth. In a more specific context, our results can serve as a basis for quantitative probabilistic prediction of the citation dynamics of individual papers and of the journal impact factor.
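The copying/redirection idea can be illustrated with a toy growth process. This is a generic textbook-style sketch, not the calibrated model from the abstract: each new paper cites an earlier paper chosen uniformly at random and, with an assumed probability `p_copy`, also cites one of that paper's references, which closes a triangle. The function names and all parameter values are hypothetical.

```python
import random

def grow_citation_network(n_papers, n_refs=3, p_copy=0.5, seed=0):
    """Return refs, where refs[i] is the set of earlier papers cited by paper i."""
    rng = random.Random(seed)
    refs = [set()]  # paper 0 has nothing to cite
    for new in range(1, n_papers):
        cited = set()
        budget = min(n_refs, new)  # cannot cite more papers than exist
        while len(cited) < budget:
            # Direct citation: pick an earlier paper uniformly at random
            target = rng.randrange(new)
            cited.add(target)
            # Copying step: also cite one of the target's own references
            if refs[target] and len(cited) < budget and rng.random() < p_copy:
                cited.add(rng.choice(sorted(refs[target])))
        refs.append(cited)
    return refs

def in_degrees(refs):
    """Count how many citations each paper has received."""
    counts = [0] * len(refs)
    for cited in refs:
        for target in cited:
            counts[target] += 1
    return counts

network = grow_citation_network(200)
citations = in_degrees(network)
```

Even in this toy version, the copying step favors papers that are already cited, since they appear in more reference lists and are therefore more likely to be copied.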
We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.
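A minimal sketch of the pair-wise similarity idea follows, assuming each paper is a dictionary with hypothetical fields `id`, `coauthors`, `refs`, and `cited_by`. The equal weights, the threshold, and the single-pass linking are assumptions for illustration. The abstract describes a two-step agglomerative procedure with optimized parameters, and only the first linking step is approximated here.

```python
def similarity(a, b, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted count of shared co-authors, self-citations, references, and citing papers."""
    w_coauthors, w_self, w_refs, w_citers = weights
    shared_coauthors = len(a["coauthors"] & b["coauthors"])
    self_citation = int(a["id"] in b["refs"] or b["id"] in a["refs"])
    shared_refs = len(a["refs"] & b["refs"])
    shared_citers = len(a["cited_by"] & b["cited_by"])
    return (w_coauthors * shared_coauthors + w_self * self_citation
            + w_refs * shared_refs + w_citers * shared_citers)

def cluster(papers, threshold=1.0):
    """Link every pair whose similarity reaches the threshold (union-find)."""
    parent = list(range(len(papers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(papers)):
        for j in range(i + 1, len(papers)):
            if similarity(papers[i], papers[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, paper in enumerate(papers):
        groups.setdefault(find(i), []).append(paper["id"])
    return sorted(sorted(group) for group in groups.values())

# Three papers published under the same ambiguous author name (made-up data)
papers = [
    {"id": "p1", "coauthors": {"Lee"}, "refs": {"r1", "r2"}, "cited_by": set()},
    {"id": "p2", "coauthors": {"Lee"}, "refs": {"r2", "p1"}, "cited_by": set()},
    {"id": "p3", "coauthors": {"Kim"}, "refs": {"r9"}, "cited_by": set()},
]
clusters = cluster(papers)  # [['p1', 'p2'], ['p3']]
```

The all-pairs loop is quadratic, so a sketch like this is only practical within a block of papers sharing one name, not across a whole index.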
We present a novel methodology for augmenting scattering data measured by small-angle neutron scattering using a deep convolutional neural network (CNN), a technique widely used in artificial intelligence (AI). Data collection time is reduced by increasing the bin size of the detector pixels at the expense of resolution. High-resolution scattering data are then reconstructed using a deep-learning super-resolution method. This technique can not only improve the productivity of neutron scattering instruments by speeding up the experimental workflow, but also enable the capture of kinetic changes and transient phenomena in materials that are currently inaccessible to existing neutron scattering techniques.
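The binning step can be sketched as follows: neighboring detector pixels are summed into coarser bins, so each bin collects more counts per unit time while spatial resolution drops. The function name and the small 4x4 detector image are assumptions for illustration, and the CNN-based reconstruction is not shown.

```python
def bin_pixels(image, factor):
    """Sum counts over non-overlapping factor x factor blocks of a 2D detector image."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by the binning factor")
    return [
        [
            sum(image[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]

# Made-up 4x4 detector counts
detector = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = bin_pixels(detector, 2)  # [[14, 22], [46, 54]]
```

Total counts are conserved by the summation. Only the information about where within each block a neutron arrived is lost, and that is what a super-resolution model would have to infer.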
High-spatial-resolution (HSR) two-component, two-dimensional particle-image-velocimetry (2C-2D PIV) measurements of a zero-pressure-gradient (ZPG) turbulent boundary layer (TBL) and an adverse-pressure-gradient (APG)-TBL were taken in the LMFL High Reynolds number Boundary Layer Wind Tunnel. The ZPG-TBL has a momentum-thickness-based Reynolds number $Re_{\delta_2} = \delta_2 U_e/\nu = 7{,}750$, while the APG-TBL has $Re_{\delta_2} = 16{,}240$ and a Clauser's pressure gradient parameter $\beta = \delta_1 P_x/\tau_w = 2.27$. After analysing the single-exposed PIV image data using a multigrid/multipass digital PIV (Soria, 1996) with in-house software, proper orthogonal decomposition (POD) was performed on the data to separate the flow fields into large- and small-scale motions (LSMs and SSMs), with the LSMs further categorized into high- and low-momentum events. Profiles of the conditionally averaged Reynolds stresses show that the high-momentum events contribute more to the Reynolds stresses than the low-momentum events from the wall to the end of the log layer, while the opposite is the case in the wake region. The cross-over point of the profiles of the Reynolds stresses from the high- and low-momentum LSMs always has a higher value than the corresponding Reynolds stress from the original ensemble at the same wall-normal location. Furthermore, the cross-over point in the APG-TBL lies further from the wall than in the ZPG-TBL. By removing the velocity fields with LSMs, the estimates of the Reynolds streamwise stress and Reynolds shear stress from the remaining velocity fields are reduced by up to $42\%$ in the ZPG-TBL. The reduction is observed to be even larger (up to $50\%$) in the APG-TBL. However, the removal of these LSMs has a minimal effect on the Reynolds wall-normal stress in both the ZPG- and APG-TBL.
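For readers unfamiliar with POD, the decomposition can be written in its generic snapshot form. The notation below is an assumption and does not come from the abstract: $\mathbf{u}'$ is the fluctuating velocity field, $\boldsymbol{\phi}_i$ are the orthonormal spatial modes ordered by energy, $a_i$ are their temporal coefficients, $\delta_{ij}$ is the Kronecker delta (unrelated to the boundary-layer thicknesses $\delta_1$, $\delta_2$), and $m$ is a hypothetical cutoff mode separating large from small scales.

```latex
% Snapshot POD of the fluctuating velocity field over N snapshots
\mathbf{u}'(\mathbf{x}, t_k) = \sum_{i=1}^{N} a_i(t_k)\, \boldsymbol{\phi}_i(\mathbf{x}),
\qquad
\langle \boldsymbol{\phi}_i, \boldsymbol{\phi}_j \rangle = \delta_{ij}

% Split into large- and small-scale parts at an assumed cutoff mode m
\mathbf{u}'_{\mathrm{LSM}} = \sum_{i=1}^{m} a_i\, \boldsymbol{\phi}_i,
\qquad
\mathbf{u}'_{\mathrm{SSM}} = \sum_{i=m+1}^{N} a_i\, \boldsymbol{\phi}_i
```

The abstract does not state how the cutoff or the high- and low-momentum classification was chosen, so those details are left open here.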
Modeling and forecasting forward citations to a patent is a central task for the discovery of emerging technologies and for measuring the pulse of inventive progress. Conventional methods for forecasting these forward citations cast the problem as an analysis of temporal point processes, which rely on the conditional intensity of previously received citations. Recent approaches model the conditional intensity as a chain of recurrent neural networks to capture memory dependency, in the hope of reducing the restrictions imposed by the parametric form of the intensity function. For the problem of patent citations, we observe that forecasting a patent's chain of citations benefits not only from the patent's own history but also from the historical citations of the assignees and inventors associated with that patent. In this paper, we propose a sequence-to-sequence model that employs an attention-of-attention mechanism to capture the dependencies among these multiple time sequences. Furthermore, the proposed model is able to forecast both the timestamp and the category of a patent's next citation. Extensive experiments on a large patent citation dataset collected from the USPTO demonstrate that the proposed model outperforms state-of-the-art models at forward citation forecasting.
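To make the notion of a conditional intensity concrete, the sketch below evaluates a Hawkes process with an exponential kernel, one common parametric form for self-exciting event sequences. The abstract does not say which parametric forms the conventional baselines use, so the choice of model, the function name, and the parameter values `mu`, `alpha`, and `beta` are all assumptions.

```python
import math

def hawkes_intensity(t, past_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity lambda(t) = mu + alpha * sum(exp(-beta * (t - s))) over past events s < t."""
    excitation = sum(math.exp(-beta * (t - s)) for s in past_times if s < t)
    return mu + alpha * excitation

# Made-up citation times (in years since grant) for one patent
citation_times = [0.5, 1.2, 1.3]
rate_soon_after = hawkes_intensity(1.4, citation_times)
rate_much_later = hawkes_intensity(10.0, citation_times)
```

Each past citation raises the expected rate of further citations, and that boost decays exponentially with elapsed time. The fixed shape of this decay is the kind of parametric restriction that recurrent models aim to relax.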