
Automated Resolution of Noisy Bibliographic References

Published by Markus Demleitner
Publication date: 2004
Research field: Informatics Engineering
Language: English





We describe a system used by the NASA Astrophysics Data System to match bibliographic references, obtained by OCR from scanned article pages, with records in a bibliographic database. We analyze the process that generates the noisy references and conclude that the three-step procedure of correcting the OCR results, parsing the corrected string, and matching it against the database gives unsatisfactory results. Instead, we propose a method, inspired by dependency grammars, that allows a controlled merging of correction, parsing and matching. We also report on the effectiveness of various heuristics that we have employed to improve recall.
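To make the idea of merging correction with matching concrete, here is a toy Python sketch. It is not the ADS implementation: the OCR confusion table, the three-field record layout (journal, volume, page), the token-level scoring, the threshold, and the record identifiers are all hypothetical. What it illustrates is that OCR repairs are never applied up front; a token such as `4O5` is only read as `405` when a candidate database record makes that reading useful, so the database guides the correction instead of following it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical table of characters that OCR commonly confuses with digits.
OCR_CONFUSIONS = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}


@dataclass(frozen=True)
class Record:
    ident: str    # hypothetical record identifier
    journal: str  # journal abbreviation, e.g. "ApJ"
    volume: int
    page: int


def could_read_as(token: str, expected: str) -> bool:
    """True if `token` equals `expected` once OCR confusions are allowed."""
    if len(token) != len(expected):
        return False
    return all(t == e or OCR_CONFUSIONS.get(t) == e for t, e in zip(token, expected))


def score(tokens: List[str], rec: Record) -> int:
    """Count how many of the record's fields some token can plausibly stand for."""
    points = 0
    if any(tok.lower() == rec.journal.lower() for tok in tokens):
        points += 1
    if any(could_read_as(tok, str(rec.volume)) for tok in tokens):
        points += 1
    if any(could_read_as(tok, str(rec.page)) for tok in tokens):
        points += 1
    return points


def match_reference(reference: str, records: List[Record],
                    min_score: int = 3) -> Optional[Record]:
    """Return the best-scoring record, or None if none reaches `min_score`."""
    tokens = re.findall(r"\w+", reference)
    best, best_points = None, 0
    for rec in records:
        points = score(tokens, rec)
        if points > best_points:
            best, best_points = rec, points
    return best if best_points >= min_score else None


if __name__ == "__main__":
    records = [Record("rec-1", "ApJ", 405, 123), Record("rec-2", "ApJ", 405, 321)]
    # The OCR-damaged volume "4O5" and page "l23" are repaired only against rec-1.
    print(match_reference("ApJ 4O5, l23", records))
```

A real resolver would need far more fields (authors, year, multi-word journal names) and a weighting scheme; requiring all three fields here is simply a conservative choice that trades recall for precision.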


Read also

Chao Min, Jiawei Xu, Tao Han (2021)
Scientometric studies have extended from direct citations to high-order citations, as a simple citation count has been found to tell only part of the story about scientific impact. This extension is considered beneficial in scenarios such as research evaluation, modeling the history of science, and information retrieval. In contrast to citations of citations (forward citation generations), references of references (backward citation generations), the other side of high-order citations, are relatively less explored. We adopt a series of metrics to measure the unfolding of the backward citations of a focal paper, tracing back to its knowledge ancestors generation by generation. Two subfields of physics are analyzed in this way on a large-scale citation network. Preliminary results show that (1) most papers in our dataset can be traced back to their knowledge ancestry; (2) the size distribution of backward citation generations first decreases and then increases; and (3) citations more than one generation away are still relevant to the focal paper, from either a forward or a backward perspective, yet backward citation generations are more topically relevant to the paper of interest. Furthermore, backward citation generations shed light on literature recommendation, science evaluation, and sociology of science studies.
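One plausible way to compute the "generation by generation" unfolding described in this abstract is a breadth-first traversal of the reference graph. The sketch below is an illustration under assumptions, not the authors' metric: it places each ancestor in the generation of its shortest citation distance from the focal paper, and the `references` mapping and paper labels are hypothetical.

```python
from typing import Dict, List, Set


def backward_generations(references: Dict[str, List[str]], focal: str) -> List[Set[str]]:
    """Group the ancestors of `focal` by shortest citation distance.

    generations[0] holds the papers `focal` cites, generations[1] the papers
    those cite (if not already seen), and so on. Each paper is counted once.
    """
    seen = {focal}
    frontier = {focal}
    generations: List[Set[str]] = []
    while frontier:
        next_gen: Set[str] = set()
        for paper in frontier:
            for ref in references.get(paper, []):
                if ref not in seen:  # `seen` also guards against citation cycles
                    seen.add(ref)
                    next_gen.add(ref)
        if not next_gen:
            break
        generations.append(next_gen)
        frontier = next_gen
    return generations


if __name__ == "__main__":
    refs = {"P": ["A", "B"], "A": ["C"], "B": ["C", "D"], "C": ["E"]}
    gens = backward_generations(refs, "P")
    print([len(g) for g in gens])  # generation sizes: [2, 2, 1]
```

Counting a paper only at its shortest distance is one design choice; a metric that lets a paper appear in every generation where it is reachable would produce larger, overlapping generations.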
Christoph Schommer (2008)
Social communities in bibliographic databases have existed for many years: researchers share common research interests and work and publish together. A social community may vary in type and size, ranging from a fully connected group of participating members to a consortium of small, individual members who each play their own role in it. In this work, we focus on social communities inside the bibliographic database DBLP and characterize communities through a simple typifying description model. Generally, we understand a publication as a transaction between the associated authors. The idea, therefore, is to consider directed associative relationships among them, to decompose each pattern into its fundamental structure, and to describe the communities by expressive attributes. Finally, we argue that the decomposition supports the management of discovered structures towards the use of adaptive-incremental mind-maps.
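Treating each publication as a transaction over its author set makes standard association-rule measures applicable. The following sketch is an assumption about what "directed associative relationships" between authors could look like, not the paper's model: it computes the confidence of a directed rule `a -> b` as the fraction of `a`'s papers that also list `b`, keeping only pairs that co-occur at least `min_support` times (a hypothetical threshold).

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple


def author_rules(publications: List[List[str]],
                 min_support: int = 2) -> Dict[Tuple[str, str], float]:
    """Return confidence(a -> b) = papers with a and b / papers with a."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for authors in publications:
        unique = sorted(set(authors))
        single.update(unique)
        # permutations yields both (a, b) and (b, a), so rules are directed.
        pair.update(permutations(unique, 2))
    return {
        (a, b): count / single[a]
        for (a, b), count in pair.items()
        if count >= min_support
    }


if __name__ == "__main__":
    pubs = [["X", "Y"], ["X", "Y", "Z"], ["X"], ["Y", "Z"]]
    for (a, b), conf in sorted(author_rules(pubs).items()):
        print(f"{a} -> {b}: {conf:.2f}")
```

In this toy data `Z -> Y` has confidence 1.00 while `Y -> Z` has only 0.67, which is the kind of asymmetry a directed relationship captures and a plain co-authorship count does not.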
Multidisciplinary cooperation is now common in research, since social issues inevitably involve multiple disciplines. In research articles, reference information, especially citation content, is an important representation of communication among different disciplines. Analyzing the distribution of references from different disciplines in research articles is fundamental to detecting the sources of referenced information and identifying the contributions of different disciplines. This work takes articles in PLoS as its data and characterizes the references from different disciplines based on Citation Content Analysis (CCA). First, we download 210,334 full-text articles from PLoS and collect information on their in-text citations. Then, we identify the discipline of each reference in these articles. To characterize the distribution of these references, we analyze three characteristics, namely the number of citations, the average cited intensity and the average citation length. Finally, we conclude that the distributions of references from different disciplines differ significantly. Although most references come from the Natural Sciences, the Humanities and Social Sciences play important roles in the Introduction and Background sections of the articles. Basic disciplines such as Mathematics mainly provide research methods to the articles in PLoS. Citations in the Results and Discussion sections are mainly in-discipline citations, such as citations from Nursing and Medicine in PLoS.
Today's scientific research is an expensive enterprise funded largely by taxpayers and corporate groups. It is a critical part of the competition between nations, and all nations want to discover fields of research that promise to create future industries, and to dominate these by building up scientific and technological expertise early. However, our understanding of the value chain leading from science to technology is still at a relatively early stage, and the conversion of scientific leadership into market dominance remains very much an alchemy rather than a science. In this paper, we analyze bibliometric records of scientific journal publications and patents related to graphene, at the aggregate level as well as along the temporal and spatial dimensions. We find that the present leaders of graphene science and technology emerged rather late in the race, after the initial scientific leaders lost their footing. More importantly, notwithstanding the amount of funding already committed, we find evidence suggesting that the golden eras of graphene science and technology were 2010 and 2012, respectively, despite the continued growth of journal and patent publications in this area.
Our current knowledge of scholarly plagiarism is largely based on the similarity between full-text research articles. In this paper, we propose a novel conceptualization of scholarly plagiarism in the form of reuse of explicit citation sentences in scientific research articles. Note that while full-text plagiarism is an indicator of gross-level behavior, copying of citation sentences is a more nuanced, micro-scale phenomenon observed even among well-known researchers. The current work poses several questions and attempts to answer them by empirically investigating a large bibliographic text dataset from computer science containing millions of lines of citation sentences. In particular, we report evidence of massive copying behavior. We also present several striking real examples throughout the paper to show how widely this undesirable practice has been adopted. Contrary to popular perception, we find that the tendency to copy increases as an author matures. Copying behavior is found in all fields of computer science; however, the theoretical fields show more copying than the applied fields.
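A crude way to surface the reuse of citation sentences discussed in this abstract is to normalize each sentence and group identical normalized forms across papers. This is only an illustrative sketch: the normalization rules (dropping bracketed citation markers, punctuation and case), the exact-match criterion, and the paper identifiers are assumptions, and the study itself may use a different similarity measure.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def normalize(sentence: str) -> str:
    """Drop bracketed citation markers like [12], then punctuation and case."""
    without_markers = re.sub(r"\[[^\]]*\]", " ", sentence)
    return " ".join(re.findall(r"[a-z0-9]+", without_markers.lower()))


def reused_sentences(corpus: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each normalized citation sentence to the set of papers containing it,
    keeping only sentences that occur in at least two distinct papers."""
    papers_by_sentence: Dict[str, Set[str]] = defaultdict(set)
    for paper_id, sentence in corpus:
        papers_by_sentence[normalize(sentence)].add(paper_id)
    return {s: ids for s, ids in papers_by_sentence.items() if len(ids) > 1}


if __name__ == "__main__":
    corpus = [
        ("p1", "Deep networks generalize well [3]."),
        ("p2", "Deep  networks generalize well [17]!"),
        ("p3", "Graph methods scale poorly [5]."),
    ]
    print(reused_sentences(corpus))
```

Stripping the citation markers matters here: two copied sentences usually point to different reference numbers, so comparing them verbatim would miss the reuse.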