
Return to basics: Clustering of scientific literature using structural information

Published by Jinhyuk Yun
Publication date: 2020
Research field: Information Engineering
Language: English





Scholars frequently employ relatedness measures to estimate the similarity between two different items (e.g., documents, authors, and institutes). Such relatedness measures are commonly based on overlapping references (i.e., bibliographic coupling) or overlapping citations (i.e., co-citation) and can then be used with cluster analysis to find boundaries between research fields. Unfortunately, calculating a relatedness measure is challenging, especially for a large number of items, because its computational complexity is greater than linear. We propose an alternative method for identifying the research front that uses direct citation, inspired by relatedness measures. Our novel approach simply replicates each node into two distinct nodes: a citing node and a cited node. We then apply typical clustering methods to the modified network. Clusters of citing nodes should emulate those from the bibliographic coupling relatedness network, while clusters of cited nodes should act like those from the co-citation relatedness network. In validation tests, our proposed method demonstrated high levels of similarity with conventional relatedness-based methods. We also found that the clustering results of the proposed method outperformed those of conventional relatedness-based measures in terms of similarity with natural language processing-based classification.
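The node-splitting idea can be illustrated with a minimal, stdlib-only Python sketch. The toy citation list, the function names (`coupling_and_cocitation`, `split_network`, `two_hop_weights`) and the `("citing", paper)` / `("cited", paper)` node labels are hypothetical and do not come from the paper. The sketch computes bibliographic coupling and co-citation the conventional pairwise way, builds the split network with one undirected edge per citation, and checks that the number of neighbours shared by two citing (or two cited) copies equals their coupling (or co-citation) strength. This correspondence is why clusters of citing and cited copies are expected to mirror the two relatedness networks.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (citing paper, cited paper) pairs.
CITATIONS = [
    ("p1", "r1"), ("p1", "r2"),
    ("p2", "r1"), ("p2", "r2"),
    ("p3", "r2"), ("p3", "r3"),
    ("p4", "r4"),
]


def coupling_and_cocitation(citations):
    """Conventional relatedness: count shared references / shared citers."""
    refs = defaultdict(set)    # paper -> papers it cites
    citers = defaultdict(set)  # paper -> papers that cite it
    for a, b in citations:
        refs[a].add(b)
        citers[b].add(a)
    # Both loops visit every pair of papers, i.e. quadratic work.
    coupling = {}
    for x, y in combinations(sorted(refs), 2):
        shared = len(refs[x] & refs[y])
        if shared:
            coupling[(x, y)] = shared
    cocitation = {}
    for x, y in combinations(sorted(citers), 2):
        shared = len(citers[x] & citers[y])
        if shared:
            cocitation[(x, y)] = shared
    return coupling, cocitation


def split_network(citations):
    """Replicate each paper into a citing copy and a cited copy.

    The citation a -> b becomes one undirected edge between
    ("citing", a) and ("cited", b); building this is linear in the
    number of citations.
    """
    adj = defaultdict(set)
    for a, b in citations:
        u, v = ("citing", a), ("cited", b)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def two_hop_weights(adj, side):
    """Shared-neighbour counts between copies on one side (verification only)."""
    nodes = sorted(n for n in adj if n[0] == side)
    weights = {}
    for u, v in combinations(nodes, 2):
        shared = len(adj[u] & adj[v])
        if shared:
            weights[(u[1], v[1])] = shared
    return weights


if __name__ == "__main__":
    coupling, cocitation = coupling_and_cocitation(CITATIONS)
    split = split_network(CITATIONS)
    # Citing copies reproduce bibliographic coupling ...
    assert two_hop_weights(split, "citing") == coupling
    # ... and cited copies reproduce co-citation.
    assert two_hop_weights(split, "cited") == cocitation
    print(coupling)    # {('p1', 'p2'): 2, ('p1', 'p3'): 1, ('p2', 'p3'): 1}
    print(cocitation)  # {('r1', 'r2'): 2, ('r2', 'r3'): 1}
```

The pairwise shared-neighbour count is computed here only to verify the correspondence. The method itself would skip it and hand the split graph, which has at most twice as many nodes as papers and exactly one edge per citation, to a standard community-detection routine; with networkx installed, for example, `networkx.community.louvain_communities` could be run on a `networkx.Graph` built from `split`. Which clustering algorithm the authors actually used is not stated in this abstract.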




Read also

Qi Li, Luoyi Fu, Xinbing Wang (2021)
The rapid development of modern science and technology has spawned a rich array of scientific topics and an endless production of literature within them. Just as X-ray imaging does in medicine, can we intuitively identify the development limit and internal evolution pattern of a scientific topic from the relationships within massive amounts of knowledge? To answer this question, we collect 71431 seminal articles of topics covering 16 disciplines, together with their citation data, and extract the idea tree of each topic to reconstruct, from scratch, the development structure of the 71431 topic networks. We define the Knowledge Entropy (KE) metric and regard the contribution of high-knowledge-entropy nodes to increasing the depth of the idea tree as the basis for topic development. By observing X-ray images of topics, we find two interesting phenomena: (1) even though the scale of a topic may increase without limit, there is an insurmountable cap on topic development: the depth of the idea tree does not exceed 6 jumps, which coincides with the classical Six Degrees of Separation; (2) it is difficult for a single article to contribute more than 3 jumps to the depth of its topic, so a continuing increase in the depth of the idea tree must be driven by an influence relay among multiple high-knowledge-entropy nodes. Through substantial statistical fits, we derive a unified quantitative relationship between the change in topic depth $\Delta D^t(v)$ and the change in knowledge entropy over time $KE^t(v)$ of the article $v$ driving the increase in depth of the topic: $\Delta D^t(v) \approx \log \frac{KE^t(v)}{(t - t_0)^{1.8803}}$, which can effectively portray the evolution patterns of topics and predict their development potential. The various phenomena revealed by this scientific X-ray may provide a new paradigm for explaining and understanding the evolution of science and technology.
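The fitted relation can be evaluated directly. The minimal Python sketch below assumes a natural logarithm (the abstract writes only "log"), assumes $t$ and $t_0$ are measured in the same time unit (for example, years), and requires $KE^t(v) > 0$ and $t > t_0$; the function name `depth_change` is hypothetical.

```python
import math

EXPONENT = 1.8803  # fitted exponent reported in the abstract


def depth_change(ke, t, t0, exponent=EXPONENT):
    """Approximate Delta D^t(v) = log(KE^t(v) / (t - t0)^exponent).

    Assumes the natural logarithm, ke > 0 and t > t0.
    """
    if ke <= 0:
        raise ValueError("knowledge entropy must be positive")
    if t <= t0:
        raise ValueError("t must be later than t0")
    return math.log(ke / (t - t0) ** exponent)


if __name__ == "__main__":
    # With the same knowledge entropy, a later time t yields a smaller
    # predicted depth gain, because the denominator grows with t - t0.
    for years in (1, 2, 5, 10):
        print(years, round(depth_change(20.0, 2000 + years, 2000), 3))
```

Rewriting the right-hand side as $\log KE^t(v) - 1.8803 \log (t - t_0)$ makes the trade-off explicit: knowledge entropy raises the predicted depth gain, while elapsed time lowers it at a fixed rate.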
Simona Doboli, Fanshu Zhao (2014)
The goal of our research is to understand how ideas propagate, combine, and are created in large social networks. In this work, we examine a sample of relevant scientific publications in the area of high-frequency analog circuit design and their citation distribution. A novel aspect of our work is the way we categorize citations based on their reason and their place in a publication. We created seven citation categories, ranging from general domain references and references to specific methods used for the same domain problem to references to an analysis method, references for experimental comparison, and so on. This added information allows us to define two new measures that characterize the creativity (novelty and usefulness) of a publication based on its pattern of citations clustered by reason, place, and citing scientific group. We analyzed 30 publications in relevant journals published since 2000 and their approximately 300 citations, all in the area of high-frequency analog circuit design. We observed that the number of citations a publication receives from different scientific groups follows a Lévy-type distribution: a large number of groups cite a publication relatively few times, and a very small number of groups cite it a large number of times. We also examined the motifs by which a publication is cited differently by different scientific groups.
M. G. Pia (2009)
The Geant4 reference paper published in Nuclear Instruments and Methods A in 2003 has become the most cited publication in the whole Nuclear Science and Technology category of Thomson-Reuters Journal Citation Reports. It is currently the second most cited article among the publications authored by two major research institutes, CERN and INFN. An overview of Geant4's presence (and absence) in scholarly literature is presented, and the patterns of Geant4 citations are quantitatively examined and discussed.
Several studies use scientific literature to compare scientific activities (e.g., productivity and collaboration). In this study, using co-authorship data from the last 40 years, we present the evolutionary dynamics of multi-level (i.e., individual, institutional, and national) collaboration networks to explore the emergence of collaborations in the research field of steel structures. The collaboration network of scientists in the field was analyzed using author affiliations extracted from Scopus between 1970 and 2009. We studied collaboration distribution networks at the micro-, meso-, and macro-levels over the 40 years. We compared and analyzed a number of properties of these networks (i.e., density, centrality measures, the giant component, and the clustering coefficient) to present a longitudinal analysis and statistical validation of the evolutionary dynamics of steel-structures collaboration networks. At all levels, the collaboration network structures showed high closeness centralization, whereas betweenness and degree centralization were much lower. In general, network density, connectedness, centralization, and clustering coefficient were highest at the macro-level and decreased as network size grew, reaching their lowest values at the micro-level. We also find that the average distance is about two between countries, five between institutes, and eight between authors, meaning that only about eight steps are needed to get from one randomly chosen author to another.
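The network properties compared in this study can be computed with a short stdlib-only Python sketch on an undirected graph stored as an adjacency dict. It uses textbook definitions (density, average local clustering coefficient, mean shortest-path distance, and Freeman degree and closeness centralization); the abstract does not state which normalizations the authors used, so these formulas are assumptions, betweenness centralization is omitted for brevity, and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes of degree < 2)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_distance(adj):
    """Mean shortest-path length over all reachable ordered pairs."""
    total = pairs = 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def degree_centralization(adj):
    """Freeman degree centralization; assumes n >= 3."""
    n = len(adj)
    degs = [len(nbrs) for nbrs in adj.values()]
    top = max(degs)
    return sum(top - d for d in degs) / ((n - 1) * (n - 2))


def closeness_centralization(adj):
    """Freeman closeness centralization; assumes a connected graph, n >= 3."""
    n = len(adj)
    close = [(n - 1) / sum(bfs_distances(adj, u).values()) for u in adj]
    top = max(close)
    # The star graph attains the maximum possible sum (n-1)(n-2)/(2n-3).
    return sum(top - c for c in close) / ((n - 1) * (n - 2) / (2 * n - 3))


if __name__ == "__main__":
    # Hypothetical 5-node star: one hub linked to four leaves.
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(density(star))                             # 0.4
    print(average_clustering(star))                  # 0.0
    print(average_distance(star))                    # 1.6
    print(round(degree_centralization(star), 3))     # 1.0
    print(round(closeness_centralization(star), 3))  # 1.0
```

A star is the most centralized shape under both Freeman measures, so it scores 1.0 on each; the "about eight steps" between authors reported above corresponds to `average_distance` on the author-level network.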
Yuming Wang, Yanbo Long, Lai Tu (2019)
Research grants have played an important role in seeding and promoting fundamental research projects worldwide. There is a growing demand for developing and delivering scientific influence analysis as a service on research grant repositories. Such analysis can provide insight into how research grants help foster new research collaborations, encourage cross-organizational collaborations, influence new research trends, and identify technical leadership. This paper presents the design and development of a grants-based scientific influence analysis service, coined GImpact. It takes a graph-theoretic approach to designing and developing large-scale scientific influence analysis over a large research-grant repository, with three original contributions. First, we mine the grant database to identify and extract important features for grant influence analysis and represent these features using graph-theoretic models. For example, we extract an institution graph and multiple associated aspect-based collaboration graphs, including a discipline graph and a keyword graph. Second, we introduce self-influence and co-influence algorithms to compute two types of collaboration relationship scores based on the number and types of grants held by institutions. We compute the self-influence scores to reflect grant-based research collaborations among institutions and compute multiple co-influence scores to model the various types of cross-institution collaboration relationships in terms of disciplines and subject areas. Third, we compute the overall scientific influence score for every pair of institutions as a weighted sum of the self-influence score and the multiple co-influence scores, and conduct an influence-based clustering analysis. We evaluate GImpact using a real grant database consisting of 2512 institutions and the grants they received over a period of 14 years...
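The final scoring step of GImpact is described only as a weighted sum of one self-influence score and several co-influence scores per institution pair. The Python sketch below shows that combination under stated assumptions: the weights `alpha` and `betas`, the aspect names, and the function name `overall_influence` are hypothetical; how the individual scores are computed is not covered by the abstract; and a pair missing from a score table is treated as scoring 0.

```python
def overall_influence(self_scores, co_scores, alpha, betas):
    """Weighted sum of a self-influence score and several co-influence scores.

    self_scores: {(inst_a, inst_b): score}
    co_scores:   {aspect: {(inst_a, inst_b): score}}, with hypothetical
                 aspect names such as "discipline" or "keyword"
    alpha, betas: hypothetical weights; a pair missing from a table counts as 0.
    Pairs are assumed to be stored in one canonical order (e.g. sorted tuples).
    """
    pairs = set(self_scores)
    for table in co_scores.values():
        pairs |= set(table)
    overall = {}
    for pair in sorted(pairs):
        score = alpha * self_scores.get(pair, 0.0)
        for aspect, table in co_scores.items():
            score += betas[aspect] * table.get(pair, 0.0)
        overall[pair] = score
    return overall


if __name__ == "__main__":
    # Hypothetical scores for three institutions A, B and C.
    self_scores = {("A", "B"): 3.0}
    co_scores = {
        "discipline": {("A", "B"): 2.0, ("A", "C"): 1.0},
        "keyword": {("A", "C"): 4.0},
    }
    scores = overall_influence(self_scores, co_scores, alpha=0.5,
                               betas={"discipline": 0.3, "keyword": 0.2})
    for pair, score in scores.items():
        print(pair, round(score, 3))
```

The resulting pair scores could then serve as edge weights for the influence-based clustering step; the abstract does not say which clustering algorithm GImpact uses.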