ترغب بنشر مسار تعليمي؟ اضغط هنا

ScienceWISE: Topic Modeling over Scientific Literature Networks

96   0   0.0 ( 0 )
 نشر من قبل Alessio Cardillo
 تاريخ النشر 2016
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We provide an up-to-date view on the knowledge management system ScienceWISE (SW) and address issues related to the automatic assignment of articles to research topics. So far, SW has been proven to be an effective platform for managing large volumes of technical articles by means of ontological concept-based browsing. However, as the publication of research articles accelerates, the expressivity and the richness of the SW ontology turns into a double-edged sword: a more fine-grained characterization of articles is possible, but at the cost of introducing more spurious relations among them. In this context, the challenge of continuously recommending relevant articles to users lies in tackling a network partitioning problem, where nodes represent articles and co-occurring concepts create edges between them. In this paper, we discuss the three research directions we have taken for solving this issue: i) the identification of generic concepts to reinforce inter-article similarities; ii) the adoption of a bipartite network representation to improve scalability; iii) the design of a clustering algorithm to identify concepts for cross-disciplinary articles and obtain fine-grained topics for all articles.



قيم البحث

اقرأ أيضاً

Inspired by the social and economic benefits of diversity, we analyze over 9 million papers and 6 million scientists to study the relationship between research impact and five classes of diversity: ethnicity, discipline, gender, affiliation, and acad emic age. Using randomized baseline models, we establish the presence of homophily in ethnicity, gender and affiliation. We then study the effect of diversity on scientific impact, as reflected in citations. Remarkably, of the classes considered, ethnic diversity had the strongest correlation with scientific impact. To further isolate the effects of ethnic diversity, we used randomized baseline models and again found a clear link between diversity and impact. To further support these findings, we use coarsened exact matching to compare the scientific impact of ethnically diverse papers and scientists with closely-matched control groups. Here, we find that ethnic diversity resulted in an impact gain of 10.63% for papers, and 47.67% for scientists.
Modern science is dominated by scientific productions from teams. A recent finding shows that teams with both large and small sizes are essential in research, prompting us to analyze the extent to which a countrys scientific work is carried out by bi g/small teams. Here, using over 26 million publications from Web of Science, we find that Chinas research output is more dominated by big teams than the rest of the world, which is particularly the case in fields of natural science. Despite the global trend that more papers are done by big teams, Chinas drop in small team output is much steeper. As teams in China shift from small to large size, the team diversity that is essential for innovative works does not increase as much as that in other countries. Using the national average as the baseline, we find that the National Natural Science Foundation of China (NSFC) supports fewer small team works than the National Science Foundation of U.S. (NSF) does, implying that big teams are more preferred by grant agencies in China. Our finding provides new insights into the concern of originality and innovation in China, which urges a need to balance small and big teams.
110 - M. Golosovsky , S. Solomon 2016
To quantify the mechanism of a complex network growth we focus on the network of citations of scientific papers and use a combination of the theoretical and experimental tools to uncover microscopic details of this network growth. Namely, we develop a stochastic model of citation dynamics based on copying/redirection/triadic closure mechanism. In a complementary and coherent way, the model accounts both for statistics of references of scientific papers and for their citation dynamics. Originating in empirical measurements, the model is cast in such a way that it can be verified quantitatively in every aspect. Such verification is performed by measuring citation dynamics of Physics papers. The measurements revealed nonlinear citation dynamics, the nonlinearity being intricately related to network topology. The nonlinearity has far-reaching consequences including non-stationary citation distributions, diverging citation trajectory of similar papers, runaways or immortal papers with infinite citation lifetime etc. Thus, our most important finding is nonlinearity in complex network growth. In a more specific context, our results can be a basis for quantitative probabilistic prediction of citation dynamics of individual papers and of the journal impact factor.
A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organiz e literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines, genetic structure and modeling of the disease and patient studies, as well as related diseases and virology. In order that our tool may help current researchers, an interactive website is created that organizes available literature using this hierarchical structure.
We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs avail able for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا