WikiCSSH: Extracting and Evaluating Computer Science Subject Headings from Wikipedia

Added by Kanyao Han

Publication date: 2021

Field: Informatics Engineering

Language: English

Authors: Kanyao Han, Pingjing Yang, Shubhanshu Mishra

Social and Information Networks Digital Libraries

Hierarchical domain-specific classification schemas (or subject heading vocabularies) are often used to identify, classify, and disambiguate concepts that occur in scholarly articles. In this work, we develop, apply, and evaluate a human-in-the-loop workflow that first extracts an initial category tree from crowd-sourced Wikipedia data, and then combines community detection, machine learning, and hand-crafted heuristics or rules to prune the initial tree. This work resulted in WikiCSSH, a large-scale, hierarchically organized vocabulary for the domain of computer science (CS). Our evaluation suggests that WikiCSSH outperforms alternative CS vocabularies in terms of vocabulary size as well as the performance of lexicon-based key-phrase extraction from scholarly data. WikiCSSH can further distinguish between coarse-grained and fine-grained CS concepts. The outlined workflow can serve as a template for building hierarchically organized subject heading vocabularies for other domains that are covered in Wikipedia.
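To illustrate the evaluation task named in the abstract, lexicon-based key-phrase extraction can be sketched as a greedy longest-match lookup of vocabulary terms in a text. This is only a minimal sketch under stated assumptions: the function name `extract_key_phrases`, the tokenizer, the `max_len` limit, and the toy lexicon are all hypothetical and are not taken from the WikiCSSH paper or its code.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_key_phrases(text, lexicon, max_len=4):
    """Return lexicon phrases found in text, preferring the longest match at each position."""
    # Store each lexicon entry as a tuple of tokens for exact n-gram lookup.
    vocab = {tuple(tokenize(phrase)) for phrase in lexicon}
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in vocab:
                found.append(" ".join(candidate))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return found


# Toy lexicon standing in for a subject heading vocabulary (hypothetical entries).
lexicon = ["machine learning", "community detection", "learning"]
phrases = extract_key_phrases(
    "We combine community detection and machine learning.", lexicon
)
print(phrases)
```

Matching the longest phrase first keeps "machine learning" from being reported as the shorter entry "learning"; a larger vocabulary gives such a matcher more multi-word terms to find.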

Collaboration in computer science: a network science approach. Part II

Massimo Franceschet, 2011

We represent collaboration of authors in computer science papers in terms of both affiliation and collaboration networks and observe how these networks evolved over time since 1960. We investigate the temporal evolution of bibliometric properties, like size of the discipline, productivity of scholars, and collaboration level in papers, as well as of large-scale network properties, like reachability and average separation distance among scientists, distribution of the number of scholar collaborators, network clustering and network assortativity by number of collaborators.

Social and Information Networks Digital Libraries Physics and Society

New measures for evaluating creativity in scientific publications

Simona Doboli, Fanshu Zhao, 2014

The goal of our research is to understand how ideas propagate, combine and are created in large social networks. In this work, we look at a sample of relevant scientific publications in the area of high-frequency analog circuit design and their citation distribution. A novel aspect of our work is the way in which we categorize citations based on the reason for each citation and its place in a publication. We created seven citation categories, including general domain references, references to specific methods used for the same domain problem, references to an analysis method, references for experimental comparison, and so on. This added information allows us to define two new measures to characterize the creativity (novelty and usefulness) of a publication based on its pattern of citations clustered by reason, place and citing scientific group. We analyzed 30 publications in relevant journals since 2000 and their roughly 300 citations, all in the area of high-frequency analog circuit design. We observed that the number of citations a publication receives from different scientific groups matches a Levy-type distribution: a large number of groups cite a publication relatively few times, and a very small number of groups cite a publication a large number of times. We also looked at the motifs by which a publication is cited differently by different scientific groups.

Social and Information Networks Digital Libraries Physics and Society

Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data

Shuqi Xu, Manuel Sebastian Mariani, Linyuan Lu, 2020

Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metric performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
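The abstract does not spell out which transformation is used to suppress age bias. One plausible choice, assumed here purely for illustration, is to replace each score by its z-score relative to items of similar age. The function name `rescale_by_age`, the `window` parameter, and the sample scores below are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def rescale_by_age(scores, window=2):
    """Z-score each item against its neighbours in age.

    `scores` must be ordered by publication date. Each item is compared with
    up to `window` older and `window` newer items (an assumed peer group).
    """
    rescaled = []
    n = len(scores)
    for i, score in enumerate(scores):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        peers = scores[lo:hi]  # the item itself plus its age neighbours
        mu, sigma = mean(peers), pstdev(peers)
        # If all peers have the same score, the item is exactly average.
        rescaled.append((score - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Hypothetical scores, ordered from oldest to newest item.
raw_scores = [1.0, 2.0, 3.0]
print(rescale_by_age(raw_scores, window=1))
```

Under this sketch, an old item with a high raw score only ranks highly if it also stands out among items of comparable age, which is the intuition behind removing age bias before comparing metrics.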

Social and Information Networks Digital Libraries Information Retrieval

Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering

Fionn Murtagh, 2008

The history of data analysis that is addressed here is underpinned by two themes: tabular data analysis, and the analysis of collected heterogeneous data. Exploratory data analysis is taken as the heuristic approach that begins with data and information and seeks underlying explanation for what is observed or measured. I also cover some of the evolving context of research and applications, including scholarly publishing, technology transfer and the economic relationship of the university to society.

Computers and Society Digital Libraries

Build up of a subject classification system from collective intelligence

Jisung Yoon, Jinhyuk Yun, Woo-Sung Jung, 2018

Systematized subject classification is essential for funding and assessing scientific projects. Conventionally, classification schemes are founded on the empirical knowledge of a group of experts; thus, the experts' perspectives have influenced the current systems of scientific classification. Those systems archived the current state of the art in practice, yet the global effect of accelerating scientific change over time has made updating the classification systems on a timely basis virtually impossible. To overcome these limitations, we propose an unbiased classification scheme that takes advantage of collective knowledge: Wikipedia, an Internet encyclopedia edited by millions of users, sets a prompt classification in a collective fashion. We construct a Wikipedia network for scientific disciplines and extract the backbone of the network. This structure displays a landscape of science and technology that is based on collective intelligence and that is more unbiased and adaptable than conventional classifications.

Physics and Society Digital Libraries
