ترغب بنشر مسار تعليمي؟ اضغط هنا

WikiCSSH: Extracting and Evaluating Computer Science Subject Headings from Wikipedia

368   0   0.0 ( 0 )
 نشر من قبل Kanyao Han
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Hierarchical domain-specific classification schemas (or subject heading vocabularies) are often used to identify, classify, and disambiguate concepts that occur in scholarly articles. In this work, we develop, apply, and evaluate a human-in-the-loop workflow that first extracts an initial category tree from crowd-sourced Wikipedia data, and then combines community detection, machine learning, and hand-crafted heuristics or rules to prune the initial tree. This work resulted in WikiCSSH; a large-scale, hierarchically organized vocabulary for the domain of computer science (CS). Our evaluation suggests that WikiCSSH outperforms alternative CS vocabularies in terms of vocabulary size as well as the performance of lexicon-based key-phrase extraction from scholarly data. WikiCSSH can further distinguish between coarse-grained versus fine-grained CS concepts. The outlined workflow can serve as a template for building hierarchically-organized subject heading vocabularies for other domains that are covered in Wikipedia.



قيم البحث

اقرأ أيضاً

113 - Massimo Franceschet 2011
We represent collaboration of authors in computer science papers in terms of both affiliation and collaboration networks and observe how these networks evolved over time since 1960. We investigate the temporal evolution of bibliometric properties, li ke size of the discipline, productivity of scholars, and collaboration level in papers, as well as of large-scale network properties, like reachability and average separation distance among scientists, distribution of the number of scholar collaborators, network clustering and network assortativity by number of collaborators.
191 - Simona Doboli , Fanshu Zhao , 2014
The goal of our research is to understand how ideas propagate, combine and are created in large social networks. In this work, we look at a sample of relevant scientific publications in the area of high-frequency analog circuit design and their citat ion distribution. A novel aspect of our work is the way in which we categorize citations based on the reason and place of it in a publication. We created seven citation categories from general domain references, references to specific methods used in the same domain problem, references to an analysis method, references for experimental comparison and so on. This added information allows us to define two new measures to characterize the creativity (novelty and usefulness) of a publication based on its pattern of citations clustered by reason, place and citing scientific group. We analyzed 30 publications in relevant journals since 2000 and their about 300 citations, all in the area of high-frequency analog circuit design. We observed that the number of citations a publication receives from different scientific groups matches a Levy type distribution: with a large number of groups citing a publication relatively few times, and a very small number of groups citing a publication a large number of times. We looked at the motifs a publication is cited differently by different scientific groups.
Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by t heir ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
306 - Fionn Murtagh 2008
The history of data analysis that is addressed here is underpinned by two themes, -- those of tabular data analysis, and the analysis of collected heterogeneous data. Exploratory data analysis is taken as the heuristic approach that begins with data and information and seeks underlying explanation for what is observed or measured. I also cover some of the evolving context of research and applications, including scholarly publishing, technology transfer and the economic relationship of the university to society.
Systematized subject classification is essential for funding and assessing scientific projects. Conventionally, classification schemes are founded on the empirical knowledge of the group of experts; thus, the experts perspectives have influenced the current systems of scientific classification. Those systems archived the current state-of-art in practice, yet the global effect of the accelerating scientific change over time has made the updating of the classifications system on a timely basis vertually impossible. To overcome the aforementioned limitations, we propose an unbiased classification scheme that takes advantage of collective knowledge; Wikipedia, an Internet encyclopedia edited by millions of users, sets a prompt classification in a collective fashion. We construct a Wikipedia network for scientific disciplines and extract the backbone of the network. This structure displays a landscape of science and technology that is based on a collective intelligence and that is more unbiased and adaptable than conventional classifications.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا