
Finding Quality Issues in SKOS Vocabularies

Published by Bernhard Haslhofer
Publication date: 2012
Research field: Informatics Engineering
Language: English





The Simple Knowledge Organization System (SKOS) is a standard model for controlled vocabularies on the Web. However, SKOS vocabularies often differ in terms of quality, which reduces their applicability across system boundaries. Here we investigate how we can support taxonomists in improving SKOS vocabularies by pointing out quality issues that go beyond the integrity constraints defined in the SKOS specification. We identified potential quantifiable quality issues and formalized them into computable quality checking functions that can find affected resources in a given SKOS vocabulary. We implemented these functions in the qSKOS quality assessment tool, analyzed 15 existing vocabularies, and found possible quality issues in all of them.
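To make the idea of a "computable quality checking function" concrete, here is a minimal sketch in Python. It is not the qSKOS implementation, and the three checks shown (concepts without a preferred label, concepts with no semantic relations, and cycles in `skos:broader`) are illustrative assumptions rather than the list of issues defined in the paper. The toy vocabulary, the `ex:` identifiers, and all function names are hypothetical.

```python
# Illustrative only: a toy vocabulary as (subject, predicate, object) triples.
# All identifiers below are hypothetical and not taken from the paper.
BROADER = "skos:broader"
RELATED = "skos:related"
PREF_LABEL = "skos:prefLabel"

TRIPLES = [
    ("ex:animal", PREF_LABEL, "Animal"),
    ("ex:mammal", PREF_LABEL, "Mammal"),
    ("ex:mammal", BROADER, "ex:animal"),
    ("ex:cat", BROADER, "ex:mammal"),    # ex:cat has no prefLabel
    ("ex:animal", BROADER, "ex:cat"),    # closes a hierarchical cycle
    ("ex:rock", PREF_LABEL, "Rock"),     # ex:rock has no relations at all
]


def concepts(triples):
    """Collect every resource used as a concept in the toy vocabulary."""
    found = set()
    for s, p, o in triples:
        found.add(s)
        if p in (BROADER, RELATED):
            found.add(o)
    return found


def unlabeled_concepts(triples):
    """Return concepts that have no skos:prefLabel."""
    labeled = {s for s, p, _ in triples if p == PREF_LABEL}
    return concepts(triples) - labeled


def isolated_concepts(triples):
    """Return concepts that take part in no broader/related statement."""
    linked = set()
    for s, p, o in triples:
        if p in (BROADER, RELATED):
            linked.update((s, o))
    return concepts(triples) - linked


def concepts_in_broader_cycles(triples):
    """Return concepts that can reach themselves by following skos:broader."""
    graph = {}
    for s, p, o in triples:
        if p == BROADER:
            graph.setdefault(s, set()).add(o)

    def reachable(start):
        seen, stack = set(), list(graph.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        return seen

    return {c for c in graph if c in reachable(c)}


if __name__ == "__main__":
    print(sorted(unlabeled_concepts(TRIPLES)))
    print(sorted(isolated_concepts(TRIPLES)))
    print(sorted(concepts_in_broader_cycles(TRIPLES)))
```

Each function takes the whole vocabulary and returns the set of affected resources, which mirrors the abstract's description of functions that "find affected resources in a given SKOS vocabulary". A real tool would read RDF with a proper parser instead of hand-written triples.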




Read also

Current science communication has a number of drawbacks and bottlenecks that have been the subject of discussion lately. Among others, the rising number of published articles makes it nearly impossible to get a full overview of the state of the art in a certain field, and reproducibility is hampered by fixed-length, document-based publications, which normally cannot cover all details of a research work. Recently, several initiatives have proposed knowledge graphs (KG) for organising scientific information as a solution to many of the current issues. The focus of these proposals is, however, usually restricted to very specific use cases. In this paper, we aim to transcend this limited perspective and present a comprehensive analysis of requirements for an Open Research Knowledge Graph (ORKG) by (a) collecting and reviewing daily core tasks of a scientist, (b) establishing their consequential requirements for a KG-based system, and (c) identifying overlaps and specificities, and their coverage in current solutions. As a result, we map necessary and desirable requirements for successful KG-based science communication, derive implications, and outline possible solutions.
Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key technology for text mining, and backed this up with a study that showed how particular kinds of lexical tokens (names, dates, locations, etc.) can be identified and located in running text, using compression models to provide the leverage necessary to distinguish different token types (Witten et al., 1999).
Social bookmarking systems allow users to organise collections of resources on the Web in a collaborative fashion. The increasing popularity of these systems as well as first insights into their emergent semantics have made them relevant to disciplines like knowledge extraction and ontology learning. The problem of devising methods to measure the semantic relatedness between tags and characterizing it semantically is still largely open. Here we analyze three measures of tag relatedness: tag co-occurrence, cosine similarity of co-occurrence distributions, and FolkRank, an adaptation of the PageRank algorithm to folksonomies. Each measure is computed on tags from a large-scale dataset crawled from the social bookmarking system del.icio.us. To provide a semantic grounding of our findings, a connection to WordNet (a semantic lexicon for the English language) is established by mapping tags into synonym sets of WordNet and applying well-known metrics of semantic similarity there. Our results clearly expose different characteristics of the selected measures of relatedness, making them applicable to different subtasks of knowledge extraction such as synonym detection or discovery of concept hierarchies.
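The difference between the first two measures named in the abstract above, raw tag co-occurrence and cosine similarity of co-occurrence distributions, can be illustrated with a small Python sketch. The three toy posts, the function names, and the exact way the vectors are built are assumptions for illustration and may differ from the paper's definitions.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy data: each post is the set of tags a user assigned to a resource.
POSTS = [
    {"python", "programming", "code"},
    {"java", "programming", "code"},
    {"python", "snake"},
]


def cooccurrence(posts):
    """Count, for every tag, how often each other tag appears in the same post."""
    counts = defaultdict(Counter)
    for tags in posts:
        for a, b in permutations(tags, 2):
            counts[a][b] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


counts = cooccurrence(POSTS)

# Direct co-occurrence: "python" and "java" never share a post.
direct = counts["python"]["java"]

# Cosine similarity of their co-occurrence vectors: both appear with
# "programming" and "code", so their contexts overlap.
similarity = cosine(counts["python"], counts["java"])

if __name__ == "__main__":
    print(direct, round(similarity, 4))
```

In this toy data the two tags have zero direct co-occurrence but a high cosine similarity, because they are used in similar contexts. That is one way to see why the two measures might suit different subtasks.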
Position bias describes the tendency of users to interact with items on top of a list with higher probability than with items at a lower position in the list, regardless of the items' actual relevance. In the domain of recommender systems, particularly recommender systems in digital libraries, position bias has received little attention. We conduct a study in a real-world recommender system that delivered ten million related-article recommendations to the users of the digital library Sowiport and the reference manager JabRef. Recommendations were randomly chosen to be shuffled or non-shuffled, and we compared click-through rate (CTR) for each rank of the recommendations. According to our analysis, the CTR for the highest rank in the case of Sowiport is 53% higher than expected in a hypothetical non-biased situation (0.189% vs. 0.123%). Similarly, in the case of JabRef the highest rank received a CTR of 1.276%, which is 87% higher than expected (0.683%). A chi-squared test confirms the strong relationship between the rank of the recommendation shown to the user and whether the user decided to click it (p < 0.01 for both JabRef and Sowiport). Our study confirms the findings from other domains that recommendations in the top positions are more often clicked, regardless of their actual relevance.
Quantifying the impact of scientific papers objectively is crucial for research output assessment, which subsequently affects institution and country rankings, research funding allocations, academic recruitment and national/international scientific priorities. While most of the assessment schemes based on publication citations may potentially be manipulated through negative citations, in this study, we explore Conflict of Interest (COI) relationships and discover negative citations and subsequently weaken the associated citation strength. PANDORA (Positive And Negative COI-Distinguished Objective Rank Algorithm) has been developed, which captures the positive and negative COI, together with the positive and negative suspected COI relationships. In order to alleviate the influence caused by negative COI relationships, collaboration times, collaboration time span, citation times and citation time span are employed to determine the citing strength, while a positive COI relationship is regarded as a normal citation relationship. Furthermore, we calculate the impact of scholarly papers by PageRank and HITS algorithms, based on a credit allocation algorithm which is utilized to assess the impact of institutions fairly and objectively. Experiments are conducted on the publication dataset from the American Physical Society (APS), and the results demonstrate that our method significantly outperforms the current solutions in Recommendation Intensity of list R at top-K and Spearman's rank correlation coefficient at top-K.