Integrating Knowledge from Latent and Explicit Features for Triple Scoring - Team Radicchios Triple Scorer at WSDM Cup 2017



Publication date 2017

Field: Informatics Engineering

Language: English

Authors Liang-Wei Chen - Bhargav Mangipudi - Jayachandu Bandlamudi

Information Retrieval


The objective of the triple scoring task in WSDM Cup 2017 is to compute relevance scores for knowledge-base triples of type-like relations. For example, consider Julius Caesar, who had various professions, including Politician and Author. Of the two triples (Julius Caesar, profession, Politician) and (Julius Caesar, profession, Author), the former is likely to have a higher relevance score (also called triple score) because Julius Caesar was well known as a politician and not as an author. Accurate prediction of such triple scores greatly benefits real-world applications, such as information retrieval or knowledge base querying. In these scenarios, being able to rank all values of a relation (profession or nationality) can help improve the user experience. We propose a triple scoring model which integrates knowledge from both latent features and explicit features via an ensemble approach. The latent features consist of representations for a person learned with a word2vec model and representations for profession/nationality values extracted from a pre-trained GloVe embedding model. In addition, we extract explicit features for person entities from the Freebase knowledge base. Experimental results show that the proposed method performs competitively at WSDM Cup 2017, ranking third with an accuracy of 79.72% for predicting within two places of the ground truth score.
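
The accuracy figure quoted above counts a prediction as correct when it lies within two places of the ground truth score. A minimal sketch of such a metric follows; the function name `accuracy_within`, the `tolerance` parameter, and the sample scores are assumptions made for illustration, not the official evaluation code.

```python
def accuracy_within(predicted, truth, tolerance=2):
    """Fraction of triples whose predicted score is within `tolerance` of the truth."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("at least one triple is required")
    hits = sum(1 for p, t in zip(predicted, truth) if abs(p - t) <= tolerance)
    return hits / len(truth)


# Hypothetical scores for four triples (illustrative values only).
predicted = [7, 5, 2, 0]
truth = [6, 2, 3, 3]
score = accuracy_within(predicted, truth)  # two of the four predictions are within 2
```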


Triple Scoring Using a Hybrid Fact Validation Approach - The Catsear Triple Scorer at WSDM Cup 2017

Edgard Marx, 2017

With the continuous increase of data published daily in knowledge bases across the Web, one of the main issues is information relevance. In most knowledge bases, a triple (i.e., a statement composed of subject, predicate, and object) can only be true or false. However, triples can be assigned a score so that information can be sorted by relevance. In this work, we describe the participation of the Catsear team in the Triple Scoring Challenge at the WSDM Cup 2017. The Catsear approach scores triples by combining the answers coming from three different sources using a linear regression classifier. We show how our approach achieved an Accuracy2 value of 79.58% and 4th place overall.
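
Combining the answers of several sources with linear regression can be sketched as below. This is only an illustration under stated assumptions: the abstract does not name the three sources or describe the fitting procedure, so the source scores, the training targets, and the helper names `fit_linear` and `predict` are all hypothetical. The fit uses ordinary least squares via the normal equations, and predictions are rounded and clipped to the 0 to 7 range used by the task.

```python
def fit_linear(rows, targets):
    """Ordinary least squares with a bias term, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a constant 1 for the bias
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: sources are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]  # [bias, w_source1, w_source2, w_source3]


def predict(weights, row):
    """Linear combination of the source scores, rounded and clipped to 0..7."""
    raw = weights[0] + sum(w * v for w, v in zip(weights[1:], row))
    return min(7, max(0, round(raw)))


# Hypothetical training data: each row holds scores from three unnamed sources.
rows = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (0.5, 0.5, 0)]
targets = [1, 3, 4, 2, 7, 3.5]
weights = fit_linear(rows, targets)
combined = predict(weights, (1.0, 1.0, 0.0))
```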

Information Retrieval

RelSifter: Scoring Triples from Type-like Relations - The Samphire Triple Scorer at WSDM Cup 2017

Prashant Shiralkar, Mihai Avram, Giovanni Luca Ciampaglia, 2017

We present RelSifter, a supervised learning approach to the problem of assigning relevance scores to triples expressing type-like relations such as profession and nationality. To provide additional contextual information about individuals and relations we supplement the data provided as part of the WSDM 2017 Triple Score contest with Wikidata and DBpedia, two large-scale knowledge graphs (KG). Our hypothesis is that any type relation, i.e., a specific profession like actor or scientist, can be described by the set of typical activities of people known to have that type relation. For example, actors are known to star in movies, and scientists are known for their academic affiliations. In a KG, this information is to be found on a properly defined subset of the second-degree neighbors of the type relation. This form of local information can be used as part of a learning algorithm to predict relevance scores for new, unseen triples. When scoring profession and nationality triples our experiments based on this approach result in an accuracy equal to 73% and 78%, respectively. These performance metrics are roughly equivalent or only slightly below the state of the art prior to the present contest. This suggests that our approach can be effective for evaluating facts, despite the skewness in the number of facts per individual mined from KGs.

Information Retrieval

Relevance Scoring of Triples Using Ordinal Logistic Classification - The Celosia Triple Scorer at WSDM Cup 2017

Nausheen Fatma (IIIT Hyderabad), 2017

In this paper, we report our participation in Task 2 (Triple Scoring) of the WSDM Cup 2017 challenge. In this task, we were provided with triples of type-like relations which were given human-annotated relevance scores ranging from 0 to 7, with 7 being the most relevant and 0 being the least relevant. The task focuses on two such relations: profession and nationality. We built a system which automatically predicts the relevance scores for unseen triples. Our model is primarily a supervised machine learning model in which we use well-designed features to build a classification model based on ordinal logistic regression. The proposed system achieves an overall accuracy score of 0.73 and a Kendall's tau score of 0.36.
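
For readers unfamiliar with ordinal logistic regression, the sketch below shows how a cumulative-link (proportional odds) model turns a feature vector into one of the eight ordered scores 0 to 7. It covers prediction only, and the weights, thresholds, and feature values are hypothetical; the abstract does not specify the features or the fitted parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def class_probabilities(features, weights, thresholds):
    """Cumulative-link model: P(y <= k) = sigmoid(theta_k - w.x).

    `thresholds` must be increasing; K thresholds define K + 1 ordered classes.
    """
    score = sum(w * x for w, x in zip(weights, features))
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    # P(y = k) is the difference between consecutive cumulative probabilities.
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


def predict_score(features, weights, thresholds):
    """Return the most probable class index (the relevance score)."""
    probs = class_probabilities(features, weights, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical parameters: 7 thresholds give the 8 classes 0..7.
weights = [1.0, 0.5]
thresholds = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
probs = class_probabilities((0.4, 0.2), weights, thresholds)
high = predict_score((10.0, 0.0), weights, thresholds)
low = predict_score((-10.0, 0.0), weights, thresholds)
```

Because a single linear score is shared across all thresholds, a larger score shifts probability mass toward the higher classes, which is what makes the model respect the ordering of the labels.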

Information Retrieval

Supervised Ranking of Triples for Type-Like Relations - The Cress Triple Scorer at the WSDM Cup 2017

Faegheh Hasibi (NTNU Trondheim), 2017

This paper describes our participation in the Triple Scoring task of WSDM Cup 2017, which aims at ranking triples from a knowledge base for two type-like relations: profession and nationality. We introduce a supervised ranking method along with the features we designed for this task. Our system was top ranked with respect to average score difference and 2nd best in terms of Kendall's tau.
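
The two metrics named above can be sketched as follows. This is an illustration under assumptions: the average score difference is taken as the mean absolute difference between predicted and true scores, and Kendall's tau is computed in its simple tau-a form, where tied pairs count as neither concordant nor discordant. The official evaluation may differ, for example in tie handling or in computing the rank correlation per person. The sample scores are hypothetical.

```python
from itertools import combinations


def average_score_difference(predicted, truth):
    """Mean absolute difference between predicted and true scores."""
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


def kendall_tau_a(predicted, truth):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(truth) < 2:
        raise ValueError("at least two items are required")
    concordant = discordant = 0
    for i, j in combinations(range(len(truth)), 2):
        s = (predicted[i] - predicted[j]) * (truth[i] - truth[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in either list; it counts toward neither total.
    n_pairs = len(truth) * (len(truth) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for four triples (illustrative values only).
predicted = [7, 5, 2, 0]
truth = [6, 2, 3, 3]
asd = average_score_difference(predicted, truth)
tau = kendall_tau_a(predicted, truth)
```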

Information Retrieval

Finding People's Professions and Nationalities Using Distant Supervision - The FMI@SU goosefoot team at the WSDM Cup 2017 Triple Scoring Task

Valentin Zmiycharov, 2017

We describe the system that our FMI@SU student team built for participating in the Triple Scoring task at the WSDM Cup 2017. Given a triple from a type-like relation, profession or nationality, the goal is to produce a score, on a scale from 0 to 7, that measures the relevance of the statement expressed by the triple: e.g., how well does the profession of Actor fit Quentin Tarantino? We propose a distant supervision approach using information crawled from Wikipedia, DeletionPedia, and DBpedia, together with task-specific word embeddings, TF-IDF weights, and role occurrence order, which we combine in a linear regression model. The official evaluation ranked our submission 1st on Kendall's tau, 7th on average score difference, and 9th on accuracy, out of 21 participating teams.
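
As an illustration of the TF-IDF weights mentioned above, the sketch below scores how strongly each term is associated with a person's text relative to other people's texts. It is a generic textbook formulation (term frequency times the log of inverse document frequency), not the team's actual feature pipeline; the function name `tf_idf`, the person identifiers, and the token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Map each document name to a dict of token -> TF-IDF weight.

    `documents` maps a name to a list of tokens; empty documents are skipped.
    """
    n = len(documents)
    document_frequency = Counter()
    for tokens in documents.values():
        document_frequency.update(set(tokens))
    weights = {}
    for name, tokens in documents.items():
        if not tokens:
            continue
        counts = Counter(tokens)
        weights[name] = {
            tok: (count / len(tokens)) * math.log(n / document_frequency[tok])
            for tok, count in counts.items()
        }
    return weights


# Hypothetical tokenized descriptions of two people.
documents = {
    "person_a": ["director", "director", "actor", "film"],
    "person_b": ["actor", "film", "actor"],
}
weights = tf_idf(documents)
```

In this toy input, "director" appears only for `person_a` and so receives a positive weight there, while "actor" and "film" appear for both people and receive zero weight.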

Information Retrieval