No Arabic abstract
20 Questions (20Q) is a two-player game. One player is the answerer, and the other is a questioner. The answerer chooses an entity from a specified domain and does not reveal this to the other player. The questioner can ask at most 20 questions to the answerer to guess the entity. The answerer can reply to the questions asked by saying yes/no/maybe. In this paper, we propose a novel approach based on the knowledge graph for designing the 20Q game on Bollywood movies. The system assumes the role of the questioner and asks questions to predict the movie thought by the answerer. It uses a probabilistic learning model for template-based question generation and answers prediction. A dataset of interrelated entities is represented as a weighted knowledge graph, which updates as the game progresses by asking questions. An evolutionary approach helps the model to gain a better understanding of user choices and predicts the answer in fewer questions over time. Experimental results show that our model was able to predict the correct movie in less than 10 questions for more than half of the times the game was played. This kind of model can be used to design applications that can detect diseases by asking questions based on symptoms, improving recommendation systems, etc.
Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of train on English, run on any language, we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.
This paper proposes a discrete knowledge graph (KG) embedding (DKGE) method, which projects KG entities and relations into the Hamming space based on a computationally tractable discrete optimization algorithm, to solve the formidable storage and computation cost challenges in traditional continuous graph embedding methods. The convergence of DKGE can be guaranteed theoretically. Extensive experiments demonstrate that DKGE achieves superior accuracy than classical hashing functions that map the effective continuous embeddings into discrete codes. Besides, DKGE reaches comparable accuracy with much lower computational complexity and storage compared to many continuous graph embedding methods.
There are both technical and social issues regarding the design of sustainable scientific software. Scientists want continuously evolving systems that capture the most recent knowledge while developers and architects want sufficiently stable requirements to ensure correctness and efficiency. A socio-technical ecosystem provides the environment in which these issues can be traded off.
Decision-making usually takes five steps: identifying the problem, collecting data, extracting evidence, identifying pro and con arguments, and making decisions. Focusing on extracting evidence, this paper presents a hybrid model that combines latent Dirichlet allocation and word embeddings to obtain external knowledge from structured and unstructured data. We study the task of sentence-level argument mining, as arguments mostly require some degree of world knowledge to be identified and understood. Given a topic and a sentence, the goal is to classify whether a sentence represents an argument in regard to the topic. We use a topic model to extract topic- and sentence-specific evidence from the structured knowledge base Wikidata, building a graph based on the cosine similarity between the entity word vectors of Wikidata and the vector of the given sentence. Also, we build a second graph based on topic-specific articles found via Google to tackle the general incompleteness of structured knowledge bases. Combining these graphs, we obtain a graph-based model which, as our evaluation shows, successfully capitalizes on both structured and unstructured data.
Visualization recommendation or automatic visualization generation can significantly lower the barriers for general users to rapidly create effective data visualizations, especially for those users without a background in data visualizations. However, existing rule-based approaches require tedious manual specifications of visualization rules by visualization experts. Other machine learning-based approaches often work like black-box and are difficult to understand why a specific visualization is recommended, limiting the wider adoption of these approaches. This paper fills the gap by presenting KG4Vis, a knowledge graph (KG)-based approach for visualization recommendation. It does not require manual specifications of visualization rules and can also guarantee good explainability. Specifically, we propose a framework for building knowledge graphs, consisting of three types of entities (i.e., data features, data columns and visualization design choices) and the relations between them, to model the mapping rules between data and effective visualizations. A TransE-based embedding technique is employed to learn the embeddings of both entities and relations of the knowledge graph from existing dataset-visualization pairs. Such embeddings intrinsically model the desirable visualization rules. Then, given a new dataset, effective visualizations can be inferred from the knowledge graph with semantically meaningful rules. We conducted extensive evaluations to assess the proposed approach, including quantitative comparisons, case studies and expert interviews. The results demonstrate the effectiveness of our approach.