ترغب بنشر مسار تعليمي؟ اضغط هنا

WISER: A Semantic Approach for Expert Finding in Academia based on Entity Linking

67   0   0.0 ( 0 )
 نشر من قبل Marco Ponza
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We present WISER, a new semantic search engine for expert finding in academia. Our system is unsupervised and it jointly combines classical language modeling techniques, based on text evidences, with the Wikipedia Knowledge Graph, via entity linking. WISER indexes each academic author through a novel profiling technique which models her expertise with a small, labeled and weighted graph drawn from Wikipedia. Nodes in this graph are the Wikipedia entities mentioned in the authors publications, whereas the weighted edges express the semantic relatedness among these entities computed via textual and graph-based relatedness functions. Every node is also labeled with a relevance score which models the pertinence of the corresponding entity to authors expertise, and is computed by means of a proper random-walk calculation over that graph; and with a latent vector representation which is learned via entity and other kinds of structural embeddings derived from Wikipedia. At query time, experts are retrieved by combining classic document-centric approaches, which exploit the occurrences of query terms in the authors documents, with a novel set of profile-centric scoring strategies, which compute the semantic relatedness between the authors expertise and the query topic via the above graph-based profiles. The effectiveness of our system is established over a large-scale experimental test on a standard dataset for this task. We show that WISER achieves better performance than all the other competitors, thus proving the effectiveness of modelling authors profile via our semantic graph of entities. Finally, we comment on the use of WISER for indexing and profiling the whole research community within the University of Pisa, and its application to technology transfer in our University.



قيم البحث

اقرأ أيضاً

Much of human knowledge is encoded in text, available in scientific publications, books, and the web. Given the rapid growth of these resources, we need automated methods to extract such knowledge into machine-processable structures, such as knowledg e graphs. An important task in this process is entity normalization, which consists of mapping noisy entity mentions in text to canonical entities in well-known reference sets. However, entity normalization is a challenging problem; there often are many textual forms for a canonical entity that may not be captured in the reference set, and entities mentioned in text may include many syntactic variations, or errors. The problem is particularly acute in scientific domains, such as biology. To address this problem, we have developed a general, scalable solution based on a deep Siamese neural network model to embed the semantic information about the entities, as well as their syntactic variations. We use these embeddings for fast mapping of new entities to large reference sets, and empirically show the effectiveness of our framework in challenging bio-entity normalization datasets.
67 - Huan Gui , Qi Zhu , Liyuan Liu 2018
Expert finding is an important task in both industry and academia. It is challenging to rank candidates with appropriate expertise for various queries. In addition, different types of objects interact with one another, which naturally forms heterogen eous information networks. We study the task of expert finding in heterogeneous bibliographical networks based on two aspects: textual content analysis and authority ranking. Regarding the textual content analysis, we propose a new method for query expansion via locally-trained embedding learning with concept hierarchy as guidance, which is particularly tailored for specific queries with narrow semantic meanings. Compared with global embedding learning, locally-trained embedding learning projects the terms into a latent semantic space constrained on relevant topics, therefore it preserves more precise and subtle information for specific queries. Considering the candidate ranking, the heterogeneous information network structure, while being largely ignored in the previous studies of expert finding, provides additional information. Specifically, different types of interactions among objects play different roles. We propose a ranking algorithm to estimate the authority of objects in the network, treating each strongly-typed edge type individually. To demonstrate the effectiveness of the proposed framework, we apply the proposed method to a large-scale bibliographical dataset with over two million entries and one million researcher candidates. The experiment results show that the proposed framework outperforms existing methods for both general and specific queries.
Trajectory-based spatiotemporal entity linking is to match the same moving object in different datasets based on their movement traces. It is a fundamental step to support spatiotemporal data integration and analysis. In this paper, we study the prob lem of spatiotemporal entity linking using effective and concise signatures extracted from their trajectories. This linking problem is formalized as a k-nearest neighbor (k-NN) query on the signatures. Four representation strategies (sequential, temporal, spatial, and spatiotemporal) and two quantitative criteria (commonality and unicity) are investigated for signature construction. A simple yet effective dimension reduction strategy is developed together with a novel indexing structure called the WR-tree to speed up the search. A number of optimization methods are proposed to improve the accuracy and robustness of the linking. Our extensive experiments on real-world datasets verify the superiority of our approach over the state-of-the-art solutions in terms of both accuracy and efficiency.
Document subject classification is essential for structuring (digital) libraries and allowing readers to search within a specific field. Currently, the classification is typically made by human domain experts. Semi-supervised Machine Learning algorit hms can support them by exploiting the labeled data to predict subject classes for unclassified new documents. However, while humans partly do, machines mostly do not explain the reasons for their decisions. Recently, explainable AI research to address the problem of Machine Learning decisions being a black box has increasingly gained interest. Explainer models have already been applied to the classification of natural language texts, such as legal or medical documents. Documents from Science, Technology, Engineering, and Mathematics (STEM) disciplines are more difficult to analyze, since they contain both textual and mathematical formula content. In this paper, we present first advances towards STEM document classification explainability using classical and mathematical Entity Linking. We examine relationships between textual and mathematical subject classes and entities, mining a collection of documents from the arXiv preprint repository (NTCIR and zbMATH dataset). The results indicate that mathematical entities have the potential to provide high explainability as they are a crucial part of a STEM document.
We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base. We train a dual encoder in this new setting, building on prior work with improved feature representation, ne gative mining, and an auxiliary entity-pairing task, to obtain a single entity retrieval model that covers 100+ languages and 20 million entities. The model outperforms state-of-the-art results from a far more limited cross-lingual linking task. Rare entities and low-resource languages pose challenges at this large-scale, so we advocate for an increased focus on zero- and few-shot evaluation. To this end, we provide Mewsli-9, a large new multilingual dataset (http://goo.gle/mewsli-dataset) matched to our setting, and show how frequency-based analysis provided key insights for our model and training enhancements.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا