The task of expert finding has been receiving increasing attention in the information retrieval literature. However, the current state of the art still lacks principled approaches for combining different sources of evidence in an optimal way. This paper explores the use of learning to rank methods as a principled approach for combining multiple estimators of expertise, derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. Experiments on a dataset of academic publications from the area of Computer Science attest to the adequacy of the proposed approaches.
Expert finding is an information retrieval task concerned with finding the most knowledgeable people on a specific topic, based on documents that describe people's activities. The task involves taking a user query as input and returning a list of people sorted by their level of expertise with respect to that query. Despite recent interest in the area, current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the experts. More specifically, this article explores the use of supervised learning to rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were also explored for the rank aggregation framework. Experiments performed on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches.
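As an illustration of the rank aggregation idea, the following minimal Python sketch fuses three expertise estimators with CombSUM over min-max normalized scores. The abstract does not say which data fusion techniques were used, so CombSUM is chosen here only as a well-known example; the function names, candidate names, and scores are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {candidate: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie, so this estimator carries no ranking signal.
        return {c: 0.0 for c in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}


def comb_sum(estimators):
    """Fuse several score mappings by summing normalized scores (CombSUM)."""
    fused = defaultdict(float)
    for scores in estimators:
        for candidate, value in min_max_normalize(scores).items():
            fused[candidate] += value
    # Highest fused score first; ties are broken alphabetically.
    return sorted(fused, key=lambda c: (-fused[c], c))


# Hypothetical scores from three kinds of evidence (illustrative values only).
text_scores = {"alice": 12.0, "bob": 7.0, "carol": 2.0}
citation_scores = {"alice": 30.0, "bob": 90.0, "carol": 10.0}
profile_scores = {"alice": 0.9, "bob": 0.4, "carol": 0.5}

ranking = comb_sum([text_scores, citation_scores, profile_scores])
print(ranking)
```

Normalizing before summing keeps an estimator with a large numeric range, such as raw citation counts, from dominating the fused score. A supervised learning to rank model would instead learn how to weight the estimators from relevance judgments.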
Autocomplete (a.k.a. Query Auto-Completion, AC) suggests full queries based on a prefix typed by the customer. Autocomplete has been a core feature of commercial search engines. In this paper, we propose a novel context-aware, neural-network-based pairwise ranker (DeepPLTR) to improve AC ranking. DeepPLTR leverages contextual and behavioral features to rank queries by minimizing a pairwise loss, based on a fully-connected neural network structure. Compared to a LambdaMART ranker, DeepPLTR shows a +3.90% Mean Reciprocal Rank (MRR) lift in offline evaluation, and yielded a +0.06% (p < 0.1) Gross Merchandise Value (GMV) lift in an Amazon online A/B experiment.
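To make the two quantities in this abstract concrete, here is a small Python sketch of a pairwise loss and of MRR. The abstract does not give the exact loss used by DeepPLTR, so the logistic (RankNet-style) form below is an assumption, and the function names and example queries are hypothetical.

```python
import math


def pairwise_logistic_loss(score_pos, score_neg):
    """Loss for one (preferred, non-preferred) pair: log(1 + exp(-(s_pos - s_neg)))."""
    return math.log1p(math.exp(-(score_pos - score_neg)))


def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of the target query; contributes 0 when it is absent."""
    total = 0.0
    for suggestions, target in zip(ranked_lists, targets):
        if target in suggestions:
            total += 1.0 / (suggestions.index(target) + 1)
    return total / len(ranked_lists)


# Hypothetical suggestion lists for two prefixes and the query finally chosen.
ranked_lists = [
    ["iphone case", "iphone", "ipad"],
    ["usb cable", "usb hub"],
]
targets = ["iphone", "usb cable"]

mrr = mean_reciprocal_rank(ranked_lists, targets)
print(mrr)
print(pairwise_logistic_loss(2.0, 0.0), pairwise_logistic_loss(0.0, 2.0))
```

The loss is small when the preferred query is scored well above the other one and grows when the order is inverted, which is what pushes a pairwise ranker toward correct orderings.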
One way to assess a certain aspect of the value of scientific research is to measure the attention it receives on social media. While previous research has mostly focused on the number of mentions of scientific research on social media, the current study applies topic networks to measure public attention to scientific research on Twitter. Topic networks are the networks of co-occurring author keywords in scholarly publications and the networks of co-occurring hashtags in the tweets mentioning those publications. This study investigates which topics in opioid scholarly publications have received public attention on Twitter. Additionally, it investigates whether the topic networks generated from the publications tweeted by all accounts (bot and non-bot accounts) differ from those generated from the publications tweeted by non-bot accounts only. Our analysis is based on a set of opioid scholarly publications from 2011 to 2019 and the tweets associated with them. We use co-occurrence network analysis to generate topic networks. Results indicate that Twitter users have mostly used generic terms to discuss opioid publications, such as Opioid, Pain, Addiction, Treatment, Analgesics, Abuse, Overdose, and Disorders. Results confirm that topic networks provide a legitimate method to visualize public discussions of health-related scholarly publications and to show how Twitter users discuss health-related scientific research differently from the scientific community. There was a substantial overlap between the topic networks based on the tweets by all accounts and those based on the tweets by non-bot accounts. This result indicates that it might not be necessary to exclude bot accounts when generating topic networks, as they have a negligible impact on the results.
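A co-occurrence topic network of this kind can be sketched in a few lines of Python: each pair of terms appearing in the same tweet (or the same keyword list) becomes a weighted edge. The example hashtags are hypothetical, and the Jaccard index is used here only as one possible way to quantify overlap between two networks; the abstract does not say which overlap measure the study used.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(term_sets):
    """Count how often each unordered pair of terms appears together."""
    counts = Counter()
    for terms in term_sets:
        # Lowercase and deduplicate so "Opioid" and "opioid" are one node.
        unique = sorted({t.lower() for t in terms})
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


def jaccard(edges_a, edges_b):
    """Share of edges common to both networks (assumed overlap measure)."""
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical hashtag sets; the last tweet stands in for a bot account.
tweets = [
    ["Opioid", "Pain", "Addiction"],
    ["opioid", "pain"],
    ["Opioid", "Overdose"],
]
all_edges = cooccurrence_edges(tweets)
non_bot_edges = cooccurrence_edges(tweets[:2])

print(all_edges.most_common(1))
print(jaccard(all_edges, non_bot_edges))
```

The resulting edge counts can be loaded into any graph library for visualization, with edge weight reflecting how often two topics are discussed together.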
We implemented and evaluated a two-stage retrieval method for personalized academic search in which the initial search results are re-ranked using an author-topic profile. In academic search tasks, the user's own data can help optimize the ranking of search results to match the searcher's specific individual needs. The author-topic profile consists of topic-specific terms, stored in a graph. We re-rank the top-1000 retrieved documents using ten features that represent the similarity between the document and the author-topic graph. We found that the re-ranking gives a small but significant improvement over the reproduced best method from the literature. Storing the profile as a graph has a number of advantages: it is flexible with respect to node and relation types; it is a visualization of knowledge that is interpretable by the user; and it offers the possibility to view relational characteristics of individual nodes.
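The two-stage pattern can be illustrated with a deliberately simplified Python sketch. It is not the method of the paper: the ten graph-based features are not listed in the abstract, so a single term-overlap feature stands in for them, and the interpolation weight, function names, and data are all hypothetical.

```python
def profile_overlap(doc_terms, profile_terms):
    """Fraction of profile terms that also appear in the document."""
    profile = set(profile_terms)
    if not profile:
        return 0.0
    return len(set(doc_terms) & profile) / len(profile)


def rerank(first_stage, doc_terms, profile_terms, k=1000, weight=0.5):
    """Re-rank the top-k first-stage results; the remainder keeps its order."""
    head, tail = first_stage[:k], first_stage[k:]
    rescored = [
        (
            doc_id,
            (1 - weight) * score
            + weight * profile_overlap(doc_terms[doc_id], profile_terms),
        )
        for doc_id, score in head
    ]
    # Python's sort is stable, so ties keep their first-stage order.
    rescored.sort(key=lambda pair: -pair[1])
    return [doc_id for doc_id, _ in rescored] + [doc_id for doc_id, _ in tail]


# Hypothetical first-stage results as (doc_id, retrieval score) pairs.
first_stage = [("d1", 0.9), ("d2", 0.8), ("d3", 0.1)]
doc_terms = {
    "d1": ["neural", "ranking"],
    "d2": ["topic", "graph", "profile"],
    "d3": ["topic"],
}
profile_terms = ["topic", "graph"]

result = rerank(first_stage, doc_terms, profile_terms, k=2)
print(result)
```

Restricting the second stage to the top-k documents keeps the more expensive profile comparison cheap, since only a bounded number of candidates is rescored.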
Digital mathematical libraries (DMLs) such as arXiv, Numdam, and EuDML contain mainly documents from STEM fields, where mathematical formulae are often more important than text for understanding. Conventional information retrieval (IR) systems are unable to represent formulae and are therefore ill-suited for math information retrieval (MIR). To fill the gap, we have developed and open-sourced the MIaS MIR system. MIaS is based on the full-text search engine Apache Lucene. On top of text retrieval, MIaS also incorporates a set of tools for preprocessing mathematical formulae. We describe the design of the system and present speed and quality evaluation results. We show that MIaS is both efficient and effective, as evidenced by our victory in the NTCIR-11 Math-2 task.