Term Relevance Feedback for Contextual Named Entity Retrieval

Added by Sheikh Muhammad Sarwar

Publication date 2018

Fields Informatics Engineering

Language English

Authors Sheikh Muhammad Sarwar - John Foley - James Allan

Information Retrieval

We address the role of a user in Contextual Named Entity Retrieval (CNER), showing (1) that user identification of important context-bearing terms is superior to automated approaches, and (2) that further gains are possible if the user indicates the relative importance of those terms. CNER is similar in spirit to list question answering and entity disambiguation. However, the main focus of CNER is to obtain user feedback for constructing a profile for a class of entities on the fly and to use that profile to retrieve entities from free text. Given a sentence and an entity selected from that sentence, CNER aims to retrieve sentences that contain entities similar to the query entity. This paper explores obtaining term relevance feedback and importance weighting from humans in order to improve a CNER system. We report our findings based on the efforts of IR researchers as well as crowdsourced workers.
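One way to picture the two kinds of feedback described above is a sentence scorer in which the user-selected context terms carry user-assigned weights. The sketch below is illustrative only and is not the authors' system: the function names `score_sentence` and `rank_sentences`, the whitespace tokenizer, and the example terms and weights are all assumptions.

```python
def score_sentence(sentence, term_weights):
    """Sum the user-assigned weights of the feedback terms found in a sentence.

    term_weights maps a context term to its importance (hypothetical values).
    Each distinct term counts once, so long sentences are not favored.
    """
    tokens = set(sentence.lower().split())  # naive tokenizer, assumed for brevity
    return sum(w for term, w in term_weights.items() if term in tokens)


def rank_sentences(sentences, term_weights):
    """Return candidate sentences sorted by descending feedback score."""
    return sorted(
        sentences,
        key=lambda s: score_sentence(s, term_weights),
        reverse=True,
    )


# Hypothetical feedback for a query entity that is a river.
feedback = {"river": 2.0, "flows": 1.0, "banks": 0.5}
candidates = [
    "The company opened new banks downtown",
    "The Danube river flows through ten countries",
    "A river separates the two towns",
]
ranked = rank_sentences(candidates, feedback)
```

Setting every weight to the same value reduces this to plain term selection, which corresponds to finding (1); unequal weights correspond to the importance weighting in finding (2).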

Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval

Xiao Wang, Craig Macdonald, Nicola Tonellotto 2021

Pseudo-relevance feedback mechanisms, from Rocchio to the relevance models, have shown the usefulness of expanding and reweighting the users' initial queries using information occurring in an initial set of retrieved documents, known as the pseudo-relevant set. Recently, dense retrieval, which uses neural contextual language models such as BERT to analyse the contents of documents and queries and to compute their relevance scores, has shown promising performance on several information retrieval tasks that still rely on the traditional inverted index for identifying documents relevant to a query. Two different dense retrieval families have emerged: the use of a single embedded representation for each passage and query (e.g. using BERT's [CLS] token), or the use of multiple representations (e.g. using an embedding for each token of the query and document). In this work, we conduct the first study into the potential for multiple representation dense retrieval to be enhanced using pseudo-relevance feedback. In particular, based on the pseudo-relevant set of documents identified using a first-pass dense retrieval, we extract representative feedback embeddings (using KMeans clustering), while ensuring that these embeddings discriminate among passages (based on IDF), which are then added to the query representation. These additional feedback embeddings are shown to enhance the effectiveness of both a reranking and an additional dense retrieval operation. Indeed, experiments on the MSMARCO passage ranking dataset show that MAP can be improved by up to 26% on the TREC 2019 query set and 10% on the TREC 2020 query set by applying our proposed ColBERT-PRF method to a ColBERT dense retrieval approach.
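The idea of adding feedback embeddings to a multiple-representation query can be sketched with the sum-of-maximum-similarity scoring used by ColBERT-style models. This is a minimal sketch under stated assumptions, not the ColBERT-PRF implementation: the clustering step is omitted and the feedback centroid is simply given, and the IDF weighting, the `beta` scaling factor, the function names, and the toy 2-d vectors are all hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_sim(query_vec, doc_vecs):
    """Best match between one query embedding and any document token embedding."""
    return max(dot(query_vec, d) for d in doc_vecs)


def score(query_vecs, doc_vecs, feedback=(), beta=1.0):
    """Late-interaction score with optional feedback embeddings.

    feedback is a sequence of (embedding, idf_weight) pairs. The pairs and
    beta are illustrative assumptions, not values from the paper.
    """
    base = sum(max_sim(q, doc_vecs) for q in query_vecs)
    extra = sum(w * max_sim(f, doc_vecs) for f, w in feedback)
    return base + beta * extra


# Toy 2-d embeddings, made up for illustration.
query = [[1.0, 0.0]]
doc_a = [[1.0, 0.0], [0.0, 0.0]]
doc_b = [[1.0, 0.0], [0.0, 1.0]]
feedback = [([0.0, 1.0], 2.0)]  # one assumed centroid with IDF weight 2.0

without_prf = (score(query, doc_a), score(query, doc_b))
with_prf = (score(query, doc_a, feedback), score(query, doc_b, feedback))
```

In this toy setup the two documents tie on the original query, and only the feedback embedding separates them, which is the kind of effect the expansion is meant to produce.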

Information Retrieval

Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval

Haotian Zhang, Gordon V. Cormack, Maura R. Grossman 2018

This study uses a novel simulation framework to evaluate whether the time and effort necessary to achieve high recall using active learning is reduced by presenting the reviewer with isolated sentences, as opposed to full documents, for relevance feedback. Under the weak assumption that more time and effort is required to review an entire document than a single sentence, simulation results indicate that the use of isolated sentences for relevance feedback can yield comparable accuracy and higher efficiency, relative to the state-of-the-art Baseline Model Implementation (BMI) of the AutoTAR Continuous Active Learning (CAL) method employed in the TREC 2015 and 2016 Total Recall Track.

Information Retrieval

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

HongChien Yu, Chenyan Xiong, Jamie Callan 2021

Dense retrieval systems conduct first-stage retrieval using embedded representations and simple similarity metrics to match a query to documents. Their effectiveness depends on the encoded embeddings capturing the semantics of queries and documents, a challenging task due to the shortness and ambiguity of search queries. This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval. ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels. It also keeps the document index unchanged to reduce overhead. ANCE-PRF significantly outperforms ANCE and other recent dense retrieval systems on several datasets. Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.

Information Retrieval, Artificial Intelligence

Named Entity Recognition with Extremely Limited Data

John Foley, Sheikh Muhammad Sarwar, James Allan 2018

Traditional information retrieval treats named entity recognition as a pre-indexing corpus annotation task, allowing entity tags to be indexed and used during search. Named entity taggers themselves are typically trained on thousands or tens of thousands of examples labeled by humans. However, there is a long tail of named entity classes, and for these cases, labeled data may be impossible to find or justify financially. We propose exploring named entity recognition as a search task, where the named entity class of interest is a query, and entities of that class are the relevant documents. What should that query look like? Can we even perform NER-style labeling with tens of labels? This study presents an exploration of CRF-based NER models with handcrafted features and of how we might transform them into search queries.

Information Retrieval

Graph-Embedding Empowered Entity Retrieval

Emma J. Gerritse, Faegheh Hasibi, 2020

In this research, we improve upon the current state of the art in entity retrieval by re-ranking the result list using graph embeddings. The paper shows that graph embeddings are useful for entity-oriented search tasks. We demonstrate empirically that encoding information from the knowledge graph into (graph) embeddings contributes to a higher increase in effectiveness of entity retrieval results than using plain word embeddings. We analyze the impact of the accuracy of the entity linker on the overall retrieval effectiveness. Our analysis further deploys the cluster hypothesis to explain the observed advantages of graph embeddings over the more widely used word embeddings, for user tasks involving ranking entities.
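Re-ranking a result list with graph embeddings can be pictured as interpolating each first-pass score with an embedding-based similarity to the entities linked in the query. The sketch below is an assumption-laden illustration rather than the authors' method: the linear interpolation with `lam`, the use of linker confidence as a weight, the function names, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def rerank(results, query_entities, embeddings, lam=0.5):
    """Re-rank (entity, first_pass_score) pairs using graph embeddings.

    query_entities holds (entity, linker_confidence) pairs for the entities
    an entity linker found in the query. lam balances the two score sources.
    """
    rescored = []
    for entity, base in results:
        emb_score = sum(
            conf * cosine(embeddings[q], embeddings[entity])
            for q, conf in query_entities
        )
        rescored.append((entity, (1 - lam) * base + lam * emb_score))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy graph embeddings, made up for illustration.
embeddings = {"Q": [1.0, 0.0], "A": [0.0, 1.0], "B": [1.0, 0.0]}
first_pass = [("A", 0.6), ("B", 0.5)]
linked = [("Q", 1.0)]  # one linked query entity with full confidence

reranked = rerank(first_pass, linked, embeddings)
order = [entity for entity, _ in reranked]
```

Because the linker confidence scales the embedding term, a less accurate linker contributes less to the final ranking, which is one simple way to reason about the linker's impact mentioned in the abstract.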

Information Retrieval, Computation and Language