Research papers, master's theses and doctoral dissertations published by Gordon V. Cormack

Evaluating Sentence-Level Relevance Feedback for High-Recall Information Retrieval

66 - Haotian Zhang, Gordon V. Cormack, Maura R. Grossman, 2018

This study uses a novel simulation framework to evaluate whether the time and effort needed to achieve high recall with active learning can be reduced by presenting the reviewer with isolated sentences, rather than full documents, for relevance feedback. Under the weak assumption that reviewing an entire document requires more time and effort than reviewing a single sentence, the simulation results indicate that using isolated sentences for relevance feedback can yield comparable accuracy and higher efficiency, relative to the state-of-the-art Baseline Model Implementation (BMI) of the AutoTAR Continuous Active Learning (CAL) method employed in the TREC 2015 and 2016 Total Recall Track.
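
The abstract describes the approach only at a high level. As a rough illustration, the following Python sketch simulates a continuous active learning loop in which the reviewer judges one sentence per document instead of the full text. Everything here is an assumption for illustration, not the paper's or BMI's actual implementation: the tokenizer, the naive sentence splitter, the plain SGD logistic regression learner, the use of the topic statement as a synthetic relevant seed, and the rule that a shown sentence inherits its parent document's relevance label in the simulated assessment. The batch size grows as B ← B + ⌈B/10⌉ in the style of AutoTAR, while AutoTAR's temporary random pseudo-negatives are omitted. The names `simulate_sentence_cal`, `train_logreg`, and the other helpers are hypothetical.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Binary bag of lower-case words (hypothetical, deliberately simple features)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def split_sentences(doc):
    """Naive splitter on terminal punctuation (an assumption, not the paper's)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(examples, epochs=20, lr=0.5):
    """Plain SGD logistic regression over binary word features, retrained from scratch."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for tokens, label in examples:
            p = sigmoid(bias + sum(weights[t] for t in tokens))
            g = label - p  # gradient of the log-likelihood w.r.t. the logit
            bias += lr * g
            for t in tokens:
                weights[t] += lr * g
    return weights, bias


def score(tokens, weights, bias):
    # Logit of a sentence; unseen words contribute nothing.
    return bias + sum(weights.get(t, 0.0) for t in tokens)


def simulate_sentence_cal(docs, is_relevant, seed_query):
    """Simulated CAL review where the assessor sees one sentence per document.

    docs: list of document strings; is_relevant: simulated ground truth per doc.
    Returns a log of (doc_id, sentence_shown, judgment) in review order.
    """
    sentences = [split_sentences(d) or [d] for d in docs]
    training = [(tokenize(seed_query), 1)]  # topic statement as a synthetic relevant seed
    unreviewed = set(range(len(docs)))
    log = []
    batch = 1
    while unreviewed:
        weights, bias = train_logreg(training)
        ranked = []
        for d in unreviewed:
            # Each document is represented by its highest-scoring sentence.
            best = max(sentences[d], key=lambda s: score(tokenize(s), weights, bias))
            ranked.append((-score(tokenize(best), weights, bias), d, best))
        ranked.sort()  # highest score first; ties broken by document id
        for _, d, sent in ranked[:batch]:
            label = 1 if is_relevant[d] else 0  # simulated assessor: sentence inherits doc label
            training.append((tokenize(sent), label))
            log.append((d, sent, label))
            unreviewed.discard(d)
        batch += math.ceil(batch / 10)  # AutoTAR-style growing batch size
    return log
```

Because each step shows only one sentence, a simulation like this can count sentences or words read as a proxy for reviewer effort and compare that count with a document-level run over the same collection.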

Information Retrieval

Efficient and Effective Spam Filtering and Re-ranking for Large Web Datasets

71 - Gordon V. Cormack, Mark D. Smucker, 2010

The TREC 2009 web ad hoc and relevance feedback tasks used a new document collection, the ClueWeb09 dataset, which was crawled from the general Web in early 2009. This dataset contains 1 billion web pages, a substantial fraction of which are spam: pages designed to deceive search engines so as to deliver an unwanted payload. We examine the effect of spam on the results of these tasks. We show that a simple content-based classifier with minimal training is efficient enough to rank the spamminess of every page in the dataset on a standard personal computer in 48 hours, and effective enough to yield significant and substantive improvements in the fixed-cutoff precision (estP10) as well as the rank measures (estR-Precision, StatMAP, MAP) of nearly all submitted runs. Moreover, by using a set of honeypot queries, the labeling of training data can be made entirely automatic. The results of classical information retrieval methods benefit most from filtering, which moves them from among the worst to among the best.
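
As a rough sketch of the filtering idea, the Python below scores pages with an online logistic regression classifier over hashed overlapping byte 4-grams, converts the scores to spamminess percentiles, and drops the most spam-like pages from a ranked run while preserving the order of the rest. The feature representation, learning rate, hash-space size, percentile convention (higher means spammier), and default cutoff are assumptions chosen for illustration, not the configuration reported in the paper; `OnlineSpamClassifier`, `spam_percentiles`, and `filter_run` are hypothetical names. The honeypot-query labeling step is not shown.

```python
import math
import zlib

NUM_FEATURES = 1 << 20  # size of the hashed feature space (assumption)


def byte_4grams(text, num_features=NUM_FEATURES):
    """Hash every overlapping 4-byte substring of a page into a feature index."""
    data = text.encode("utf-8", errors="ignore")
    return {zlib.crc32(data[i:i + 4]) % num_features for i in range(len(data) - 3)}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineSpamClassifier:
    """Online logistic regression; score() returns the log-odds that a page is spam."""

    def __init__(self, rate=0.02):
        self.w = {}
        self.rate = rate

    def score(self, features):
        return sum(self.w.get(f, 0.0) for f in features)

    def train(self, features, is_spam):
        # One SGD step on a single labeled page.
        g = (1.0 if is_spam else 0.0) - sigmoid(self.score(features))
        for f in features:
            self.w[f] = self.w.get(f, 0.0) + self.rate * g


def spam_percentiles(scores):
    """Map each page id to a percentile in [0, 100); higher means more spam-like."""
    order = sorted(scores, key=lambda d: (scores[d], d))
    n = len(order)
    return {d: 100 * rank // n for rank, d in enumerate(order)}


def filter_run(run, percentiles, cutoff=70):
    """Drop pages at or above the spamminess cutoff, keeping the original run order.

    Pages without a score are kept (assumption).
    """
    return [d for d in run if percentiles.get(d, 0) < cutoff]
```

Since filtering only removes pages, the surviving pages keep their relative order; the cutoff trades off how much spam is removed against the risk of also discarding relevant pages.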

Information Retrieval
