ترغب بنشر مسار تعليمي؟ اضغط هنا

Part of Speech Based Term Weighting for Information Retrieval

134   0   0.0 ( 0 )
 نشر من قبل Christina Lioma Assoc. Prof
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Automatic language processing tools typically assign to terms so-called weights corresponding to the contribution of terms to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the POS contexts in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval, by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF & BM25 as baselines, show that integrating our POS-based term weights to retrieval always leads to gains (up to +33.7% from the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet priors smoothing) and our best performing POS-based term weight, show retrieval gains always and consistently across the whole smoothing range of the baseline.

قيم البحث

اقرأ أيضاً

Most neural Information Retrieval (Neu-IR) models derive query-to-document ranking scores based on term-level matching. Inspired by TileBars, a classical term distribution visualization method, in this paper, we propose a novel Neu-IR model that hand les query-to-document matching at the subtopic and higher levels. Our system first splits the documents into topical segments, visualizes the matchings between the query and the segments, and then feeds an interaction matrix into a Neu-IR model, DeepTileBars, to obtain the final ranking scores. DeepTileBars models the relevance signals occurring at different granularities in a documents topic hierarchy. It better captures the discourse structure of a document and thus the matching patterns. Although its design and implementation are light-weight, DeepTileBars outperforms other state-of-the-art Neu-IR models on benchmark datasets including the Text REtrieval Conference (TREC) 2010-2012 Web Tracks and LETOR 4.0.
154 - Haggai Roitman 2019
This work presents a general query term weighting approach based on query performance prediction (QPP). To this end, a given term is weighed according to its predicted effect on query performance. Such an effect is assumed to be manifested in the res ponses made by the underlying retrieval method for the original query and its (simple) variants in the form of a single-term expanded query. Focusing on search re-ranking as the underlying application, the effectiveness of the proposed term weighting approach is demonstrated using several state-of-the-art QPP methods evaluated over TREC corpora.
This report describes metrics for the evaluation of the effectiveness of segment-based retrieval based on existing binary information retrieval metrics. This metrics are described in the context of a task for the hyperlinking of video segments. This evaluation approach re-uses existing evaluation measures from the standard Cranfield evaluation paradigm. Our adaptation approach can in principle be used with any kind of effectiveness measure that uses binary relevance, and for other segment-baed retrieval tasks. In our video hyperlinking setting, we use precision at a cut-off rank n and mean average precision.
TextRank is a variant of PageRank typically used in graphs that represent documents, and where vertices denote terms and edges denote relations between terms. Quite often the relation between terms is simple term co-occurrence within a fixed window o f k terms. The output of TextRank when applied iteratively is a score for each vertex, i.e. a term weight, that can be used for information retrieval (IR) just like conventional term frequency based term weights. So far, when computing TextRank term weights over co- occurrence graphs, the window of term co-occurrence is al- ways ?xed. This work departs from this, and considers dy- namically adjusted windows of term co-occurrence that fol- low the document structure on a sentence- and paragraph- level. The resulting TextRank term weights are used in a ranking function that re-ranks 1000 initially returned search results in order to improve the precision of the ranking. Ex- periments with two IR collections show that adjusting the vicinity of term co-occurrence when computing TextRank term weights can lead to gains in early precision.
220 - Zeeshan Ahmed 2011
PDM Systems contain and manage heavy amount of data but the search mechanism of most of the systems is not intelligent which can process users natural language based queries to extract desired information. Currently available search mechanisms in alm ost all of the PDM systems are not very efficient and based on old ways of searching information by entering the relevant information to the respective fields of search forms to find out some specific information from attached repositories. Targeting this issue, a thorough research was conducted in fields of PDM Systems and Language Technology. Concerning the PDM System, conducted research provides the information about PDM and PDM Systems in detail. Concerning the field of Language Technology, helps in implementing a search mechanism for PDM Systems to search users needed information by analyzing users natural language based requests. The accomplished goal of this research was to support the field of PDM with a new proposition of a conceptual model for the implementation of natural language based search. The proposed conceptual model is successfully designed and partially implementation in the form of a prototype. Describing the proposition in detail the main concept, implementation designs and developed prototype of proposed approach is discussed in this paper. Implemented prototype is compared with respective functions of existing PDM systems .i.e., Windchill and CIM to evaluate its effectiveness against targeted challenges.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا