
Using Semantic Role Knowledge for Relevance Ranking of Key Phrases in Documents: An Unsupervised Approach

Published by Neelmadhav Gantayat
Publication date: 2019
Research field: Informatics Engineering
Paper language: English

In this paper, we investigate the integration of sentence position and the semantic roles of words into a PageRank system to build a key phrase ranking method. We present evaluation results for our approach on three data sets of scientific articles. We show that semantic role information, when integrated into a PageRank system, can serve as a new lexical feature. Our approach achieved an overall improvement over state-of-the-art baseline approaches on all the data sets.
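The abstract does not say exactly how sentence position and semantic roles enter the PageRank computation. The sketch below shows one plausible combination, in the spirit of position-biased PageRank: each word's teleport prior sums, over its occurrences, a role weight divided by the 1-based index of its sentence, and PageRank then runs over a word co-occurrence graph. The role labels (`ARG0`, `ARG1`, `V`, `O`), the `ROLE_WEIGHTS` values, the window size, and the helpers `rank_words` and `rank_phrases` are all hypothetical choices for illustration, not the authors' method; the role labels are assumed to come from some external semantic role labeler.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical role weights (an assumption, not taken from the paper): words
# filling core argument slots get a larger prior than unlabeled words.
ROLE_WEIGHTS = {"ARG0": 2.0, "ARG1": 2.0, "V": 1.0, "O": 0.5}

Token = Tuple[str, str]  # (word, semantic-role label from an external SRL tagger)


def rank_words(sentences: List[List[Token]], window: int = 3,
               damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Biased PageRank over a word co-occurrence graph.

    The teleport prior of a word sums, over its occurrences, the role weight
    divided by the 1-based index of the sentence it occurs in.
    """
    edges: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    prior: Dict[str, float] = defaultdict(float)
    for s_idx, sent in enumerate(sentences):
        words = [w for w, _ in sent]
        for i, (w, role) in enumerate(sent):
            prior[w] += ROLE_WEIGHTS.get(role, 1.0) / (s_idx + 1)
            # Undirected edges to the next (window - 1) tokens in the sentence.
            for j in range(i + 1, min(i + window, len(words))):
                if words[j] != w:
                    edges[w][words[j]] += 1.0
                    edges[words[j]][w] += 1.0

    nodes = list(prior)
    total = sum(prior.values())
    p = {v: prior[v] / total for v in nodes}           # normalized teleport vector
    out_w = {v: sum(edges[v].values()) for v in nodes}  # weighted degree
    score = dict(p)
    for _ in range(iters):
        # Mass sitting on isolated words is redistributed according to the prior.
        dangling = sum(score[u] for u in nodes if out_w[u] == 0)
        new = {v: (1 - damping + damping * dangling) * p[v] for v in nodes}
        for u in nodes:
            if out_w[u] == 0:
                continue
            for v, w in edges[u].items():
                new[v] += damping * score[u] * w / out_w[u]
        score = new
    return score


def rank_phrases(candidates: List[str], scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score each candidate phrase as the sum of its word scores, highest first."""
    scored = [(c, sum(scores.get(w, 0.0) for w in c.split())) for c in candidates]
    return sorted(scored, key=lambda cs: cs[1], reverse=True)


# Toy input with hand-assigned (hypothetical) role labels.
sentences = [
    [("semantic", "ARG1"), ("role", "ARG1"), ("labeling", "ARG1"), ("improves", "V"), ("ranking", "ARG1")],
    [("pagerank", "ARG0"), ("ranks", "V"), ("key", "ARG1"), ("phrases", "ARG1")],
    [("we", "ARG0"), ("evaluate", "V"), ("ranking", "ARG1"), ("on", "O"), ("articles", "O")],
]
scores = rank_words(sentences)
print(rank_phrases(["key phrases", "semantic role", "articles"], scores))
```

Summing word scores is only one way to score a multi-word phrase; averaging or taking the maximum are common alternatives, and which one the paper uses is not stated in the abstract.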

Related papers

Literature search is critical for any scientific research. Unlike Web or general-domain search, a large portion of queries in scientific literature search are entity-set queries, that is, queries containing multiple entities of possibly different types. Entity-set queries reflect users' need to find documents that contain multiple entities and reveal inter-entity relationships, and thus pose non-trivial challenges to existing search algorithms that model each entity separately. Moreover, entity-set queries are usually sparse (i.e., not very repetitive), which renders ineffective many supervised ranking models that rely heavily on associated click history. To address these challenges, we introduce SetRank, an unsupervised ranking framework that models inter-entity relationships and captures entity type information. Furthermore, we develop a novel unsupervised model selection algorithm, based on the technique of weighted rank aggregation, to automatically choose the parameter settings in SetRank without resorting to a labeled validation set. We evaluate our proposed unsupervised approach using datasets from the TREC Genomics Tracks and Semantic Scholar's query log. The experiments demonstrate that SetRank significantly outperforms the baseline unsupervised models, especially on entity-set queries, and that our model selection algorithm effectively chooses suitable parameter settings.
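Weighted rank aggregation merges several ranked lists into one, giving more influence to some lists than others. The abstract does not specify SetRank's aggregation scheme, so the sketch below uses weighted Borda count, one simple and well-known instance; the document IDs, the three "parameter setting" runs, and their weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def weighted_borda(rankings: Sequence[List[str]],
                   weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Weighted Borda count.

    In a list of length n, the item at 0-based rank r earns weight * (n - r)
    points; items are returned by decreasing total points (ties broken by name).
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    points: Dict[str, float] = defaultdict(float)
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for r, item in enumerate(ranking):
            points[item] += w * (n - r)
    return sorted(points.items(), key=lambda kv: (-kv[1], kv[0]))


# Rankings of the same three documents under three hypothetical parameter settings.
runs = [["d1", "d2", "d3"], ["d2", "d1", "d3"], ["d1", "d3", "d2"]]
print(weighted_borda(runs, [0.5, 0.3, 0.2]))  # ranks d1 first, then d2, then d3
```

An unsupervised selection step could then, for example, prefer the settings whose individual rankings agree most with the aggregate; how SetRank actually assigns weights and picks settings is not specified in this abstract.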
Lin Bo, Liang Pang, Gang Wang, 2021
Recently, pre-trained language models such as BERT have been applied to document ranking for information retrieval: a general language model is first pre-trained on a large unlabeled corpus and then fine-tuned for ranking on expert-labeled relevance datasets. Ideally, an IR system would model relevance from a user-system dualism: the user's view and the system's view. The user's view judges relevance based on the activities of real users, while the system's view focuses on relevance signals from the system side, e.g., from experts or algorithms. Inspired by these two views of relevance and the success of pre-trained language models, in this paper we propose a novel ranking framework called Pre-Rank that takes both the user's view and the system's view into consideration under the pre-training and fine-tuning paradigm. Specifically, to model the user's view of relevance, Pre-Rank pre-trains the initial query-document representations on large-scale user activity data such as click logs. To model the system's view of relevance, Pre-Rank further fine-tunes the model on expert-labeled relevance data. More importantly, the pre-trained representations are fine-tuned together with handcrafted learning-to-rank features under a wide and deep network architecture. In this way, Pre-Rank can model relevance by incorporating knowledge and signals from both real search users and IR experts. To verify the effectiveness of Pre-Rank, we present two implementations that use BERT and SetRank, respectively, as the underlying ranking model. Experimental results on three publicly available benchmarks show that in both implementations Pre-Rank outperforms the corresponding underlying ranking model and achieves state-of-the-art performance.
Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by this, we present Semantic Hashing with Pairwise Reconstruction (PairRec), which is a discrete variational autoencoder based hashing model. PairRec first encodes weakly supervised training pairs (a query document and a semantically similar document) into two hash codes, and then learns to reconstruct the same query document from both of these hash codes (i.e., pairwise reconstruction). This pairwise reconstruction enables our model to encode local neighbourhood structures within the hash code directly through the decoder. We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
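Once documents are encoded as binary hash codes, similarity search reduces to counting differing bits. The brute-force sketch below assumes the codes were already produced by some hashing model (PairRec or any other); the 8-bit codes, document names, and the helpers `hamming` and `search` are made up for illustration, and practical systems use longer codes and faster index structures than a linear scan.

```python
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as Python ints."""
    return bin(a ^ b).count("1")


def search(query_code: int, index: Dict[str, int], k: int = 3) -> List[Tuple[str, int]]:
    """Return the k documents closest to the query in Hamming distance (ties by name)."""
    dists = [(doc, hamming(query_code, code)) for doc, code in index.items()]
    return sorted(dists, key=lambda dc: (dc[1], dc[0]))[:k]


# Hypothetical 8-bit codes for four documents.
index = {"doc_a": 0b10110010, "doc_b": 0b10110011, "doc_c": 0b01001101, "doc_d": 0b10100010}
print(search(0b10110010, index, k=2))  # [('doc_a', 0), ('doc_b', 1)]
```

Because XOR and bit counting are single machine operations on fixed-width codes, this comparison is far cheaper than a cosine similarity over dense vectors, which is the efficiency argument behind semantic hashing.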
In this paper we propose a new document classification method that bridges the discrepancies (the so-called semantic gap) between the training set and the application sets of textual data. We demonstrate its superiority over classical text classification approaches, including traditional classifier ensembles. The method consists of combining a document categorization technique with a single classifier or a classifier ensemble (the SEMCOM algorithm, Committee with Semantic Categorizer).
Video Question Answering (VidQA) evaluation metrics have been limited to single-word answers or to selecting a phrase from a fixed set of phrases. These metrics limit the application scenarios of VidQA models. In this work, we leverage semantic roles derived from video descriptions to mask out certain phrases, introducing VidQAP, which poses VidQA as a fill-in-the-phrase task. To enable evaluation of answer phrases, we compute the relative improvement of the predicted answer compared to an empty string. To reduce the influence of language bias in VidQA datasets, we retrieve a video having a different answer for the same question. To facilitate research, we construct ActivityNet-SRL-QA and Charades-SRL-QA and benchmark them by extending three vision-language models. We further perform extensive analysis and ablation studies to guide future work.