ﻻ يوجد ملخص باللغة العربية
Similar question retrieval is a core task in community-based question answering (CQA) services. To balance the effectiveness and efficiency, the question retrieval system is typically implemented as multi-stage rankers: The first-stage ranker aims to recall potentially relevant questions from a large repository, and the latter stages attempt to re-rank the retrieved results. Most existing works on question retrieval mainly focused on the re-ranking stages, leaving the first-stage ranker to some traditional term-based methods. However, term-based methods often suffer from the vocabulary mismatch problem, especially on short texts, which may block the re-rankers from relevant questions at the very beginning. An alternative is to employ embedding-based methods for the first-stage ranker, which compress texts into dense vectors to enhance the semantic matching. However, these methods often lose the discriminative power as term-based methods, thus introduce noise during retrieval and hurt the recall performance. In this work, we aim to tackle the dilemma of the first-stage ranker, and propose a discriminative semantic ranker, namely DenseTrans, for high-recall retrieval. Specifically, DenseTrans is a densely connected Transformer, which learns semantic embeddings for texts based on Transformer layers. Meanwhile, DenseTrans promotes low-level features through dense connections to keep the discriminative power of the learned representations. DenseTrans is inspired by DenseNet in computer vision (CV), but poses a new way to use the dense connectivity which is totally different from its original design purpose. Experimental results over two question retrieval benchmark datasets show that our model can obtain significant gain on recall against strong term-based methods as well as state-of-the-art embedding-based methods.
Pre-trained model such as BERT has been proved to be an effective tool for dealing with Information Retrieval (IR) problems. Due to its inspiring performance, it has been widely used to tackle with real-world IR problems such as document ranking. Rec
Conversational passage retrieval relies on question rewriting to modify the original question so that it no longer depends on the conversation history. Several methods for question rewriting have recently been proposed, but they were compared under d
Search engine has become a fundamental component in various web and mobile applications. Retrieving relevant documents from the massive datasets is challenging for a search engine system, especially when faced with verbose or tail queries. In this pa
In this work, we address multi-modal information needs that contain text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. This task requires access to outside knowledge, which in our case we defin
Building machines that can understand text like humans is an AI-complete problem. A great deal of research has already gone into this, with astounding results, allowing everyday people to discuss with their telephones, or have their reading materials