Research papers, master and doctoral theses about rewriting

Geo-BERT Pre-training Model for Query Rewriting in POI Search

174 - Association for Computation Linguistics 2021 مقالة

Query Rewriting (QR) is proposed to solve the problem of the word mismatch between queries and documents in Web search. Existing approaches usually model QR with an end-to-end sequence-to-sequence (seq2seq) model. The state-of-the-art Transformer-bas ed models can effectively learn textual semantics from user session logs, but they often ignore users' geographic location information that is crucial for the Point-of-Interest (POI) search of map services. In this paper, we proposed a pre-training model, called Geo-BERT, to integrate semantics and geographic information in the pre-trained representations of POIs. Firstly, we simulate POI distribution in the real world as a graph, in which nodes represent POIs and multiple geographic granularities. Then we use graph representation learning methods to get geographic representations. Finally, we train a BERT-like pre-training model with text and POIs' graph embeddings to get an integrated representation of both geographic and semantic information, and apply it in the QR of POI search. The proposed model achieves excellent accuracy on a wide range of real-world datasets of map services.

query rewriting rewriting poi search إعادة كتابة البحث البسيط صناعة حمض الفوسفور

RAST: Domain-Robust Dialogue Rewriting as Sequence Tagging

225 - Association for Computation Linguistics 2021 مقالة

The task of dialogue rewriting aims to reconstruct the latest dialogue utterance by copying the missing content from the dialogue context. Until now, the existing models for this task suffer from the robustness issue, i.e., performances drop dramatic ally when testing on a different dataset. We address this robustness issue by proposing a novel sequence-tagging-based model so that the search space is significantly reduced, yet the core of this task is still well covered. As a common issue of most tagging models for text generation, the model's outputs may lack fluency. To alleviate this issue, we inject the loss signal from BLEU or GPT-2 under a REINFORCE framework. Experiments show huge improvements of our model over the current state-of-the-art systems when transferring to another dataset.

domain-robust dialogue rewriting dialogue rewriting aims dialogue rewriting إعادة كتابة الحوار القوي إعادة كتابة الحوار أهداف إعادة كتابة الحوار صناعة حمض الفوسفور المزيد..

Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering

266 - Association for Computation Linguistics 2021 مقالة

Paraphrase generation is an important task in natural language processing. Previous works focus on sentence-level paraphrase generation, while ignoring document-level paraphrase generation, which is a more challenging and valuable task. In this paper , we explore the task of document-level paraphrase generation for the first time and focus on the inter-sentence diversity by considering sentence rewriting and reordering. We propose CoRPG (Coherence Relationship guided Paraphrase Generation), which leverages graph GRU to encode the coherence relationship graph and get the coherence-aware representation for each sentence, which can be used for re-arranging the multiple (possibly modified) input sentences. We create a pseudo document-level paraphrase dataset for training CoRPG. Automatic evaluation results show CoRPG outperforms several strong baseline models on the BERTScore and diversity scores. Human evaluation also shows our model can generate document paraphrase with more diversity and semantic preservation.

paraphrase generation document-level paraphrase generation rewriting and reordering إعادة صياغة إعادة صياغة إعادة صياغة صياغة مستوى المستند إعادة كتابة وإعادة ترتيبها صناعة حمض الفوسفور المزيد..

Detect and Perturb: Neutral Rewriting of Biased and Sensitive Text via Gradient-based Decoding

176 - Association for Computation Linguistics 2021 مقالة

Written language carries explicit and implicit biases that can distract from meaningful signals. For example, letters of reference may describe male and female candidates differently, or their writing style may indirectly reveal demographic character istics. At best, such biases distract from the meaningful content of the text; at worst they can lead to unfair outcomes. We investigate the challenge of re-generating input sentences to neutralize' sensitive attributes while maintaining the semantic meaning of the original text (e.g. is the candidate qualified?). We propose a gradient-based rewriting framework, Detect and Perturb to Neutralize (DEPEN), that first detects sensitive components and masks them for regeneration, then perturbs the generation model at decoding time under a neutralizing constraint that pushes the (predicted) distribution of sensitive attributes towards a uniform distribution. Our experiments in two different scenarios show that DEPEN can regenerate fluent alternatives that are neutral in the sensitive attribute while maintaining the semantics of other attributes.

rewriting of biased biased إعادة كتابة منحازة انحيازا صناعة حمض الفوسفور

Open-Domain Question Answering Goes Conversational via Question Rewriting

278 - Association for Computation Linguistics 2021 مقالة

We introduce a new dataset for Question Rewriting in Conversational Context (QReCC), which contains 14K conversations with 80K question-answer pairs. The task in QReCC is to find answers to conversational questions within a collection of 10M web page s (split into 54M passages). Answers to questions in the same conversation may be distributed across several web pages. QReCC provides annotations that allow us to train and evaluate individual subtasks of question rewriting, passage retrieval and reading comprehension required for the end-to-end conversational question answering (QA) task. We report the effectiveness of a strong baseline approach that combines the state-of-the-art model for question rewriting, and competitive models for open-domain QA. Our results set the first baseline for the QReCC dataset with F1 of 19.10, compared to the human upper bound of 75.45, indicating the difficulty of the setup and a large room for improvement.

question rewriting conversational question answering السؤال إعادة كتابة إجابة سؤال المحادثة صناعة حمض الفوسفور

Supertagging-based Parsing with Linear Context-free Rewriting Systems

377 - Association for Computation Linguistics 2021 مقالة

We present the first supertagging-based parser for linear context-free rewriting systems (LCFRS). It utilizes neural classifiers and outperforms previous LCFRS-based parsers in both accuracy and parsing speed by a wide margin. Our results keep up wit h the best (general) discontinuous parsers, particularly the scores for discontinuous constituents establish a new state of the art. The heart of our approach is an efficient lexicalization procedure which induces a lexical LCFRS from any discontinuous treebank. We describe a modification to usual chart-based LCFRS parsing that accounts for supertagging and introduce a procedure that transforms lexical LCFRS derivations into equivalent parse trees of the original treebank. Our approach is evaluated on the English Discontinuous Penn Treebank and the German treebanks Negra and Tiger.

context-free rewriting systems linear context-free rewriting rewriting systems أنظمة إعادة كتابة الخالية من السياق إعادة كتابة خالية من السياق الخطي إعادة كتابة الأنظمة صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد