PIE: A Parallel Idiomatic Expression Corpus for Idiomatic Sentence Generation and Paraphrasing

Publication date: 2021
Research language: English
Created by Shamra Editor





Idiomatic expressions (IEs) play an important role in natural language and have long been "a pain in the neck" for NLP systems. Despite this, text generation tasks involving IEs remain largely under-explored. In this paper, we propose two new tasks, idiomatic sentence generation and idiomatic sentence paraphrasing, to fill this research gap. As the primary resource for these tasks, we introduce a curated dataset of 823 IEs and a parallel corpus that pairs sentences containing them with the same sentences in which the IEs are replaced by their literal paraphrases. To inspire further research on the proposed tasks, we benchmark existing deep learning models that achieve state-of-the-art performance on related tasks, using both automatic and manual evaluation on our dataset. By establishing baseline models, we pave the way for more comprehensive and accurate modeling of IEs, for both generation and paraphrasing.
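To make the structure of a parallel idiomatic/literal corpus concrete, the following is a minimal sketch, assuming a hypothetical record type `ParallelPair` and hypothetical helpers `build_lexicon`, `paraphrase_idiomatic`, and `generate_idiomatic`. None of these names or fields come from the paper or its released data, and the example pair is invented rather than taken from the PIE corpus. The two helpers are a naive string-substitution baseline for the two directions (idiomatic to literal, literal to idiomatic), not the deep learning models the authors benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelPair:
    """One hypothetical corpus entry (field names are illustrative, not PIE's schema)."""
    idiom: str               # the idiomatic expression (IE)
    literal_meaning: str     # a literal paraphrase of the IE
    idiomatic_sentence: str  # sentence containing the IE
    literal_sentence: str    # same sentence with the IE replaced by its paraphrase


def build_lexicon(pairs: list[ParallelPair]) -> dict[str, str]:
    """Map each IE to its literal paraphrase (the last pair wins on duplicates)."""
    return {p.idiom: p.literal_meaning for p in pairs}


def paraphrase_idiomatic(sentence: str, lexicon: dict[str, str]) -> str:
    """Naive idiomatic-to-literal baseline: replace the first IE found verbatim.

    Exact, case-sensitive substring matching only: inflected forms
    ("kicked the bucket") and literal uses of an IE are not handled.
    """
    for idiom, literal in lexicon.items():
        if idiom in sentence:
            return sentence.replace(idiom, literal, 1)
    return sentence  # no known IE: return the sentence unchanged


def generate_idiomatic(sentence: str, lexicon: dict[str, str]) -> str:
    """Reverse direction: replace the first literal paraphrase found with its IE."""
    for idiom, literal in lexicon.items():
        if literal in sentence:
            return sentence.replace(literal, idiom, 1)
    return sentence


# Invented example pair, not taken from the PIE corpus.
pair = ParallelPair(
    idiom="a pain in the neck",
    literal_meaning="a major annoyance",
    idiomatic_sentence="Debugging this parser is a pain in the neck.",
    literal_sentence="Debugging this parser is a major annoyance.",
)
lexicon = build_lexicon([pair])
print(paraphrase_idiomatic(pair.idiomatic_sentence, lexicon))
# Debugging this parser is a major annoyance.
```

A lookup baseline like this breaks as soon as an idiom is inflected or used literally, which is part of why the tasks call for learned generation and paraphrasing models.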




Read More

Sentence embeddings encode information relating to the usage of idioms in a sentence. This paper reports a set of experiments that combine a probing methodology with input masking to analyse where in a sentence this idiomatic information is taken from, and what form it takes. Our results indicate that BERT's idiomatic key is primarily found within an idiomatic expression but also draws on information from the surrounding context. BERT can also distinguish between the disruption caused by missing words in a sentence and the incongruity caused by idiomatic usage.
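One plausible way to build the masked inputs for such a probing study is sketched below; it is an assumption-laden illustration, not the authors' code. The hypothetical function `masked_variants` takes a word-tokenized sentence and a half-open word span covering the idiom and returns four versions: unmasked, idiom masked, context masked, and a control that masks an equal number of randomly chosen context words (so that "words missing" can be separated from "idiom missing"). Each variant would then be encoded by BERT and a probing classifier trained on the resulting embeddings; that step is omitted here. Masking whole words with the literal string `[MASK]` is a simplification, since BERT operates on WordPiece tokens.

```python
import random

MASK = "[MASK]"  # BERT's mask token string; masking whole words is a simplification


def mask_positions(tokens: list[str], positions: set[int]) -> list[str]:
    """Return a copy of tokens with every index in positions replaced by MASK."""
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def masked_variants(tokens: list[str], idiom_span: tuple[int, int],
                    seed: int = 0) -> dict[str, list[str]]:
    """Build hypothetical probing inputs for one sentence.

    idiom_span is a half-open [start, end) word range covering the idiom.
    """
    start, end = idiom_span
    if not 0 <= start < end <= len(tokens):
        raise ValueError("idiom_span must be a non-empty range inside the sentence")
    idiom_idx = set(range(start, end))
    context_idx = [i for i in range(len(tokens)) if i not in idiom_idx]
    # Control: mask as many context words as the idiom has (or all context
    # words if there are fewer), chosen reproducibly from the seed.
    rng = random.Random(seed)
    k = min(len(idiom_idx), len(context_idx))
    control_idx = set(rng.sample(context_idx, k))
    return {
        "original": list(tokens),
        "idiom_masked": mask_positions(tokens, idiom_idx),
        "context_masked": mask_positions(tokens, set(context_idx)),
        "control_masked": mask_positions(tokens, control_idx),
    }


variants = masked_variants("He kicked the bucket last year".split(), (1, 4))
for name, toks in variants.items():
    print(f"{name:15} {' '.join(toks)}")
```

Comparing probe accuracy on the idiom-masked and control-masked inputs is one way to test whether a drop in accuracy reflects the loss of the idiom specifically or merely the loss of any words.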
Kinetoplastid membrane protein-11 (KMP-11), a protein present in all kinetoplastid protozoa studied to date, is considered a potential vaccine candidate for leishmaniasis. Such vaccine molecules must be expressed in amastigotes, which represent the infective forms for mammals, whereas promastigotes are the flagellate forms found in the insect hosts. However, whether KMP-11 is expressed in amastigotes remains controversial. In this study, a strain of L. tropica was isolated, cultivated, and genotyped. Expression of the KMP-11 gene in this strain was evaluated in promastigotes and amastigotes by RT-PCR using specific primer pairs. The results confirmed the presence of KMP-11 mRNA in both the promastigote and amastigote forms of L. tropica. Expression of this molecule in amastigotes is consistent with the previously demonstrated immunoprotective capacity of a KMP-11 DNA vaccine, as well as with the presence of humoral and cellular immune responses against KMP-11 in Leishmania-infected animals.
Generating paragraphs of diverse content is important in many applications. Existing generation models produce similar content from homogenized contexts because of the fixed left-to-right sentence order. Our idea is to permute the sentence order to improve the content diversity of multi-sentence paragraphs. We propose PermGen, a novel framework whose objective is to maximize the expected log-likelihood of the output paragraph distribution with respect to all possible sentence orders. PermGen uses hierarchical positional embeddings and introduces new procedures for training and decoding in sentence-permuted generation. Experiments on three paragraph generation benchmarks demonstrate that PermGen generates more diverse outputs of higher quality than existing models.
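The "expected log-likelihood with respect to all possible sentence orders" can be read as E_π[log p(y_π(1), ..., y_π(n) | x)], where π ranges over permutations of the n output sentences. The sketch below assumes π is drawn uniformly (an assumption; the abstract does not state the distribution) and shows two ways to compute that expectation for a given scorer: exact enumeration of all n! orders, feasible only for short paragraphs, and a Monte Carlo estimate from randomly shuffled orders. `toy_log_lik` is a hypothetical stand-in for a real model's log-probability; PermGen's hierarchical positional embeddings, training procedure, and decoding are not shown.

```python
import itertools
import random
from typing import Callable, Sequence

# Hypothetical scorer interface: log p(paragraph in this sentence order | context).
LogLik = Callable[[Sequence[str]], float]


def expected_loglik_exact(sentences: Sequence[str], log_lik: LogLik) -> float:
    """Average log-likelihood over all n! sentence orders (uniform weights)."""
    orders = list(itertools.permutations(sentences))
    return sum(log_lik(order) for order in orders) / len(orders)


def expected_loglik_sampled(sentences: Sequence[str], log_lik: LogLik,
                            num_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation from random orders."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        order = list(sentences)
        rng.shuffle(order)  # uniform random permutation
        total += log_lik(order)
    return total / num_samples


def toy_log_lik(order: Sequence[str]) -> float:
    """Stand-in for a model: penalizes long sentences placed late in the paragraph."""
    return -float(sum(pos * len(s) for pos, s in enumerate(order)))


paragraph = ["a", "bb", "ccc"]
print(expected_loglik_exact(paragraph, toy_log_lik))            # -6.0
print(expected_loglik_sampled(paragraph, toy_log_lik, 20_000))  # close to -6.0
```

Because n! grows quickly, a training objective of this kind would in practice rely on sampled orders; in a real model the gradient of the sampled estimate, not the value itself, is what would drive learning.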
Despite excellent performance on tasks such as question answering, Transformer-based architectures remain sensitive to syntactic and contextual ambiguities. Question paraphrasing (QP) offers a promising way to augment existing datasets. The main challenges for current QP models include a lack of training data and difficulty in generating diverse and natural questions. In this paper, we present Conquest, a framework for generating synthetic datasets for contextual question paraphrasing. Conquest first employs an answer-aware question generation (QG) model to create a question-pair dataset and then uses this data to train a contextualized question paraphrasing model. We evaluate Conquest extensively and show that it produces more diverse and fluent question pairs than existing approaches. Our contextual paraphrase model also establishes a strong baseline for end-to-end contextual paraphrasing. Further, we find that, compared with a non-contextual model, context can improve the BLEU-1 score on contextual compression and expansion by 4.3 and 11.2, respectively.
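For readers unfamiliar with the metric, BLEU-1 is the unigram case of BLEU: clipped unigram precision multiplied by a brevity penalty. The sketch below is a simplified sentence-level version with one reference and whitespace tokenization; the function name `bleu1` is hypothetical. The abstract does not say which BLEU implementation produced the reported gains; corpus-level tools aggregate counts across all sentences, apply their own tokenization, and usually report scores multiplied by 100, so numbers from this sketch are not directly comparable.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against one reference, on whitespace tokens.

    BLEU-1 = BP * p1, where p1 is clipped unigram precision and
    BP = 1 if c > r else exp(1 - r / c) (c, r = candidate/reference lengths).
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate word's count by how often it appears in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    p1 = clipped / len(cand)
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * p1


print(bleu1("what is the capital of france", "what is the capital of france"))  # 1.0
print(bleu1("the the the", "the cat"))  # 0.3333333333333333
```

Clipping is what keeps a degenerate candidate such as "the the the" from scoring perfectly, and the brevity penalty does the same for candidates that are too short.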
