Idiomatic expressions (IEs) play an important role in natural language and have long been a "pain in the neck" for NLP systems. Despite this, text generation tasks involving IEs remain largely under-explored. In this paper, we propose two new tasks, idiomatic sentence generation and idiomatic sentence paraphrasing, to fill this research gap. As the primary resource for these tasks, we introduce a curated dataset of 823 IEs together with a parallel corpus that pairs sentences containing these IEs with the same sentences in which each IE has been replaced by its literal paraphrase. Using this dataset, we benchmark existing deep learning models that achieve state-of-the-art performance on related tasks, with both automated and manual evaluation, to inspire further research on the proposed tasks. By establishing these baselines, we pave the way for more comprehensive and accurate modeling of IEs, for both generation and paraphrasing.
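The abstract describes the parallel corpus only in prose. The following is a minimal sketch of how one aligned entry could serve both tasks. It assumes that generation maps the literal sentence to the idiomatic one and that paraphrasing maps it back. `ParallelEntry`, its field names, and `task_pairs` are hypothetical illustrations, not the dataset's actual schema or the authors' code.

```python
from dataclasses import dataclass


# Hypothetical record layout for one parallel-corpus entry; the field names
# are illustrative assumptions, not the dataset's actual schema.
@dataclass(frozen=True)
class ParallelEntry:
    idiom: str               # the idiomatic expression (IE)
    literal_paraphrase: str  # its literal paraphrase
    idiomatic_sentence: str  # a sentence containing the IE

    def literal_sentence(self) -> str:
        """Return the same sentence with the IE replaced by its literal paraphrase.

        Plain string replacement is a simplification: IEs can inflect
        ("spilled the beans"), which would need a more careful alignment.
        """
        if self.idiom not in self.idiomatic_sentence:
            raise ValueError(f"idiom {self.idiom!r} not found in sentence")
        return self.idiomatic_sentence.replace(self.idiom, self.literal_paraphrase, 1)


def task_pairs(entry: ParallelEntry) -> dict[str, tuple[str, str]]:
    """Build (source, target) pairs for the two tasks from one entry (hypothetical helper)."""
    literal = entry.literal_sentence()
    return {
        # Idiomatic sentence generation (assumed direction): literal -> idiomatic
        "generation": (literal, entry.idiomatic_sentence),
        # Idiomatic sentence paraphrasing (assumed direction): idiomatic -> literal
        "paraphrasing": (entry.idiomatic_sentence, literal),
    }


if __name__ == "__main__":
    entry = ParallelEntry(
        idiom="a pain in the neck",
        literal_paraphrase="very annoying",
        idiomatic_sentence="Filling in this form is a pain in the neck.",
    )
    for task, (src, tgt) in task_pairs(entry).items():
        print(f"{task}: {src!r} -> {tgt!r}")
```

Because both tasks are derived from the same aligned pair, a single corpus can be reused in either direction simply by swapping source and target.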
References used
https://aclanthology.org/
Sentence embeddings encode information relating to the usage of idioms in a sentence. This paper reports a set of experiments that combine a probing methodology with input masking to analyse where in a sentence this idiomatic information is taken fro
Cancers (including oral cancers) endanger human life, so they must be detected, diagnosed early, and prevented. More than 90% of these oral malignancies are squamous cell carcinomas. The prognosis for these malignancies remains poor, with approximately 50% survival at five years.
Kinetoplastid membrane protein-11 (KMP-11), a protein present in all kinetoplastid protozoa studied to date, is considered a potential vaccine candidate for Leishmaniasis. Such vaccine molecules must be expressed in amastigotes which represent t
Generating paragraphs with diverse content is important in many applications. Existing generation models produce similar content from homogenized contexts because of the fixed left-to-right sentence order. Our idea is to permute the sentence order to imp
Despite excellent performance on tasks such as question answering, Transformer-based architectures remain sensitive to syntactic and contextual ambiguities. Question Paraphrasing (QP) offers a promising solution as a means to augment existing dataset