In this paper, we tackle the task of Definition Generation (DG) in Chinese, which aims to automatically generate a definition for a word. Most existing methods treat the source word as an indecomposable semantic unit. However, in paratactic languages like Chinese, word meanings can be composed through the word formation process, in which a word (桃花, "peach blossom") is formed from formation components (桃, "peach"; 花, "flower") using a formation rule (Modifier-Head). Inspired by this process, we propose to enhance DG with word formation features. We build a formation-informed dataset and propose a model, DeFT, which Decomposes words into formation features, dynamically Fuses different features through a gating mechanism, and generaTes word definitions. Experimental results show that our method is both effective and robust.
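The abstract does not spell out how the gating mechanism combines features, so the following is only a minimal sketch of one common form of gated fusion, not the DeFT implementation. It assumes a sigmoid gate computed from the concatenated word and formation vectors; the names `gated_fuse`, `gate_weights`, and `gate_bias` are hypothetical and chosen purely for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(word_vec, formation_vec, gate_weights, gate_bias):
    """Fuse two equal-length feature vectors with an elementwise gate.

    Assumed form (not taken from the paper):
        g = sigmoid(W [word; formation] + b)
        fused = g * word + (1 - g) * formation
    gate_weights has one row per output dimension, each row as long as
    the concatenation of the two input vectors.
    """
    concat = word_vec + formation_vec  # list concatenation, not addition
    fused = []
    for i, (w, f) in enumerate(zip(word_vec, formation_vec)):
        score = sum(wt * x for wt, x in zip(gate_weights[i], concat))
        g = sigmoid(score + gate_bias[i])
        # g near 1 keeps the word feature; g near 0 keeps the formation feature.
        fused.append(g * w + (1.0 - g) * f)
    return fused


# Toy inputs: with all-zero gate parameters every gate equals 0.5,
# so the fused vector is the plain average of the two inputs.
word = [1.0, 0.0]
formation = [0.0, 1.0]
zero_weights = [[0.0] * 4, [0.0] * 4]
neutral = gated_fuse(word, formation, zero_weights, [0.0, 0.0])
```

In a trained model the gate parameters would be learned, which lets the model weight the word-level and formation-level features differently for each word rather than averaging them.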
References used
https://aclanthology.org/