Synthesizing data for semantic parsing has gained increasing attention recently. However, most methods require handcrafted (high-precision) rules in their generative process, hindering the exploration of diverse unseen data. In this work, we propose a generative model which features a (non-neural) PCFG that models the composition of programs (e.g., SQL), and a BART-based translation model that maps a program to an utterance. Due to the simplicity of the PCFG and the pre-trained BART model, our generative model can be efficiently learned from existing data at hand. Moreover, explicitly modeling compositions using a PCFG leads to better exploration of unseen programs, thus generating more diverse data. We evaluate our method in both in-domain and out-of-domain settings of text-to-SQL parsing on the standard benchmarks of GeoQuery and Spider, respectively. Our empirical results show that the synthesized data generated from our model can substantially help a semantic parser achieve better compositional and domain generalization.
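The first stage of the pipeline described above can be illustrated with a toy sketch: a small PCFG over a fragment of SQL, sampled to produce novel programs that would then be fed to the utterance-generation model (the BART step is omitted here, as it requires a trained checkpoint). The grammar, table and column names, and probabilities below are purely illustrative assumptions, not the actual grammar induced in the paper.

```python
import random

# A toy PCFG over a fragment of SQL. Each non-terminal maps to a list of
# (probability, expansion) pairs; symbols absent from the table are terminals.
# All rules, names, and probabilities are illustrative only.
PCFG = {
    "QUERY": [(1.0, ["SELECT", "COLS", "FROM", "TABLE", "WHERE_OPT"])],
    "COLS": [(0.5, ["name"]), (0.5, ["name", ",", "population"])],
    "TABLE": [(0.6, ["city"]), (0.4, ["state"])],
    "WHERE_OPT": [(0.5, []), (0.5, ["WHERE", "population", ">", "NUM"])],
    "NUM": [(0.5, ["100000"]), (0.5, ["500000"])],
}

def sample(symbol, rng):
    """Recursively expand a symbol; symbols not in the grammar are terminals."""
    if symbol not in PCFG:
        return [symbol]
    r, acc = rng.random(), 0.0
    for prob, expansion in PCFG[symbol]:
        acc += prob
        if r <= acc:
            return [tok for s in expansion for tok in sample(s, rng)]
    # Numerical safety net: fall back to the last expansion.
    return [tok for s in PCFG[symbol][-1][1] for tok in sample(s, rng)]

def sample_program(seed=0):
    """Draw one SQL-like program from the toy grammar."""
    return " ".join(sample("QUERY", random.Random(seed)))
```

Because rule probabilities are estimated independently per non-terminal, sampling can combine rules in ways never seen jointly in the training data, which is the source of the diversity the abstract refers to.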