
Structure-Grounded Pretraining for Text-to-SQL



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Language of the research: English

Created by Shamra Editor


Abstract


Learning to capture text-table alignment is essential for tasks like text-to-SQL. A model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel pretraining tasks: column grounding, value grounding and column-value mapping, and leverage them to pretrain a text-table encoder. Additionally, to evaluate different methods under more realistic text-table alignment settings, we create a new evaluation set, Spider-Realistic, based on the Spider dev set with explicit mentions of column names removed, and adopt eight existing text-to-SQL datasets for cross-database evaluation. STRUG brings significant improvement over BERT-LARGE in all settings. Compared with existing pretraining methods such as GRAPPA, STRUG achieves similar performance on Spider, and outperforms all baselines on more realistic sets. All the code and data used in this work will be open-sourced to facilitate future research.
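The three pretraining tasks named in the abstract can be illustrated with a toy example. The sketch below is not STRUG's implementation. It assumes a made-up table and question, and it derives labels by naive case-insensitive exact matching of single tokens, only to show what the labels for column grounding, value grounding and column-value mapping might look like. The function name weak_labels and all data are hypothetical.

```python
from typing import Dict, List, Tuple


def weak_labels(
    tokens: List[str], table: Dict[str, List[str]]
) -> Tuple[Dict[str, bool], List[bool], Dict[int, str]]:
    """Derive toy labels for the three tasks by exact, case-insensitive matching.

    Assumption: cell values are single tokens and unique across columns.
    A real system would need fuzzy, multi-token matching.
    """
    lowered = [t.lower() for t in tokens]
    # Map each lowercased cell value to the column that contains it.
    value_to_column = {
        value.lower(): column
        for column, values in table.items()
        for value in values
    }
    # Column-value mapping: token index -> column of the matched value.
    mapping = {
        i: value_to_column[t] for i, t in enumerate(lowered) if t in value_to_column
    }
    # Value grounding: one binary label per token.
    value_grounding = [i in mapping for i in range(len(tokens))]
    # Column grounding: a column counts as referenced if its name appears
    # in the text or if one of its values is mentioned.
    mentioned = set(mapping.values())
    column_grounding = {
        column: column.lower() in lowered or column in mentioned for column in table
    }
    return column_grounding, value_grounding, mapping


# Hypothetical table (column name -> cell values) and question.
table = {
    "country": ["France", "Japan"],
    "year": ["2019", "2020"],
    "population": ["67", "126"],
}
tokens = "How many people lived in France in 2019".split()
column_grounding, value_grounding, mapping = weak_labels(tokens, table)
```

In this toy run, "France" and "2019" are labeled as values and mapped to the country and year columns, while population is left unlabeled because the question only implies it ("how many people"). That kind of implicit reference is what exact matching misses, and it is the setting Spider-Realistic is described as targeting.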

References used

https://aclanthology.org/


Semi-Automatic Construction of Text-to-SQL Data for Domain Transfer

Association for Computational Linguistics, 2021 (article)

Strong and affordable in-domain data is a desirable asset when transferring trained semantic parsers to novel domains. As previous methods for semi-automatically constructing such data cannot handle the complexity of realistic SQL queries, we propose to construct SQL queries via context-dependent sampling, and introduce the concept of topic. Along with our SQL query construction method, we propose a novel pipeline of semi-automatic Text-to-SQL dataset construction that covers the broad space of SQL queries. We show that the created dataset is comparable with expert annotation along multiple dimensions, and is capable of improving domain transfer performance for SOTA semantic parsers.


DuoRAT: Towards Simpler Text-to-SQL Models

Association for Computational Linguistics, 2021 (article)

Recent neural text-to-SQL models can effectively translate natural language questions to corresponding SQL queries on unseen databases. Working mostly on the Spider dataset, researchers have proposed increasingly sophisticated solutions to the problem. Contrary to this trend, in this paper we focus on simplifications. We begin by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL model that, unlike RAT-SQL, uses only relation-aware or vanilla transformers as the building blocks. We perform several ablation experiments using DuoRAT as the baseline model. Our experiments confirm the usefulness of some techniques and point out the redundancy of others, including structural SQL features and features that link the question with the schema.


Natural SQL: Making SQL Easier to Infer from Natural Language Specifications

Association for Computational Linguistics, 2021 (article)

Addressing the mismatch between natural language descriptions and the corresponding SQL queries is a key challenge for text-to-SQL translation. To bridge this gap, we propose an SQL intermediate representation (IR) called Natural SQL (NatSQL). Specifically, NatSQL preserves the core functionalities of SQL, while it simplifies the queries as follows: (1) dispensing with operators and keywords such as GROUP BY, HAVING, FROM, JOIN ON, for which it is usually hard to find counterparts in the text descriptions; (2) removing the need for nested subqueries and set operators; and (3) making the schema linking easier by reducing the required number of schema items. On Spider, a challenging text-to-SQL benchmark that contains complex and nested SQL queries, we demonstrate that NatSQL outperforms other IRs, and significantly improves the performance of several previous SOTA models. Furthermore, for existing models that do not support executable SQL generation, NatSQL easily enables them to generate executable SQL queries, and achieves the new state-of-the-art execution accuracy.


Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

Association for Computational Linguistics, 2021 (article)

Most available semantic parsing datasets, comprising pairs of natural utterances and logical forms, were collected solely for the purpose of training and evaluation of natural language understanding systems. As a result, they do not contain any of the richness and variety of naturally occurring utterances, where humans ask about data they need or are curious about. In this work, we release SEDE, a dataset with 12,023 pairs of utterances and SQL queries collected from real usage on the Stack Exchange website. We show that these pairs contain a variety of real-world challenges which were rarely reflected so far in any other semantic parsing dataset, propose an evaluation metric based on comparison of partial query clauses that is more suitable for real-world queries, and conduct experiments with strong baselines, showing a large gap between the performance on SEDE and on other common datasets.


NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task

Association for Computational Linguistics, 2021 (article)

This paper describes NAIST's system for the English-to-Japanese Simultaneous Text-to-text Translation Task in the IWSLT 2021 Evaluation Campaign. Our primary submission is based on wait-k neural machine translation with sequence-level knowledge distillation to encourage literal translation.

