
Semi-Automatic Construction of Text-to-SQL Data for Domain Transfer



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: domain transfer, SQL queries, SQL


Abstract

Strong and affordable in-domain data is a desirable asset when transferring trained semantic parsers to novel domains. As previous methods for semi-automatically constructing such data cannot handle the complexity of realistic SQL queries, we propose to construct SQL queries via context-dependent sampling, and introduce the concept of topic. Along with our SQL query construction method, we propose a novel pipeline of semi-automatic Text-to-SQL dataset construction that covers the broad space of SQL queries. We show that the created dataset is comparable with expert annotation along multiple dimensions, and is capable of improving domain transfer performance for SOTA semantic parsers.
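The abstract does not spell out how context-dependent sampling works, so the following is only a toy sketch of the general idea, not the authors' method. It assumes that each part of a query is sampled conditioned on what was already chosen (columns depend on the table, aggregates and operators depend on the column type) and that a "topic" is a set of weights that biases sampling towards related columns. The schema, the topic table, and the function names are all hypothetical.

```python
import random

# Hypothetical toy schema (table -> {column: type}); not taken from the paper.
SCHEMA = {
    "employees": {"name": "text", "salary": "number", "dept_id": "number"},
    "departments": {"dept_id": "number", "dept_name": "text"},
}

# Assumed reading of "topic": weights that make some columns more likely.
TOPICS = {
    "pay": {("employees", "salary"): 5.0},
    "org": {("departments", "dept_name"): 5.0, ("employees", "dept_id"): 3.0},
}


def weighted_choice(rng, items, weights):
    """Pick one item with probability proportional to its weight."""
    return rng.choices(items, weights=weights, k=1)[0]


def sample_query(topic, rng):
    """Sample one SQL string, conditioning each choice on earlier ones."""
    boosts = TOPICS[topic]

    # Step 1: table, weighted by how strongly the topic favours its columns.
    tables = sorted(SCHEMA)
    table_weights = [
        1.0 + sum(w for (t, _), w in boosts.items() if t == table)
        for table in tables
    ]
    table = weighted_choice(rng, tables, table_weights)

    # Step 2: projected column, restricted to the chosen table.
    columns = sorted(SCHEMA[table])
    column_weights = [boosts.get((table, c), 1.0) for c in columns]
    column = weighted_choice(rng, columns, column_weights)

    # Step 3: aggregate, restricted by the chosen column's type.
    if SCHEMA[table][column] == "number":
        aggregates = ["", "COUNT", "MAX", "MIN", "AVG"]
    else:
        aggregates = ["", "COUNT"]
    aggregate = rng.choice(aggregates)
    select = f"{aggregate}({column})" if aggregate else column

    # Step 4: filter operator, restricted by the filter column's type.
    where_column = rng.choice(columns)
    if SCHEMA[table][where_column] == "number":
        operator = rng.choice([">", "<", "="])
    else:
        operator = rng.choice(["=", "LIKE"])

    # "?" is a value placeholder to be filled in later.
    return f"SELECT {select} FROM {table} WHERE {where_column} {operator} ?"


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_query("pay", rng))
```

Because every step only draws from options that are valid given the earlier choices, the sketch never produces, for example, `AVG` over a text column or `LIKE` on a numeric one.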

References used

https://aclanthology.org/


Structure-Grounded Pretraining for Text-to-SQL

322 - Association for Computational Linguistics 2021 (article)

Learning to capture text-table alignment is essential for tasks like text-to-SQL. A model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel pretraining tasks: column grounding, value grounding and column-value mapping, and leverage them to pretrain a text-table encoder. Additionally, to evaluate different methods under more realistic text-table alignment settings, we create a new evaluation set, Spider-Realistic, based on the Spider dev set with explicit mentions of column names removed, and adopt eight existing text-to-SQL datasets for cross-database evaluation. STRUG brings significant improvement over BERT-LARGE in all settings. Compared with existing pretraining methods such as GRAPPA, STRUG achieves similar performance on Spider, and outperforms all baselines on more realistic sets. All the code and data used in this work will be open-sourced to facilitate future research.

Keywords: capture text-table alignment, text-table alignment, text-table

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

336 - Association for Computational Linguistics 2021 (article)

Most available semantic parsing datasets, comprising pairs of natural utterances and logical forms, were collected solely for the purpose of training and evaluating natural language understanding systems. As a result, they do not contain any of the richness and variety of naturally occurring utterances, where humans ask about data they need or are curious about. In this work, we release SEDE, a dataset with 12,023 pairs of utterances and SQL queries collected from real usage on the Stack Exchange website. We show that these pairs contain a variety of real-world challenges which have rarely been reflected so far in any other semantic parsing dataset, propose an evaluation metric based on comparison of partial query clauses that is more suitable for real-world queries, and conduct experiments with strong baselines, showing a large gap between the performance on SEDE and on other common datasets.
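The abstract above mentions a metric that compares partial query clauses but gives no definition. As a rough illustration of why clause-level comparison is more forgiving than exact match, here is a minimal sketch that splits each query on top-level keywords and computes an F1 score over the resulting clause sets. This is an assumed simplification, not the SEDE metric: the keyword list, `split_clauses`, and `clause_f1` are hypothetical, and the naive regex split ignores subqueries and string literals.

```python
import re

# Assumed clause keywords; a real SQL parser would be far more thorough.
CLAUSE_KEYWORDS = [
    "SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "ORDER BY", "LIMIT",
]
_CLAUSE_SPLIT = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in CLAUSE_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def split_clauses(sql):
    """Return a set of (keyword, normalized body) pairs for one query."""
    parts = _CLAUSE_SPLIT.split(sql.strip().rstrip(";"))
    # re.split with a capturing group yields:
    # [prefix, keyword1, body1, keyword2, body2, ...]
    clauses = set()
    for keyword, body in zip(parts[1::2], parts[2::2]):
        keyword_norm = " ".join(keyword.upper().split())
        body_norm = " ".join(body.lower().split())
        clauses.add((keyword_norm, body_norm))
    return clauses


def clause_f1(predicted_sql, gold_sql):
    """F1 over exact matches between predicted and gold clause sets."""
    predicted = split_clauses(predicted_sql)
    gold = split_clauses(gold_sql)
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = "select name from users where age > 18 order by name"
    pred = "SELECT name FROM users WHERE age > 30"
    # Two of three predicted clauses match two of four gold clauses.
    print(round(clause_f1(pred, gold), 4))
```

Under exact-match evaluation the prediction in the usage example would score zero, whereas the clause-level score gives partial credit for the matching `SELECT` and `FROM` clauses.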

Keywords: stack exchange data, naturally-occurring dataset based, stack exchange

DuoRAT: Towards Simpler Text-to-SQL Models

488 - Association for Computational Linguistics 2021 (article)

Recent neural text-to-SQL models can effectively translate natural language questions to corresponding SQL queries on unseen databases. Working mostly on the Spider dataset, researchers have proposed increasingly sophisticated solutions to the problem. Contrary to this trend, in this paper we focus on simplifications. We begin by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL model that, unlike RAT-SQL, uses only relation-aware or vanilla transformers as the building blocks. We perform several ablation experiments using DuoRAT as the baseline model. Our experiments confirm the usefulness of some techniques and point out the redundancy of others, including structural SQL features and features that link the question with the schema.

Keywords: simpler, effectively translate natural, translate natural language

Controllable Sentence Simplification with a Unified Text-to-Text Transfer Transformer

504 - Association for Computational Linguistics 2021 (article)

Recently, a large pre-trained language model called T5 (A Unified Text-to-Text Transfer Transformer) has achieved state-of-the-art performance in many NLP tasks. However, no study has been found using this pre-trained model on Text Simplification. Therefore, in this paper, we explore the use of T5 fine-tuning on Text Simplification combined with a controllable mechanism to regulate the system outputs that can help generate adapted text for different target audiences. Our experiments show that our model achieves remarkable results with gains of between +0.69 and +1.41 over the current state-of-the-art (BART+ACCESS). We argue that using a pre-trained model such as T5, trained on several tasks with large amounts of data, can help improve Text Simplification.

Keywords: emotional decoding, controllable sentence simplification, sentence simplification

TransPrompt: Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification

371 - Association for Computational Linguistics 2021 (article)

Recent studies have shown that prompts improve the performance of large pre-trained language models for few-shot text classification. Yet, it is unclear how the prompting knowledge can be transferred across similar NLP tasks for the purpose of mutual reinforcement. Based on continuous prompt embeddings, we propose TransPrompt, a transferable prompting framework for few-shot learning across similar tasks. In TransPrompt, we employ a multi-task meta-knowledge acquisition procedure to train a meta-learner that captures cross-task transferable knowledge. Two de-biasing techniques are further designed to make it more task-agnostic and unbiased towards any task. After that, the meta-learner can be adapted to target tasks with high accuracy. Extensive experiments show that TransPrompt outperforms strong single-task and cross-task baselines over multiple NLP tasks and datasets. We further show that the meta-learner can effectively improve the performance on previously unseen tasks, and that TransPrompt also outperforms strong fine-tuning baselines when learning with full training sets.

Keywords: synchronous dual network, automatic transferable prompting
