A Hybrid Semantic Parsing Approach for Tabular Data Analysis

73 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yan Gao

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yan Gao - Jian-Guang Lou - Dongmei Zhang

الذكاء الاصطناعي الحساب واللغة قواعد البيانات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper presents a novel approach to translating natural language questions to SQL queries for given tables, which meets three requirements as a real-world data analysis application: cross-domain, multilingualism and enabling quick-start. Our proposed approach consists of: (1) a novel data abstraction step before the parser to make parsing table-agnosticism; (2) a set of semantic rules for parsing abstracted data-analysis questions to intermediate logic forms as tree derivations to reduce the search space; (3) a neural-based model as a local scoring function on a span-based semantic parser for structured optimization and efficient inference. Experiments show that our approach outperforms state-of-the-art algorithms on a large open benchmark dataset WikiSQL. We also achieve promising results on a small dataset for more complex queries in both English and Chinese, which demonstrates our language expansion and quick-start ability.

قيم البحث

219 - Xi Victoria Lin , Richard Socher , Caiming Xiong 2020

We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing. BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. The hybrid sequence is encoded by BERT with minimal subsequent layers and the text-DB contextualization is realized via the fine-tuned deep attention in BERT. Combined with a pointer-generator decoder with schema-consistency driven search space pruning, BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks, Spider (71.1% dev, 67.5% test with ensemble model) and WikiSQL (92.6% dev, 91.9% test). Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks. Our implementation is available at url{https://github.com/salesforce/TabularSemanticParsing}.

الحساب واللغة الذكاء الاصطناعي قواعد البيانات

Semantic Annotation for Tabular Data

88 - Udayan Khurana , Sainyam Galhotra 2020

Detecting semantic concept of columns in tabular data is of particular interest to many applications ranging from data integration, cleaning, search to feature engineering and model building in machine learning. Recently, several works have proposed supervised learning-based or heuristic pattern-based approaches to semantic type annotation. Both have shortcomings that prevent them from generalizing over a large number of concepts or examples. Many neural network based methods also present scalability issues. Additionally, none of the known methods works well for numerical data. We propose $C^2$, a column to concept mapper that is based on a maximum likelihood estimation approach through ensembles. It is able to effectively utilize vast amounts of, albeit somewhat noisy, openly available table corpora in addition to two popular knowledge graphs to perform effective and efficient concept prediction for structured data. We demonstrate the effectiveness of $C^2$ over available techniques on 9 datasets, the most comprehensive comparison on this topic so far.

الذكاء الاصطناعي قواعد البيانات

Towards Compositional Distributional Discourse Analysis

325 - Bob Coecke 2018

Categorical compositional distributional semantics provide a method to derive the meaning of a sentence from the meaning of its individual words: the grammatical reduction of a sentence automatically induces a linear map for composing the word vector s obtained from distributional semantics. In this paper, we extend this passage from word-to-sentence to sentence-to-discourse composition. To achieve this we introduce a notion of basic anaphoric discourses as a mid-level representation between natural language discourse formalised in terms of basic discourse representation structures (DRS); and knowledge base queries over the Semantic Web as described by basic graph patterns in the Resource Description Framework (RDF). This provides a high-level specification for compositional algorithms for question answering and anaphora resolution, and allows us to give a picture of natural language understanding as a process involving both statistical and logical resources.

الذكاء الاصطناعي الحساب واللغة قواعد البيانات

Commonsense Knowledge Base Construction in the Age of Big Data

166 - Simon Razniewski 2021

Compiling commonsense knowledge is traditionally an AI topic approached by manual labor. Recent advances in web data processing have enabled automated approaches. In this demonstration we will showcase three systems for automated commonsense knowledg e base construction, highlighting each time one aspect of specific interest to the data management community. (i) We use Quasimodo to illustrate knowledge extraction systems engineering, (ii) Dice to illustrate the role that schema constraints play in cleaning fuzzy commonsense knowledge, and (iii) Ascent to illustrate the relevance of conceptual modelling. The demos are available online at https://quasimodo.r2.enst.fr, https://dice.mpi-inf.mpg.de and ascent.mpi-inf.mpg.de.

الذكاء الاصطناعي الحساب واللغة قواعد البيانات

Data Recombination for Neural Semantic Parsing

188 - Robin Jia , Percy Liang 2016

Modeling crisp logical regularities is crucial in semantic parsing, making it difficult for neural models with no task-specific prior knowledge to achieve good results. In this paper, we introduce data recombination, a novel framework for injecting s uch prior knowledge into a model. From the training data, we induce a high-precision synchronous context-free grammar, which captures important conditional independence properties commonly found in semantic parsing. We then train a sequence-to-sequence recurrent network (RNN) model with a novel attention-based copying mechanism on datapoints sampled from this grammar, thereby teaching the model about these structural properties. Data recombination improves the accuracy of our RNN model on three semantic parsing datasets, leading to new state-of-the-art performance on the standard GeoQuery dataset for models with comparable supervision.

الحساب واللغة