ﻻ يوجد ملخص باللغة العربية
Detecting semantic concept of columns in tabular data is of particular interest to many applications ranging from data integration, cleaning, search to feature engineering and model building in machine learning. Recently, several works have proposed supervised learning-based or heuristic pattern-based approaches to semantic type annotation. Both have shortcomings that prevent them from generalizing over a large number of concepts or examples. Many neural network based methods also present scalability issues. Additionally, none of the known methods works well for numerical data. We propose $C^2$, a column to concept mapper that is based on a maximum likelihood estimation approach through ensembles. It is able to effectively utilize vast amounts of, albeit somewhat noisy, openly available table corpora in addition to two popular knowledge graphs to perform effective and efficient concept prediction for structured data. We demonstrate the effectiveness of $C^2$ over available techniques on 9 datasets, the most comprehensive comparison on this topic so far.
This paper presents a novel approach to translating natural language questions to SQL queries for given tables, which meets three requirements as a real-world data analysis application: cross-domain, multilingualism and enabling quick-start. Our prop
Tabular data are ubiquitous for the widespread applications of tables and hence have attracted the attention of researchers to extract underlying information. One of the critical problems in mining tabular data is how to understand their inherent sem
We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing. BRIDGE represents the question and DB schema in a tagged sequence where a subset
Detection of anomalous behaviors in data centers is crucial to predictive maintenance and data safety. With data centers, we mean any computer network that allows users to transmit and exchange data and information. In particular, we focus on the Tie
Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expres