This paper explores the data cleaning challenges that arise in using WiFi connectivity data to locate users at semantic indoor locations such as buildings, regions, and rooms. WiFi connectivity data consists of sporadic connections between devices and nearby WiFi access points (APs), each of which may cover a relatively large area within a building. Our system, entitled semantic LOCATion cleanER (LOCATER), casts semantic localization as a series of data cleaning tasks: first, it treats the problem of determining the AP to which a device is connected between any two of its connection events as a missing-value detection and repair problem; it then associates the device with a semantic subregion (e.g., a conference room within the region) by framing this as a location disambiguation problem. LOCATER uses a bootstrapping semi-supervised learning method for coarse localization and a probabilistic method to achieve finer localization. The paper shows that LOCATER achieves high accuracy at both the coarse and fine levels.
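To make the two-stage idea concrete, the sketch below shows a coarse-then-fine localization pipeline in the spirit of LOCATER. The gap-filling rule, the room prior, and all names (fill_missing_aps, rooms_by_ap, room-2065, and so on) are illustrative assumptions, not the paper's actual algorithms or data.

```python
# Minimal sketch of coarse-then-fine semantic localization (illustrative only).
from dataclasses import dataclass
from collections import Counter

@dataclass
class Connection:
    device: str
    ap: str          # access point identifier
    timestamp: int   # seconds since start of trace

def fill_missing_aps(events, max_gap=600):
    """Coarse stage (sketch): if two consecutive connections of a device hit the
    same AP within max_gap seconds, assume the device stayed on that AP and
    insert a synthetic event -- a crude stand-in for missing-value repair."""
    filled = []
    for prev, curr in zip(events, events[1:]):
        filled.append(prev)
        if prev.ap == curr.ap and curr.timestamp - prev.timestamp <= max_gap:
            mid = (prev.timestamp + curr.timestamp) // 2
            filled.append(Connection(prev.device, prev.ap, mid))
    filled.append(events[-1])
    return filled

def disambiguate_room(ap, rooms_by_ap, room_visit_counts):
    """Fine stage (sketch): among the rooms an AP covers, pick the one with the
    highest historical visit count as a crude probabilistic prior."""
    return max(rooms_by_ap[ap], key=lambda room: room_visit_counts.get(room, 0))

if __name__ == "__main__":
    events = [Connection("dev1", "ap3", 0), Connection("dev1", "ap3", 300)]
    rooms_by_ap = {"ap3": ["room-2061", "room-2065", "hallway-2000"]}
    history = Counter({"room-2065": 12, "hallway-2000": 3})
    for e in fill_missing_aps(events):
        print(e.timestamp, disambiguate_room(e.ap, rooms_by_ap, history))
```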
The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied by a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied by a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.
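As a rough illustration only, the snippet below sketches what a machine-readable datasheet record might look like, with fields taken from the sections named above (motivation, composition, collection process, recommended uses). The schema, field names, and example values are invented and are not the proposal's official question list.

```python
# Illustrative datasheet record; the schema is an assumption, not the official template.
from dataclasses import dataclass, field, asdict

@dataclass
class Datasheet:
    dataset_name: str
    motivation: str            # why was the dataset created?
    composition: str           # what do the instances represent?
    collection_process: str    # how was the data acquired?
    recommended_uses: list = field(default_factory=list)
    known_limitations: list = field(default_factory=list)

sheet = Datasheet(
    dataset_name="example-wifi-logs",   # hypothetical dataset
    motivation="Study indoor occupancy patterns.",
    composition="Anonymized AP connection events.",
    collection_process="Passively logged from campus WiFi controllers.",
    recommended_uses=["coarse occupancy analytics"],
    known_limitations=["APs cover large, overlapping areas"],
)
print(asdict(sheet))
```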
Real-world datasets are dirty and contain many errors. Examples of these issues are violations of integrity constraints, duplicates, and inconsistencies in representing data values and entities. Learning over dirty databases may result in inaccurate models. Users have to spend a great deal of time and effort to repair data errors and create a clean database for learning. Moreover, as the information required to repair these errors is often unavailable, there may be numerous possible clean versions of the database.
Big data analysis has become an active area of study with the growth of machine learning techniques. To analyze data properly, it is important to maintain high-quality data, which makes research on data cleaning equally important. It is difficult to automatically detect and correct inconsistent values in data that requires expert knowledge or that is created by many contributors, such as data integrated from heterogeneous sources. An example of such data is metadata for scientific datasets, which should be verified by data managers as they handle the data. To support efficient cleaning of data by data managers, we propose a data cleaning architecture in which data managers interactively browse and correct portions of the data through views. In this paper, we explain our view-based data cleaning architecture and discuss some remaining issues.
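A minimal sketch of the view-based idea, assuming a SQLite backend: the data manager browses a view that surfaces only suspicious metadata records and corrects them through it, while an INSTEAD OF trigger routes the fix back to the base table. The table, view, and trigger names (metadata, suspect_units, fix_unit) and the inconsistency rule are invented for illustration and are not part of the proposed architecture.

```python
# View-based interactive cleaning sketch using SQLite (illustrative assumptions).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata (id INTEGER PRIMARY KEY, unit TEXT, value REAL);
INSERT INTO metadata VALUES (1, 'celsius', 21.5), (2, 'CELCIUS', 22.0);

-- View exposing only records whose unit label looks inconsistent.
CREATE VIEW suspect_units AS
  SELECT id, unit, value FROM metadata
  WHERE unit <> lower(unit) OR unit NOT IN ('celsius', 'fahrenheit');

-- Route corrections made through the view back to the base table.
CREATE TRIGGER fix_unit INSTEAD OF UPDATE ON suspect_units
BEGIN
  UPDATE metadata SET unit = NEW.unit WHERE id = OLD.id;
END;
""")

# The data manager reviews the view, then repairs a record through it.
print(conn.execute("SELECT * FROM suspect_units").fetchall())
conn.execute("UPDATE suspect_units SET unit = 'celsius' WHERE id = 2")
print(conn.execute("SELECT * FROM metadata").fetchall())
```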
Data cleaning is the initial stage of any machine learning project and one of the most critical processes in data analysis. It ensures that the dataset is free of incorrect or erroneous values. It can be done manually with data wrangling tools, or it can be completed automatically by a computer program. Data cleaning entails a range of procedures that, once completed, make the data ready for analysis. Given its significance in numerous fields, there is a growing interest in the development of efficient and effective data cleaning frameworks. In this survey, we examine some of the most recent advances in data cleaning approaches for their effectiveness and suggest future research directions to close the gaps in each of the methods.
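For illustration, the pandas snippet below walks through a few of the routine procedures such a pipeline typically includes (deduplication, text normalization, type coercion, imputation, dropping records missing key fields). The column names and rules are invented for the example; real pipelines depend on the dataset and domain.

```python
# A toy sequence of common cleaning steps (assumes pandas is available).
import pandas as pd

raw = pd.DataFrame({
    "city": ["Irvine", "Irvine", "irvine ", None],
    "temp": ["21.5", "21.5", "bad", "19.0"],
})

df = raw.drop_duplicates().copy()                        # remove exact duplicate rows
df["city"] = df["city"].str.strip().str.title()          # normalize text values
df["temp"] = pd.to_numeric(df["temp"], errors="coerce")  # coerce types; bad values become NaN
df["temp"] = df["temp"].fillna(df["temp"].median())      # simple imputation for missing numbers
df = df.dropna(subset=["city"])                          # drop rows missing a key field
print(df)
```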
Quality assurance of knowledge graphs is a prerequisite for various knowledge-driven applications. We propose KGClean, a novel cleaning framework powered by knowledge graph embedding, to detect and repair heterogeneous dirty data. In contrast to previous approaches that either focus on filling missing data or on cleaning errors that violate a limited set of rules, KGClean enables (i) cleaning both missing data and other erroneous values, and (ii) mining potential rules automatically, which expands the coverage of error detection. KGClean first learns data representations with TransGAT, an effective knowledge graph embedding model that gathers the neighborhood information of each data item and incorporates the interactions among items to cast the data into continuous vector spaces with rich semantics. KGClean then integrates an active learning-based classification model, which identifies errors from a small seed set of labels. Finally, KGClean applies an efficient PRO-repair strategy that repairs errors using a novel concept of propagation power. Extensive experiments on four typical knowledge graphs demonstrate the effectiveness of KGClean in practice.
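The toy sketch below mimics only the shape of such a pipeline: embed the graph, score triples, and flag low-scoring ones as candidate errors. A TransE-style distance with untrained random vectors stands in for TransGAT, and the active-learning classifier and PRO-repair strategy are reduced to a simple threshold; none of this reproduces KGClean's actual methods.

```python
# Toy stand-in for embedding-based error detection on a knowledge graph.
import numpy as np

triples = [("alice", "works_in", "room_101"),
           ("bob",   "works_in", "room_102"),
           ("alice", "works_in", "parking_lot")]   # suspicious value

rng = np.random.default_rng(0)
entities = {e for h, _, t in triples for e in (h, t)}
relations = {r for _, r, _ in triples}
# Untrained random embeddings: the flagged triple here is arbitrary; a trained
# model would be needed for meaningful plausibility scores.
emb = {x: rng.normal(size=8) for x in entities | relations}

def score(h, r, t):
    # TransE-style plausibility: smaller ||h + r - t|| means more plausible.
    return -np.linalg.norm(emb[h] + emb[r] - emb[t])

scores = {trip: score(*trip) for trip in triples}
threshold = np.median(list(scores.values()))
suspects = [trip for trip, s in scores.items() if s < threshold]
print("candidate errors:", suspects)
```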