Smart City Data Analysis via Visualization of Correlated Attribute Patterns




Publication date 2021

fields Informatics Engineering

Research language English

Authors Yuya Sasaki - Keizo Hori - Daiki Nishihara

Databases


Urban conditions are monitored by a wide variety of sensors that measure several attributes, such as temperature and traffic volume. Correlations among sensors help to analyze and understand urban conditions accurately. Correlated attribute pattern (CAP) mining discovers correlations among multiple attributes from sets of sensors that are spatially close to each other and temporally correlated in their measurements. In this paper, we develop a visualization system for CAP mining and demonstrate analysis of smart city data. Our visualization system supports an intuitive understanding of mining results via sensor locations on maps and temporal changes of their measurements. In our demonstration scenarios, we provide four smart city datasets collected from China and Santander, Spain. We demonstrate that our system supports interactive analysis of smart city data.
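The idea behind CAP mining described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only checks pairs of sensors rather than sets, and it uses Pearson correlation as a stand-in for the paper's notion of temporal correlation. The function names, the dictionary layout of a sensor, and the thresholds `max_dist` and `min_corr` are assumptions made for illustration.

```python
from itertools import combinations
from math import hypot, sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def find_correlated_pairs(sensors, max_dist, min_corr):
    """Return id pairs of sensors that measure different attributes,
    lie within max_dist of each other, and have |correlation| >= min_corr."""
    pairs = []
    for a, b in combinations(sensors, 2):
        if a["attribute"] == b["attribute"]:
            continue  # only correlations across different attributes
        if hypot(a["x"] - b["x"], a["y"] - b["y"]) > max_dist:
            continue  # not spatially close
        if abs(pearson(a["values"], b["values"])) >= min_corr:
            pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical toy data: positions in arbitrary units, five time steps each.
sensors = [
    {"id": "s1", "attribute": "temperature", "x": 0.0, "y": 0.0,
     "values": [10, 12, 15, 14, 18]},
    {"id": "s2", "attribute": "traffic", "x": 0.5, "y": 0.5,
     "values": [100, 120, 150, 140, 180]},
    {"id": "s3", "attribute": "traffic", "x": 50.0, "y": 50.0,
     "values": [100, 120, 150, 140, 180]},
    {"id": "s4", "attribute": "noise", "x": 0.2, "y": 0.1,
     "values": [5, 1, 4, 2, 3]},
]

result = find_correlated_pairs(sensors, max_dist=1.0, min_corr=0.9)
print(result)
```

In this toy data only the temperature sensor `s1` and the nearby traffic sensor `s2` satisfy both the spatial and the temporal condition; `s3` has the same readings as `s2` but is too far away, and `s4` is close but weakly correlated.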


A Survey of Data Fusion in Smart City Applications

Billy Pik Lik Lau, Sumudu Hasala Marakkalage, Yuren Zhou 2019

Advances in research areas such as the Internet of Things (IoT), Machine Learning, Data Mining, Big Data, and Communication Technology have made it possible to transform an urban area that integrates these techniques into what is commonly known as a Smart City. With the emergence of the smart city, a plethora of data sources has been made available for a wide variety of applications. The common technique for handling multiple data sources is data fusion, which improves data output quality or extracts knowledge from the raw data. To cater to ever-growing, highly complicated applications, smart city studies have to utilize data from various sources and evaluate their performance based on multiple aspects. To this end, we introduce a multi-perspective classification of data fusion to evaluate smart city applications. Moreover, we apply the proposed multi-perspective classification to evaluate selected applications in each domain of the smart city. We conclude the paper by discussing potential future directions and challenges of data fusion integration.

Signal Processing

Enabling Smart Data: Noise filtering in Big Data classification

Diego Garcia-Gil, Julian Luengo, Salvador Garcia 2017

In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances, and is known to be a very disruptive feature of data. However, in this Big Data era, the massive growth in the scale of the data poses a challenge to traditional proposals created to tackle noise, as they have difficulties coping with such a large amount of data. New algorithms need to be proposed to treat the noise in Big Data problems, providing high quality and clean data, also known as Smart Data. In this paper, two Big Data preprocessing approaches to remove noisy examples are proposed: a homogeneous ensemble filter and a heterogeneous ensemble filter, with special emphasis on their scalability and performance traits. The obtained results show that these proposals enable the practitioner to efficiently obtain a Smart Dataset from any Big Data classification problem.

Databases Machine Learning

Pruning Attribute Values From Data Cubes with Diamond Dicing

Hazel Webb, Owen Kaser, Daniel Lemire 2008

Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may need to select not just the most profitable stores or--separately--the most profitable products, but simultaneous sets of stores and products fulfilling some profitability constraints. To fill this need, we propose a new operator, the diamond dice. Because of the interaction between dimensions, the computation of diamonds is challenging. We present the first diamond-dicing experiments on large data sets. Experiments show that we can compute diamond cubes over fact tables containing 100 million facts in less than 35 minutes using a standard PC.
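The store-and-product example in the abstract can be sketched as an iterative pruning loop. This is a simplified two-dimensional illustration, not the authors' implementation: it assumes non-negative profits and a sum-based constraint per dimension, and the function name `diamond_dice` and its threshold parameters are hypothetical.

```python
from collections import defaultdict


def diamond_dice(facts, min_store_sum, min_product_sum):
    """Keep the facts whose store and product both meet their profit
    thresholds, where sums are computed over the surviving facts only.

    Removing a store lowers product totals (and vice versa), so the
    pruning repeats until no further fact is dropped.
    Assumes non-negative profits (an assumption of this sketch).
    """
    facts = list(facts)
    while True:
        store_sum = defaultdict(float)
        product_sum = defaultdict(float)
        for store, product, profit in facts:
            store_sum[store] += profit
            product_sum[product] += profit
        kept = [
            (store, product, profit)
            for store, product, profit in facts
            if store_sum[store] >= min_store_sum
            and product_sum[product] >= min_product_sum
        ]
        if len(kept) == len(facts):
            return kept  # fixpoint reached: nothing left to prune
        facts = kept


# Hypothetical fact table: (store, product, profit).
facts = [
    ("A", "p1", 10), ("A", "p2", 10),
    ("B", "p1", 10), ("B", "p2", 10),
    ("C", "p1", 3), ("C", "p3", 18),
    ("D", "p3", 5),
]

diamond = diamond_dice(facts, min_store_sum=20, min_product_sum=20)
print(sorted(diamond))
```

The toy data shows why the dimensions interact: dropping store D pushes product p3 below its threshold, which in turn pushes store C below its own, leaving only stores A and B with products p1 and p2.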

Databases Data Structures and Algorithms

Lux: Always-on Visualization Recommendations for Exploratory Data Science

Doris Jung-Lin Lee, Dixin Tang, Kunal Agarwal 2021

Exploratory data science largely happens in computational notebooks with dataframe APIs, such as pandas, that support flexible means to transform, clean, and analyze data. Yet, visually exploring data in dataframes remains tedious, requiring substantial programming effort for visualization and mental effort to determine what analysis to perform next. We propose Lux, an always-on framework for accelerating visual insight discovery in data science workflows. When users print a dataframe in their notebooks, Lux recommends visualizations to provide a quick overview of the patterns and trends and suggests promising analysis directions. Lux features a high-level language for generating visualizations on demand to encourage rapid visual experimentation with data. We demonstrate that through the use of a careful design and three system optimizations, Lux adds no more than two seconds of overhead on top of pandas for over 98% of datasets in the UCI repository. We evaluate Lux in terms of usability via a controlled first-use study and interviews with early adopters, finding that Lux helps fulfill the needs of data scientists for visualization support within their dataframe workflows. Lux has already been embraced by data science practitioners, with over 1.9k stars on GitHub within its first 15 months.

Databases Human-Computer Interaction

Regression-based Online Anomaly Detection for Smart Grid Data

Xiufeng Liu, Per Sieverts Nielsen 2016

With the wide use of smart meters in the energy sector, anomaly detection has become a crucial means to study the unusual consumption behaviors of customers and to promptly discover unexpected energy-use events. Detecting consumption anomalies is, essentially, a real-time big data analytics problem that requires data mining on a large number of parallel data streams from smart meters. In this paper, we propose an anomaly detection method based on supervised learning and statistics, and implement a lambda system using the in-memory distributed computing framework Spark and its extension Spark Streaming. The system supports not only iterative refreshment of the detection model from scalable data sets, but also real-time detection on scalable live data streams. This paper empirically evaluates the system and the detection algorithm, and the results show the effectiveness and the scalability of the proposed lambda detection system.

Databases