Streaming data preprocessing via online tensor recovery for large environmental sensor networks

69 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yue Hu

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yue Hu - Ao Qu - Yanbing Wang

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Measuring the built and natural environment at a fine-grained scale is now possible with low-cost urban environmental sensor networks. However, fine-grained city-scale data analysis is complicated by tedious data cleaning including removing outliers and imputing missing data. While many methods exist to automatically correct anomalies and impute missing entries, challenges still exist on data with large spatial-temporal scales and shifting patterns. To address these challenges, we propose an online robust tensor recovery (OLRTR) method to preprocess streaming high-dimensional urban environmental datasets. A small-sized dictionary that captures the underlying patterns of the data is computed and constantly updated with new data. OLRTR enables online recovery for large-scale sensor networks that provide continuous data streams, with a lower computational memory usage compared to offline batch counterparts. In addition, we formulate the objective function so that OLRTR can detect structured outliers, such as faulty readings over a long period of time. We validate OLRTR on a synthetically degraded National Oceanic and Atmospheric Administration temperature dataset, with a recovery error of 0.05, and apply it to the Array of Things city-scale sensor network in Chicago, IL, showing superior results compared with several established online and batch-based low rank decomposition methods.

قيم البحث

252 - Qianxin Yi , Chenhao Wang , Kaidong Wang 2021

Low-tubal-rank tensor approximation has been proposed to analyze large-scale and multi-dimensional data. However, finding such an accurate approximation is challenging in the streaming setting, due to the limited computational resources. To alleviate this issue, this paper extends a popular matrix sketching technique, namely Frequent Directions, for constructing an efficient and accurate low-tubal-rank tensor approximation from streaming data based on the tensor Singular Value Decomposition (t-SVD). Specifically, the new algorithm allows the tensor data to be observed slice by slice, but only needs to maintain and incrementally update a much smaller sketch which could capture the principal information of the original tensor. The rigorous theoretical analysis shows that the approximation error of the new algorithm can be arbitrarily small when the sketch size grows linearly. Extensive experimental results on both synthetic and real multi-dimensional data further reveal the superiority of the proposed algorithm compared with other sketching algorithms for getting low-tubal-rank approximation, in terms of both efficiency and accuracy.

التعلم الآلي التعلم الالي

Fast and Quality-Guaranteed Data Streaming in Resource-Constrained Sensor Networks

410 - Emad Soroush , Kui Wu , Jian Pei 2008

In many emerging applications, data streams are monitored in a network environment. Due to limited communication bandwidth and other resource constraints, a critical and practical demand is to online compress data streams continuously with quality gu arantee. Although many data compression and digital signal processing methods have been developed to reduce data volume, their super-linear time and more-than-constant space complexity prevents them from being applied directly on data streams, particularly over resource-constrained sensor networks. In this paper, we tackle the problem of online quality guaranteed compression of data streams using fast linear approximation (i.e., using line segments to approximate a time series). Technically, we address tw

بنى وهياكل البيانات والخوارزميات الوسائط المتعددة

Deciphering Environmental Air Pollution with Large Scale City Data

121 - Mayukh Bhattacharyya , Sayan Nag , Udita Ghosh 2021

Out of the numerous hazards posing a threat to sustainable environmental conditions in the 21st century, only a few have a graver impact than air pollution. Its importance in determining the health and living standards in urban settings is only expec ted to increase with time. Various factors ranging from emissions from traffic and power plants, household emissions, natural causes are known to be primary causal agents or influencers behind rising air pollution levels. However, the lack of large scale data involving the major factors has hindered the research on the causes and relations governing the variability of the different air pollutants. Through this work, we introduce a large scale city-wise dataset for exploring the relationships among these agents over a long period of time. We analyze and explore the dataset to bring out inferences which we can derive by modeling the data. Also, we provide a set of benchmarks for the problem of estimating or forecasting pollutant levels with a set of diverse models and methodologies. Through our paper, we seek to provide a ground base for further research into this domain that will demand critical attention of ours in the near future.

التعلم الآلي الذكاء الاصطناعي تحليل البيانات والإحصاءات والاحتمال

Detecting Overfitting of Deep Generative Networks via Latent Recovery

78 - Ryan Webster , Julien Rabin , Loic Simon 2019

State of the art deep generative networks are capable of producing images with such incredible realism that they can be suspected of memorizing training images. It is why it is not uncommon to include visualizations of training set nearest neighbors, to suggest generated images are not simply memorized. We demonstrate this is not sufficient and motivates the need to study memorization/overfitting of deep generators with more scrutiny. This paper addresses this question by i) showing how simple losses are highly effective at reconstructing images for deep generators ii) analyzing the statistics of reconstruction errors when reconstructing training and validation images, which is the standard way to analyze overfitting in machine learning. Using this methodology, this paper shows that overfitting is not detectable in the pure GAN models proposed in the literature, in contrast with those using hybrid adversarial losses, which are amongst the most widely applied generative methods. The paper also shows that standard GAN evaluation metrics fail to capture memorization for some deep generators. Finally, the paper also shows how off-the-shelf GAN generators can be successfully applied to face inpainting and face super-resolution using the proposed reconstruction method, without hybrid adversarial losses.

التعلم الآلي التعلم الالي

Artificial Neural Networks for Sensor Data Classification on Small Embedded Systems

110 - Marcus Venzke , Daniel Klisch , Philipp Kubik 2020

In this paper we investigate the usage of machine learning for interpreting measured sensor values in sensor modules. In particular we analyze the potential of artificial neural networks (ANNs) on low-cost micro-controllers with a few kilobytes of me mory to semantically enrich data captured by sensors. The focus is on classifying temporal data series with a high level of reliability. Design and implementation of ANNs are analyzed considering Feed Forward Neural Networks (FFNNs) and Recurrent Neural Networks (RNNs). We validate the developed ANNs in a case study of optical hand gesture recognition on an 8-bit micro-controller. The best reliability was found for an FFNN with two layers and 1493 parameters requiring an execution time of 36 ms. We propose a workflow to develop ANNs for embedded devices.

التعلم الآلي الحوسبة العصبية والتطورية