Dependable IoT Data Stream Processing for Monitoring and Control of Urban Infrastructures


Added by Morgan Geldenhuys

Publication date 2021

Fields Informatics Engineering

Language English

Authors Morgan K. Geldenhuys - Jonathan Will - Benjamin J. J. Pfister

Distributed Parallel and Cluster Computing


The Internet of Things describes a network of physical devices interacting and producing vast streams of sensor data. At present, a number of general challenges arise when developing solutions for use cases involving the monitoring and control of urban infrastructures. These include the need for a dependable method of extracting value from these high-volume streams of time-sensitive data that adapts to changing workloads. Low-latency access to the current state for live monitoring is a necessity, as is the ability to perform queries on historical data. At the same time, many design choices need to be made, and the number of available technology options adds further complexity. In this paper we present a dependable IoT data processing platform for the monitoring and control of urban infrastructures. We define requirements in terms of dependability and then select a number of mature open-source technologies to match these requirements. We examine the disparate parts necessary for delivering a holistic overall architecture and describe the dataflows between each of these components. We likewise present generalizable methods for the enrichment and analysis of sensor data applicable across various application areas. We demonstrate the usefulness of this approach by providing an exemplary prototype platform executing on top of Kubernetes and evaluate the effectiveness of jobs processing sensor data in this environment.
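As a rough illustration of what "enrichment and analysis of sensor data" can mean in practice, the sketch below joins raw readings with static sensor metadata and averages values over fixed, non-overlapping time windows. It is a minimal standalone example, not the platform described in the paper: the `Reading` type, the `SENSOR_METADATA` table, the sensor IDs, and both function names are hypothetical, and a real deployment would run equivalent logic inside a stream processing engine rather than over an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: float  # seconds since epoch
    value: float


# Hypothetical static metadata used to enrich raw readings.
SENSOR_METADATA = {
    "s1": {"district": "north", "unit": "m3/h"},
    "s2": {"district": "south", "unit": "m3/h"},
}


def enrich(reading, metadata):
    """Attach static metadata to a reading; unknown sensors get no extra fields."""
    return {
        "sensor_id": reading.sensor_id,
        "timestamp": reading.timestamp,
        "value": reading.value,
        **metadata.get(reading.sensor_id, {}),
    }


def tumbling_window_mean(readings, window_seconds):
    """Average values per (sensor, window start) over fixed, non-overlapping windows."""
    buckets = defaultdict(list)
    for r in readings:
        # Align each timestamp to the start of the window it falls into.
        window_start = int(r.timestamp // window_seconds) * window_seconds
        buckets[(r.sensor_id, window_start)].append(r.value)
    return {key: mean(values) for key, values in buckets.items()}
```

Calling `tumbling_window_mean(readings, 60)` on a list of `Reading` objects returns one average per sensor and one-minute window, and `enrich(reading, SENSOR_METADATA)` returns a flat dictionary that could be written to a store for historical queries.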


Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing

Morgan K. Geldenhuys, Benjamin J. J. Pfister, Dominik Scheinert 2021

Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. As such, making timely decisions based on these results is dependent on a system's ability to tolerate failure. Typically, these systems achieve fault tolerance and the ability to recover automatically from partial failures by implementing checkpoint and rollback recovery. However, owing to the statistical probability of partial failures occurring in these distributed environments and the variability of workloads upon which jobs are expected to operate, static configurations will often not meet Quality of Service constraints with low overhead. In this paper we present Khaos, a new approach which utilizes the parallel processing capabilities of virtual cloud automation technologies for the automatic runtime optimization of fault tolerance configurations in Distributed Stream Processing jobs. Our approach employs three successive phases which borrow from the principles of Chaos Engineering: establish the steady-state processing conditions, conduct experiments to better understand how the system performs under failure, and use this knowledge to continuously minimize Quality of Service violations. We implemented Khaos prototypically together with Apache Flink and demonstrate its usefulness experimentally.

Distributed Parallel and Cluster Computing

A Security Monitoring Framework For Virtualization Based HEP Infrastructures

A. Gomez Ramirez, M. Martinez Pedreira, C. Grigoras 2017

High Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. These tools should collect and inspect data such as resource consumption, logs and sequences of system calls to detect anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that meets these requirements, with a proof of concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves security by isolating services and Jobs without a significant performance impact. We also describe a collected dataset for Machine Learning based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), logfiles from operating system services, and system call data collected from production Jobs running in an ALICE Grid test site and from a large set of malware. This malware was collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious Jobs.

Distributed Parallel and Cluster Computing Artificial Intelligence Cryptography and Security

Evaluation of Load Prediction Techniques for Distributed Stream Processing

Kordian Gontarska, Morgan Geldenhuys, Dominik Scheinert 2021

Distributed Stream Processing (DSP) systems enable processing large streams of continuous data to produce results in near real time. They are an essential part of many data-intensive applications and analytics platforms. The rate at which events arrive at DSP systems can vary considerably over time, which may be due to trends as well as cyclic and seasonal patterns within the data streams. A priori knowledge of incoming workloads enables proactive approaches to resource management and optimization tasks such as dynamic scaling, live migration of resources, and the tuning of configuration parameters at runtime, thus leading to a potentially better Quality of Service. In this paper we conduct a comprehensive evaluation of different load prediction techniques for DSP jobs. We identify three use-cases and formulate requirements for making load predictions specific to DSP jobs. Automatically optimized classical and Deep Learning methods are evaluated on nine different datasets from typical DSP domains, i.e. the IoT, Web 2.0, and cluster monitoring. We compare model performance with respect to overall accuracy and training duration. Our results show that the Deep Learning methods provide the most accurate load predictions for the majority of the evaluated datasets.

Distributed Parallel and Cluster Computing Artificial Intelligence

Trevor: Automatic configuration and scaling of stream processing pipelines

Manu Bansal, Eyal Cidon, Arjun Balasingam 2018

Operating a distributed data stream processing workload efficiently at scale is hard. The operator of the workload must parallelize and lay out tasks of the workload with resources that match the requirements of the target data rate. The challenge is that neither the operator nor the programmer is typically aware of the scaling behavior of the workload as a function of resources. An operator manually searches for a safe operating point that can handle predicted peak load and deploys with ample headroom for absorbing unpredictable spikes. Such empirical, static over-provisioning is wasteful of both compute and human resources. We show that precise performance models can be automatically learned for distributed stream processing systems that can predict the execution performance of a job even before deployment. Further, those models can be used to optimally schedule logically specified jobs onto available physical hardware. Finally, those models and the derived execution schedules can be refined online to dynamically adapt to unpredictable changes in the runtime environment or auto-scale with variations in job load.

Distributed Parallel and Cluster Computing

Blockchain for IoT Access Control: Recent Trends and Future Research Directions

Shantanu Pal, Ali Dorri, Raja Jurdak 2021

With the rapid development of wireless sensor networks, smart devices, and traditional information and communication technologies, there is tremendous growth in the use of Internet of Things (IoT) applications and services in our everyday life. IoT systems deal with high volumes of data. This data can be particularly sensitive, as it may include health, financial, location, and other highly personal information. Fine-grained security management in IoT demands effective access control. Several proposals discuss access control for the IoT; however, limited attention has been given to the emerging blockchain-based solutions for IoT access control. In this paper, we review the recent trends and critical needs for blockchain-based solutions for IoT access control. We identify several important aspects of blockchain for IoT access control, including decentralised control and the secure storage and sharing of information in a trustless manner, along with their benefits and limitations. Finally, we note some future research directions on how to integrate blockchain into IoT access control efficiently and effectively.

Distributed Parallel and Cluster Computing Cryptography and Security