Cloud Resource Optimization for Processing Multiple Streams of Visual Data

Added by Zohar Kapach

Publication date: 2019

Field: Informatics Engineering

Language: English

Authors: Zohar Kapach, Andrew Ulmer, Daniel Merrick

Distributed, Parallel, and Cluster Computing

Abstract

Hundreds of millions of network cameras have been installed throughout the world. Each is capable of providing a vast amount of real-time data. Analyzing the massive amount of data generated by these cameras requires significant computational resources, and the demands may vary over time. Cloud computing is the most promising way to provide the needed resources on demand. In this article, we investigate how to allocate cloud resources when analyzing real-time data streams from network cameras. A resource manager considers many factors that affect its decisions, including the types of analysis, the number of data streams, and the locations of the cameras. The manager then selects the most cost-efficient types of cloud instances (e.g., CPU vs. GPGPU) to meet the computational demands of analyzing the streams. We evaluate the effectiveness of our approach using Amazon Web Services. Experiments demonstrate more than 50% cost reduction for real workloads.
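The abstract does not spell out how the resource manager makes its choice. As a rough illustration of picking CPU or GPGPU instances by cost, here is a minimal Python sketch under stated assumptions: each instance type has a hypothetical, pre-profiled capacity (how many streams of a given analysis type one instance can process in real time) and a hypothetical hourly price, one instance type is used per analysis type, and camera location and time-varying demand are ignored. The names `InstanceType`, `cheapest_allocation`, `cpu-small`, and `gpu-large` and all numbers are invented for illustration; they are not AWS instance types or prices, and this is not the authors' algorithm.

```python
import math
from dataclasses import dataclass, field


@dataclass
class InstanceType:
    name: str           # hypothetical label, not a real AWS instance type
    hourly_cost: float  # assumed price in USD per hour
    # Hypothetical profiling result: analysis type -> number of camera
    # streams one instance of this type can analyze in real time.
    capacity: dict = field(default_factory=dict)


def cheapest_allocation(instance_types, analysis, num_streams):
    """Return (type name, instance count, hourly cost) that serves all
    streams of one analysis type at the lowest hourly cost."""
    best = None
    for it in instance_types:
        per_instance = it.capacity.get(analysis, 0)
        if per_instance <= 0:
            continue  # this type cannot run the analysis in real time
        count = math.ceil(num_streams / per_instance)
        cost = count * it.hourly_cost
        if best is None or cost < best[2]:
            best = (it.name, count, cost)
    if best is None:
        raise ValueError(f"no instance type can run {analysis!r}")
    return best


# Hypothetical catalog: the GPU type is pricier but far faster at detection.
cpu = InstanceType("cpu-small", 0.10, {"motion": 8, "object_detection": 1})
gpu = InstanceType("gpu-large", 0.90, {"motion": 20, "object_detection": 12})

print(cheapest_allocation([cpu, gpu], "motion", 30))
print(cheapest_allocation([cpu, gpu], "object_detection", 30))
```

With these made-up numbers, lightweight motion analysis is cheapest on CPU instances, while object detection is cheapest on GPU instances, which is the kind of CPU vs. GPGPU trade-off the abstract describes.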

Related research

LHCb trigger streams optimization

D. Derkach, N. Kazeev, R. Neychev (2017)

The LHCb experiment stores around $10^{11}$ collision events per year. A typical physics analysis deals with a final sample of up to $10^7$ events. Event preselection algorithms (lines) are used for data reduction. Since the data are stored in a format that requires sequential access, the lines are grouped into several output file streams to increase the efficiency of the user analysis jobs that read these data. The efficiency of this scheme depends heavily on the stream composition: by putting similar lines together and balancing the stream sizes, it is possible to reduce the overhead. We present a method for finding an optimal stream composition. The method is applied to a part of the LHCb data (the Turbo stream) at the stage where it is prepared for user physics analysis. This yields an expected 15% speedup of user analysis jobs, and the method will be applied to data recorded in 2017.

Distributed, Parallel, and Cluster Computing; High Energy Physics - Experiment
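The LHCb abstract describes grouping selection lines into streams so that similar lines share a stream and stream sizes stay balanced, but it does not give the optimization method. The Python sketch below is a stand-in heuristic, not the authors' method. It assumes a simplified cost model in which one analysis job per line reads its entire stream, so a stream costs (number of lines) times (number of distinct events in the stream), and it greedily merges the pair of streams whose union adds the least cost until `k` streams remain. The functions `merge_delta`, `compose_streams`, and `total_cost`, the line names, and the event IDs are all hypothetical.

```python
from itertools import combinations


def merge_delta(a, b):
    """Extra read cost if streams a and b are merged.

    A stream is (lines, events). Under the assumed cost model, each line's
    job reads the whole stream, so a stream costs len(lines) * len(events)."""
    (lines_a, ev_a), (lines_b, ev_b) = a, b
    merged = (len(lines_a) + len(lines_b)) * len(ev_a | ev_b)
    return merged - len(lines_a) * len(ev_a) - len(lines_b) * len(ev_b)


def compose_streams(line_events, k):
    """Greedy agglomerative grouping of selection lines into k streams."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Start with one stream per line.
    streams = [([name], set(ev)) for name, ev in line_events.items()]
    while len(streams) > k:
        # Merge the cheapest pair; lines selecting overlapping events
        # (similar lines) can share a stream at little extra cost.
        i, j = min(combinations(range(len(streams)), 2),
                   key=lambda p: merge_delta(streams[p[0]], streams[p[1]]))
        streams[i] = (streams[i][0] + streams[j][0], streams[i][1] | streams[j][1])
        del streams[j]  # combinations yields i < j, so index i stays valid
    return [sorted(lines) for lines, _ in streams]


def total_cost(line_events, groups):
    """Total read cost of a grouping under the same assumed cost model."""
    return sum(len(g) * len(set().union(*(line_events[l] for l in g)))
               for g in groups)


# Hypothetical lines: A and B overlap heavily, as do C and D.
line_events = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {10, 11}, "D": {11, 12}}
groups = compose_streams(line_events, 2)
print(groups, total_cost(line_events, groups))
```

No explicit balancing term is used: because every line in a stream pays for all of the stream's events, oversized streams are penalized implicitly. The pairwise scan is cubic in the number of lines overall, which is fine for a sketch but not for a production trigger configuration.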

Probabilistic Skyline Query Processing over Uncertain Data Streams in Edge Computing Environments

Chuan-Chi Lai, Chuan-Ming Liu, Yan-Lin Chen (2020)

With the advancement of technology, data is generated ever faster, and the volume of data that applications must process has become enormous. We therefore need to put more effort into analyzing data and extracting valuable information from it. Cloud computing has traditionally been a good solution for many large-scale data analysis problems. However, in the era of the Internet of Things (IoT), transmitting sensing data back to the cloud for centralized analysis incurs substantial wireless communication and network transmission costs. Edge computing has become a promising solution to this problem. In this paper, we propose a new algorithm for processing probabilistic skyline queries over uncertain data streams in an edge computing environment. We use the concept of a second skyline set to filter out data that is unlikely to appear in the skyline result. In addition, the edge server sends only the information needed to update the global analysis results on the cloud server, which greatly reduces the amount of data transmitted over the network. The results show that our proposed method not only reduces response time by more than 50% compared with the brute-force method on two-dimensional data but also maintains its lead in processing speed on high-dimensional data.

Distributed, Parallel, and Cluster Computing; Databases; Data Structures and Algorithms
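For the probabilistic skyline abstract above, the core notion can be illustrated with the common tuple-level uncertainty model. This model is an assumption here, since the abstract does not define one: each tuple exists independently with probability P(t), and its skyline probability is P(t) times the product of (1 - P(s)) over every tuple s that dominates it. The Python sketch computes this by brute force over one window snapshot and assumes smaller values are better in every dimension. It is not the paper's optimized algorithm and does not implement the second skyline set or the edge/cloud split; the window contents are hypothetical.

```python
from math import prod


def dominates(a, b):
    """True if point a dominates b (smaller is better in every dimension)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(tuples):
    """tuples: list of (point, existence probability), assumed independent.

    P_sky(t) = P(t) * product over dominators s of (1 - P(s))."""
    return [
        (point, p * prod(1 - q for other, q in tuples if dominates(other, point)))
        for point, p in tuples
    ]


def probabilistic_skyline(tuples, threshold):
    """Points whose skyline probability is at least the threshold."""
    return [point for point, ps in skyline_probability(tuples) if ps >= threshold]


# Hypothetical sliding-window snapshot of 2-D uncertain tuples.
window = [((1, 4), 0.9), ((2, 2), 0.5), ((3, 3), 0.8), ((4, 1), 0.6)]
print(probabilistic_skyline(window, 0.5))  # [(1, 4), (2, 2), (4, 1)]
```

Here (3, 3) is dominated only by (2, 2), so its skyline probability is 0.8 × (1 - 0.5) = 0.4, which falls below the 0.5 threshold. Every tuple is compared with every other, which is quadratic per window and is exactly the cost that pruning ideas such as the paper's second skyline set aim to cut.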

Reproducible experiments on dynamic resource allocation in cloud data centers

Andreas Wolke, Martin Bichler, Fernando Chirigati (2017)

In Wolke et al. [1] we compared the efficiency of different resource allocation strategies experimentally. We focused on dynamic environments where virtual machines need to be allocated to and deallocated from servers over time. In this companion paper, we describe the simulation framework and explain how to run simulations to replicate those experiments or to run new experiments within the framework.

Distributed, Parallel, and Cluster Computing

C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

Jonathan Will, Lauritz Thamsen, Dominik Scheinert (2021)

Distributed dataflow systems enable data-parallel processing of large datasets on clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. Yet selecting appropriate cloud resources for dataflow jobs, resources that lead neither to bottlenecks nor to low resource utilization, is often challenging, even for expert users such as data engineers. We present C3O, a collaborative system for optimizing data processing cluster configurations in public clouds based on shared historical runtime data. The shared data is used to predict the runtimes of data processing jobs on different candidate cluster configurations, using specialized regression models. These models take the diverse execution contexts of different users into account and exhibit mean absolute errors below 3% in our experimental evaluation with 930 unique Spark jobs.

Distributed, Parallel, and Cluster Computing
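The C3O abstract describes predicting job runtimes from shared historical runs and using those predictions to choose a cluster configuration. The abstract only says "specialized regression models", so the Python sketch below substitutes a deliberately simple one: a least-squares fit of runtime = a + b × (data size / node count) over pooled runs, followed by picking the cheapest node count whose predicted runtime meets a deadline. The function names, the feature, the per-node price, and the historical runs are all hypothetical; this is not C3O's model or API.

```python
def fit_runtime_model(runs):
    """Least-squares fit of runtime_s = a + b * (data_gb / nodes).

    runs: pooled (data_gb, nodes, runtime_s) records, e.g. shared by several
    users running the same kind of job (hypothetical data)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    xs = [gb / nodes for gb, nodes, _ in runs]
    ys = [rt for _, _, rt in runs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need runs with different data_gb / nodes ratios")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def choose_config(model, data_gb, node_options, price_per_node_hour, deadline_s):
    """Cheapest node count whose predicted runtime meets the deadline.

    Returns (nodes, cost, predicted runtime) or None if no option fits."""
    a, b = model
    best = None
    for nodes in node_options:
        runtime = a + b * data_gb / nodes
        if runtime > deadline_s:
            continue  # predicted to miss the deadline
        cost = nodes * price_per_node_hour * runtime / 3600
        if best is None or cost < best[1]:
            best = (nodes, cost, runtime)
    return best


# Hypothetical shared history: (data_gb, nodes, runtime_s).
history = [(100, 4, 810.0), (100, 10, 360.0), (200, 5, 1260.0)]
model = fit_runtime_model(history)
print(choose_config(model, 300, [2, 4, 8, 16], 0.5, 1800))
```

Because the fitted model has a fixed per-job overhead `a`, adding nodes beyond what the deadline requires raises the total cost, so the sketch picks 8 nodes rather than 16 for a 1800-second deadline. Real systems such as C3O account for many more execution-context features than this single ratio.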

Cloud Scheduler: a resource manager for distributed compute clouds

P. Armstrong, A. Agarwal, A. Bishop (2010)

The availability of Infrastructure-as-a-Service (IaaS) computing clouds gives researchers access to a large set of new resources for running complex scientific applications. However, exploiting cloud resources for large numbers of jobs requires significant effort and expertise. To make it simple and transparent for researchers to deploy their applications, we have developed a virtual machine resource manager (Cloud Scheduler) for distributed compute clouds. Cloud Scheduler boots and manages user-customized virtual machines in response to a user's job submission. We describe the motivation and design of Cloud Scheduler and present results on its use on both science and commercial clouds.

Distributed, Parallel, and Cluster Computing