We have developed a highly scalable application, called Shoal, for tracking and utilizing a distributed set of HTTP web caches. Squid servers advertise their existence to the Shoal server via AMQP messaging by running Shoal Agent. The Shoal server provides a simple REST interface that allows clients to determine their closest Squid cache. Our goal is to dynamically instantiate Squid caches on IaaS clouds in response to client demand. Shoal provides the VMs on IaaS clouds with the location of the nearest dynamically instantiated Squid Cache. In this paper, we describe the design and performance of Shoal.
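As an illustration of the "closest Squid cache" lookup described above, the following is a minimal sketch that ranks advertised caches by great-circle distance to a client. The abstract does not say how closeness is computed, so the distance metric, the record layout (`host`, `lat`, `lon`), and the host names are all assumptions made for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_caches(client, caches, count=1):
    """Return up to `count` cache records, closest to the client first.

    `client` is a (lat, lon) tuple; each cache is a dict with the
    hypothetical keys "host", "lat" and "lon".
    """
    ranked = sorted(
        caches,
        key=lambda c: haversine_km(client[0], client[1], c["lat"], c["lon"]),
    )
    return ranked[:count]


# Hypothetical registry, as it might look after agents have advertised themselves.
registry = [
    {"host": "squid-a.example.org", "lat": 48.43, "lon": -123.37},
    {"host": "squid-b.example.org", "lat": 46.23, "lon": 6.05},
]

closest = nearest_caches((49.28, -123.12), registry)
```

A server built along these lines could return the ranked list from its REST endpoint, so that a client can fall back to the second-closest cache if the first is unreachable.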
VM startup time is an essential factor in designing elastic cloud applications. For example, a cloud application with autoscaling can reduce under- and over-provisioning of VM instances given a precise estimate of VM startup time, which in turn helps guarantee the application's performance and improves cost efficiency. However, VM startup time has received little study, and previously available measurement results did not consider the variety of VM configurations used by modern cloud applications. In this work, we perform comprehensive measurements and analysis of VM startup time on two major cloud providers, namely Amazon Web Services (AWS) and Google Cloud Platform (GCP). Over three months of measurements, we collected more than 300,000 data points from each provider by applying a set of configurations, including 11+ VM types, four different data center locations, four VM image sizes, two OS types, and two purchase models (spot/preemptible VMs vs. on-demand VMs). Through extensive analysis, we found that VM startup time can vary significantly because of several important factors, such as VM image size, data center location, VM type, and OS type. Moreover, by comparing with previous measurement results, we confirm that cloud providers (specifically AWS) have significantly improved VM startup times, which are now much shorter than in the past.
Caches are an important component of modern computing systems given their significant impact on performance. In particular, caches play a key role in the cloud due to the nature of large-scale, data-intensive processing. One of the key challenges for cloud providers is how to share caching capacity among tenants, given that each tenant often requires a different degree of quality of service (QoS) with respect to data access performance. The invariant is that each individual tenant's QoS requirement should be satisfied while cache usage is optimized system-wide. In this paper, we introduce a learning-based approach for dynamic cache management in a cloud, which is based on estimating a tenant's data access pattern and predicting cache performance for that access pattern. We consider a variety of probability distributions to estimate the data access pattern, and examine a set of learning-based regression techniques to predict the cache hit rate for the access pattern. The predicted cache hit rate is then used to decide whether cache space needs to be reallocated to meet the tenant's QoS requirement. Our experimental results with an extensive set of synthetic traces and the YCSB benchmark show that the proposed method consistently optimizes the cache space while satisfying the QoS requirement.
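The decision step described above (use a hit-rate estimate to decide how much cache a tenant needs) can be sketched as follows. This is not the paper's method: the learned regression model is replaced here by a direct LRU simulation over a synthetic Zipf-like trace, and the function names, the skew parameter, and the LRU policy itself are assumptions made for illustration.

```python
import random
from collections import OrderedDict


def zipf_trace(n_keys, n_requests, skew=1.0, seed=0):
    """Synthetic access trace whose key popularity follows a Zipf-like law."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, n_keys + 1)]
    return rng.choices(range(n_keys), weights=weights, k=n_requests)


def lru_hit_rate(trace, capacity):
    """Fraction of requests in `trace` served by an LRU cache of `capacity` items."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)


def min_capacity_for_target(trace, target, max_capacity):
    """Smallest capacity in [1, max_capacity] whose hit rate meets `target`.

    Returns None if even `max_capacity` is not enough. Binary search is valid
    because the LRU hit rate never decreases as capacity grows on a fixed trace.
    """
    if lru_hit_rate(trace, max_capacity) < target:
        return None
    lo, hi = 1, max_capacity
    while lo < hi:
        mid = (lo + hi) // 2
        if lru_hit_rate(trace, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


trace = zipf_trace(n_keys=500, n_requests=5000, seed=1)
needed = min_capacity_for_target(trace, target=0.5, max_capacity=500)
```

In a multi-tenant setting, a manager could compare `needed` with a tenant's current allocation and reclaim or grant space accordingly; a learned predictor would replace `lru_hit_rate` to avoid replaying the trace.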
We present FLIC, a distributed software data caching framework for fogs that reduces network traffic and latency. FLIC is targeted toward city-scale deployments of cooperative IoT devices in which each node gathers and shares data with surrounding devices. As machine learning and other data processing techniques that require large volumes of training data are ported to low-cost and low-power IoT systems, we expect that data analysis will be moved away from the cloud. Separation from the cloud will reduce reliance on power-hungry centralized cloud-based infrastructure. However, city-scale deployments of cooperative IoT devices often connect to the Internet through cellular service, for which service charges are proportional to network usage. IoT system architects must be clever in order to keep costs down in these scenarios. To reduce the network bandwidth required to operate city-scale deployments of cooperative IoT systems, FLIC implements a distributed cache on the IoT nodes in the fog. FLIC allows the IoT network to share its data without repeatedly interacting with a simple cloud storage service, reducing calls out to a backing store. Our results show a miss rate of less than 2% on reads, with only 5% of requests needing the backing store. We were also able to achieve a reduction of more than 50% in bytes transmitted per second.
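The idea of serving reads from nearby nodes before calling out to the backing store can be sketched as a cooperative read-through cache. The abstract does not describe FLIC's actual protocol, eviction, or consistency handling, so the class names, the peer lookup order, and the in-memory store below are hypothetical.

```python
class BackingStore:
    """Stand-in for a cloud storage service; counts how often it is read."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data[key]


class FogNode:
    """A node that checks its own cache, then its peers, then the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.local = {}   # this node's cache
        self.peers = []   # neighbouring FogNode objects

    def get(self, key):
        if key in self.local:
            return self.local[key]
        for peer in self.peers:
            if key in peer.local:
                value = peer.local[key]
                break
        else:
            # No peer had the key: fall back to the backing store.
            value = self.store.get(key)
        self.local[key] = value
        return value


store = BackingStore({"temperature": 21})
node_a = FogNode("a", store)
node_b = FogNode("b", store)
node_a.peers = [node_b]
node_b.peers = [node_a]

first = node_a.get("temperature")   # misses everywhere, reads the store
second = node_b.get("temperature")  # served from node_a's cache
```

Only the first read reaches the store in this sketch; a real deployment would also need invalidation when the underlying data changes.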
Systems for processing big data (e.g., Hadoop, Spark, and massively parallel databases) need to run workloads on behalf of multiple tenants simultaneously. The abundant disk-based storage in these systems is usually complemented by a smaller, but much faster, cache. Cache is a precious resource: tenants who get to use cache can see two orders of magnitude performance improvement. Cache is also a limited and hence shared resource: unlike a resource such as a CPU core, which can be used by only one tenant at a time, a cached data item can be accessed by multiple tenants at the same time. Cache, therefore, has to be shared by a multi-tenancy-aware policy across tenants, each having a unique set of priorities and workload characteristics. In this paper, we develop cache allocation strategies that speed up the overall workload while being fair to each tenant. We build a novel fairness model targeted at the shared resource setting that not only incorporates the more standard concepts of Pareto-efficiency and sharing incentive, but also defines envy-freeness via the notion of the core from cooperative game theory. Our cache management platform, ROBUS, uses randomization over small time batches, and we develop a proportionally fair allocation mechanism that satisfies the core property in expectation. We show that this algorithm and related fair algorithms can be approximated to arbitrary precision in polynomial time. We evaluate these algorithms on a ROBUS prototype implemented on Spark with the RDD store used as cache. Our evaluation on a synthetically generated industry-standard workload shows that our algorithms provide a speedup close to that of performance-optimal algorithms while guaranteeing fairness across tenants.
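To make the notion of proportional fairness over a shared cache concrete, the sketch below picks the cache contents that maximize the sum of the logarithms of tenant utilities. It is a deliberate simplification and not the ROBUS mechanism: it chooses one deterministic configuration by brute force rather than randomizing over configurations, and the view names, sizes, and benefit values are invented for the example.

```python
import math
from itertools import combinations


def utilities(config, tenant_benefits):
    """Utility of each tenant: total benefit of the cached views it uses."""
    return [
        sum(benefit for view, benefit in benefits.items() if view in config)
        for benefits in tenant_benefits
    ]


def proportional_fair_config(view_sizes, tenant_benefits, capacity):
    """Brute-force search for the feasible config maximizing sum(log(utility)).

    A tenant with zero utility contributes -inf, so any configuration that
    serves every tenant beats one that starves somebody.
    """
    views = list(view_sizes)
    best, best_score = frozenset(), -math.inf
    for r in range(len(views) + 1):
        for combo in combinations(views, r):
            if sum(view_sizes[v] for v in combo) > capacity:
                continue  # does not fit in the cache
            score = sum(
                math.log(u) if u > 0 else -math.inf
                for u in utilities(combo, tenant_benefits)
            )
            if score > best_score:
                best, best_score = frozenset(combo), score
    return best


# Hypothetical workload: three cacheable views and two tenants.
view_sizes = {"A": 2, "B": 1, "C": 1}
tenant_benefits = [
    {"A": 10, "B": 2},  # tenant 1
    {"B": 3, "C": 3},   # tenant 2
]

chosen = proportional_fair_config(view_sizes, tenant_benefits, capacity=2)
```

With a capacity of 2, caching only view A gives the highest total benefit (10) but leaves tenant 2 with nothing, whereas the proportionally fair choice caches B and C, which both tenants can use. View B illustrates the shared-resource aspect: one cached item contributes to the utility of both tenants.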
To ensure uninterrupted service to cloud clients from federated cloud providers, it is important to guarantee an efficient allocation of cloud resources to users, so as to improve client satisfaction and the quality of service provision. It is desirable to obtain as many computing and storage resources as possible. In the cloud domain, several Multi-Agent Resource Allocation methods have been proposed to address the problem of dynamic resource allocation. However, the problem is still open and much work remains to be done in this field. Because robustness is important in cloud computing, in this paper we focus on an auto-adaptive method for dealing with changes in an open federated cloud computing environment. Our approach is hybrid: we adopt an existing organization optimization approach for self-organization of the broker agent organization and combine it with an existing Multi-Agent Resource Allocation approach for federated clouds. We consider an open cloud federation environment that is dynamic and constantly evolving, in which new cloud operators can join or leave the federation. At the same time, our approach is multi-criterion and can take into account various parameters (e.g., the computing load balance of the mediator agent and the geographical distance, i.e., network delay, between customer and provider).