
The Impact of Distance on Performance and Scalability of Distributed Database Systems in Hybrid Clouds

Added by Yaser Mansouri
Publication date: 2020
Research language: English





The increasing need to manage big data has led to the emergence of advanced database management systems. There have been growing efforts to evaluate the performance and scalability of NoSQL and relational databases hosted in either private or public cloud datacenters. However, little work has evaluated these databases in hybrid clouds, where the distance between the private and public cloud datacenters can be a key factor affecting their performance. Hence, in this paper, we present a detailed evaluation of throughput, scalability, and VM size versus VM count for six modern databases in a hybrid cloud consisting of a private cloud in Adelaide and Azure-based datacenters in the Sydney, Mumbai, and Virginia regions. Based on the results, first, as the distance between the private and public clouds increases, the throughput of most databases decreases. Second, MongoDB achieves the best throughput, followed by MySQL Cluster, whilst Cassandra exhibits the most fluctuation in throughput. Third, vertical scalability improves database throughput more than horizontal scalability. Fourth, using larger VMs rather than more VMs with fewer cores increases throughput for Cassandra, Riak, and Redis.
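For a sense of how such a throughput measurement can be set up, the sketch below times a fixed batch of writes against MongoDB endpoints at increasing distances. It is a minimal illustration, not the paper's benchmark harness (studies of this kind typically use a dedicated workload generator such as YCSB), and the connection strings, batch size, and document shape are assumptions.

```python
"""Minimal sketch (not the paper's harness): time a fixed batch of writes against
MongoDB endpoints at different distances, so inter-datacenter latency shows up
as reduced throughput. Addresses and workload size are illustrative assumptions."""
import time

from pymongo import MongoClient

ENDPOINTS = {
    "private-adelaide": "mongodb://10.0.0.10:27017",   # hypothetical addresses
    "azure-sydney":     "mongodb://20.0.0.10:27017",
    "azure-virginia":   "mongodb://20.1.0.10:27017",
}
N_OPS = 1_000

for name, uri in ENDPOINTS.items():
    coll = MongoClient(uri, serverSelectionTimeoutMS=5_000)["bench"]["kv"]
    start = time.perf_counter()
    for i in range(N_OPS):
        # Unique _id per run so repeated executions do not collide.
        coll.insert_one({"_id": f"key{i}-{start}", "payload": "x" * 100})
    elapsed = time.perf_counter() - start
    print(f"{name}: {N_OPS / elapsed:,.0f} ops/s")
```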



Related research

In recent years, many DHT-based P2P systems have been proposed, analyzed, and certain deployments have reached a global scale with nearly one million nodes. One is thus faced with the question of which particular DHT system to choose, and whether some are inherently more robust and scalable. Toward developing such a comparative framework, we present the reachable component method (RCM) for analyzing the performance of different DHT routing systems subject to random failures. We apply RCM to five DHT systems and obtain analytical expressions that characterize their routability as a continuous function of system size and node failure probability. An important consequence is that in the large-network limit, the routability of certain DHT systems goes to zero for any non-zero probability of node failure. These DHT routing algorithms are therefore unscalable, while some others, including Kademlia, which powers the popular eDonkey P2P system, are found to be scalable.
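The reachable component method is analytical, but the quantity it characterizes, routability under random node failures, can also be estimated empirically. The sketch below is a hypothetical Monte Carlo illustration for a simplified Chord-style ring (greedy routing over power-of-two fingers), not the RCM analysis itself; the ring size, finger model, and failure handling are assumptions.

```python
"""Hypothetical Monte Carlo estimate of routability for a simplified Chord-style
ring: each node fails independently with probability p_fail, and a lookup routes
greedily over power-of-two fingers. Not the paper's reachable component method."""
import math
import random


def simulate_routability(n_nodes, p_fail, n_trials=2000, seed=0):
    rng = random.Random(seed)
    bits = int(math.log2(n_nodes))
    successes = 0
    for _ in range(n_trials):
        alive = [rng.random() > p_fail for _ in range(n_nodes)]
        src, dst = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if not (alive[src] and alive[dst]):
            continue  # a dead endpoint counts as a failed lookup
        cur, hops = src, 0
        while cur != dst and hops < n_nodes:
            gap = (dst - cur) % n_nodes  # clockwise distance still to cover
            nxt = None
            # Pick the largest live finger that does not overshoot the target.
            for k in reversed(range(bits)):
                step = 1 << k
                if step <= gap and alive[(cur + step) % n_nodes]:
                    nxt = (cur + step) % n_nodes
                    break
            if nxt is None:
                break  # no live finger makes progress: lookup fails
            cur, hops = nxt, hops + 1
        successes += (cur == dst)
    return successes / n_trials


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3):
        print(f"p_fail={p:.1f}: routability ~ {simulate_routability(1024, p):.3f}")
```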
The blockchain paradigm provides a mechanism for content dissemination and distributed consensus on Peer-to-Peer (P2P) networks. While this paradigm has been widely adopted in industry, it has not been carefully analyzed in terms of its network scaling with respect to the number of peers. Applications for blockchain systems, such as cryptocurrencies and IoT, require this form of network scaling. In this paper, we propose a new stochastic network model for a blockchain system. We identify a structural property called one-endedness, which we show to be desirable in any blockchain system as it is directly related to distributed consensus among the peers. We show that the stochastic stability of the network is sufficient for the one-endedness of a blockchain. We further establish that our model belongs to a class of network models, called monotone separable models. This allows us to establish upper and lower bounds on the stability region. The bounds on stability depend on the connectivity of the P2P network through its conductance and allow us to analyze the scalability of blockchain systems on large P2P networks. We verify our theoretical insights using both synthetic data and real data from the Bitcoin network.
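The stability bounds above depend on the overlay's conductance. As a concrete reminder of that quantity, the brute-force sketch below computes the standard conductance of a small graph; it is purely illustrative, unrelated to the paper's bounds, and only feasible for toy graph sizes.

```python
"""Brute-force conductance of a small undirected graph:
phi = min over cuts S of cut(S) / min(vol(S), vol(V \\ S)).
Illustrative only; exponential in the number of nodes."""
from itertools import combinations


def conductance(nodes, edges):
    nodes = list(nodes)
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total_vol = sum(deg.values())
    best = float("inf")
    for r in range(1, len(nodes)):
        for subset in combinations(nodes, r):
            s = set(subset)
            # Edges crossing the cut between S and its complement.
            cut = sum(1 for u, v in edges if (u in s) != (v in s))
            vol_s = sum(deg[v] for v in s)
            smaller = min(vol_s, total_vol - vol_s)
            if smaller > 0:
                best = min(best, cut / smaller)
    return best


if __name__ == "__main__":
    ring = [(i, (i + 1) % 6) for i in range(6)]  # 6-node ring
    print(f"conductance of a 6-node ring: {conductance(range(6), ring):.3f}")
```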
Applications that process seismic data employ scalable parallel systems to produce timely results. To fully exploit emerging processor architectures, applications will need to employ threaded parallelism within a node and message passing across nodes. Today, MPI+OpenMP is the preferred programming model for this task. However, tuning hybrid programs for clusters is difficult. Performance tools can help users identify bottlenecks and uncover opportunities for improvement. This poster describes our experiences applying Rice University's HPCToolkit and hardware performance counters to gain insight into an MPI+OpenMP code that performs Reverse Time Migration (RTM) on a cluster of multicore processors. The tools provided us with insights into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores. By applying insights obtained from the tools, we were able to improve the performance of the RTM code by roughly 30 percent.
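The hybrid model described here pairs message passing across nodes with threading inside each node. The sketch below is a hypothetical Python analogue using mpi4py for the message-passing layer and a thread pool standing in for OpenMP threads; it is not the RTM code profiled in the poster, and the array sizes and stencil-like update are illustrative assumptions.

```python
"""Hypothetical hybrid-parallel sketch (not the RTM code): mpi4py handles
communication between ranks, and a thread pool provides within-node parallelism
in place of OpenMP. Array shape and the smoothing kernel are assumptions."""
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns one slab of the domain (crude 1-D domain decomposition).
local = np.random.rand(4, 100_000)


def smooth(row: np.ndarray) -> np.ndarray:
    # Thread-level work item: a cheap stand-in for a per-row stencil update.
    return 0.5 * row + 0.25 * np.roll(row, 1) + 0.25 * np.roll(row, -1)


with ThreadPoolExecutor(max_workers=4) as pool:  # threads instead of OpenMP
    local = np.stack(list(pool.map(smooth, local)))

# Message passing across ranks: exchange a checksum with ring neighbours.
left, right = (rank - 1) % size, (rank + 1) % size
sendbuf = np.array([local.sum()])
recvbuf = np.empty(1)
comm.Sendrecv(sendbuf, dest=right, recvbuf=recvbuf, source=left)

print(f"rank {rank}: received neighbour checksum {recvbuf[0]:.3f}")
```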
A hybrid cloud is an integration of resources between private and public clouds. It enables users to horizontally scale their on-premises infrastructure up to public clouds in order to improve performance and cut up-front investment costs. This model of application deployment is called cloud bursting, and it allows data-intensive applications, especially distributed database systems, to benefit from both private and public clouds. In this work, we present an automated implementation of a hybrid cloud using (i) a robust, zero-cost Linux-based VPN to make a secure connection between private and public clouds, and (ii) Terraform as a software tool to deploy infrastructure resources based on the requirements of the hybrid cloud. We also evaluate the performance of cloud bursting for six modern distributed database systems on a hybrid cloud spanning a local OpenStack deployment and Microsoft Azure. Our results reveal that MongoDB and MySQL Cluster perform efficiently in terms of throughput and operation latency when they burst into a public cloud to acquire additional resources. In contrast, the performance of Cassandra, Riak, Redis, and CouchDB degrades when they obtain a significant share of their required resources via cloud bursting.
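Terraform is normally driven through its CLI, so one way to automate the bursting step is to wrap terraform init, apply, and destroy around a benchmark run. The sketch below is a rough illustration of that idea, not the authors' tooling; the working directory, the region variable, and the placeholder benchmark hook are assumptions.

```python
"""Rough sketch (assumed, not the authors' tooling): drive Terraform from Python
to stand up the public-cloud half of a hybrid deployment, run a placeholder
benchmark step, then tear the resources down again."""
import subprocess


def terraform(workdir: str, *args: str) -> None:
    # Run a Terraform CLI command in the given working directory, failing fast.
    subprocess.run(["terraform", *args], cwd=workdir, check=True)


def burst_and_benchmark(workdir: str = "azure-burst") -> None:
    terraform(workdir, "init", "-input=false")
    terraform(workdir, "apply", "-auto-approve",
              "-var", "region=australiaeast")  # hypothetical variable name
    try:
        # Placeholder: point the database benchmark at the new public nodes here.
        print("public-cloud VMs provisioned; run the database workload next")
    finally:
        terraform(workdir, "destroy", "-auto-approve",
                  "-var", "region=australiaeast")


if __name__ == "__main__":
    burst_and_benchmark()
```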
This paper describes LFRic: the new weather and climate modelling system being developed by the UK Met Office to replace the existing Unified Model in preparation for exascale computing in the 2020s. LFRic uses the GungHo dynamical core and runs on a semi-structured cubed-sphere mesh. The design of the supporting infrastructure follows object-oriented principles to facilitate modularity and the use of external libraries where possible. In particular, a 'separation of concerns' between the science code and the parallel code is imposed to promote performance portability. An application called PSyclone, developed at the STFC Hartree Centre, can generate the parallel code, enabling deployment of a single-source science code onto different machine architectures. This paper provides an overview of the scientific requirements, the design of the software infrastructure, and examples of PSyclone usage. Preliminary performance results show strong scaling and an indication that hybrid MPI/OpenMP performs better than pure MPI.
