Performance Analysis and Optimization of a Hybrid Distributed Reverse Time Migration Application

59 0 0.0 ( 0 )

Download Cite

Added by Sri Raj Paul

Publication date 2016

fields Informatics Engineering

and research's language is English

Authors Sri Raj Paul - John Mellor-Crummey - Mauricio Araya-Polo

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Applications to process seismic data employ scalable parallel systems to produce timely results. To fully exploit emerging processor architectures, application will need to employ threaded parallelism within a node and message passing across nodes. Today, MPI+OpenMP is the preferred programming model for this task. However, tuning hybrid programs for clusters is difficult. Performance tools can help users identify bottlenecks and uncover opportunities for improvement. This poster describes our experiences of applying Rice Universitys HPCToolkit and hardware performance counters to gain insight into an MPI+OpenMP code that performs Reverse Time Migration (RTM) on a cluster of multicore processors. The tools provided us with insights into the effectiveness of the domain decomposition strategy, the use of threaded parallelism, and functional unit utilization in individual cores. By applying insights obtained from the tools, we were able to improve the performance of the RTM code by roughly 30 percent.

rate research

Distributed-Memory Load Balancing with Cyclic Token-based Work-Stealing Applied to Reverse Time Migration

97 - Italo A. S. Assis , Ant^onio D. S. Oliveira , Tiago Barros 2019

Reverse time migration (RTM) is a prominent technique in seismic imaging. Its resulting subsurface images are used in the industry to investigate with higher confidence the existence and the conditions of oil and gas reservoirs. Because of its high computational cost, RTM must make use of parallel computers. Balancing the workload distribution of an RTM is a growing challenge in distributed computing systems. The competition for shared resources and the differently-sized tasks of the RTM are some of the possible sources of load imbalance. Although many load balancing techniques exist, scaling up for large problems and large systems remains a challenge because synchronization overhead also scales. This paper proposes a cyclic token-based work-stealing (CTWS) algorithm for distributed memory systems applied to RTM. The novel cyclic token approach reduces the number of failed steals, avoids communication overhead, and simplifies the victim selection and the termination strategy. The proposed method is implemented as a C library using the one-sided communication feature of the message passing interface (MPI) standard. Results obtained by applying the proposed technique to balance the workload of a 3D RTM system present a factor of 14.1% speedup and reductions of the load imbalance of 78.4% when compared to the conventional static distribution.

Distributed Parallel and Cluster Computing

A Distributed Application Placement and Migration Management Techniques for Edge and Fog Computing Environments

85 - Mohammad Goudarzi , Marimuthu Palaniswami , Rajkumar Buyya 2021

Fog/Edge computing model allows harnessing of resources in the proximity of the Internet of Things (IoT) devices to support various types of real-time IoT applications. However, due to the mobility of users and a wide range of IoT applications with different requirements, it is a challenging issue to satisfy these applications requirements. The execution of IoT applications exclusively on one fog/edge server may not be always feasible due to limited resources, while execution of IoT applications on different servers needs further collaboration among servers. Also, considering user mobility, some modules of each IoT application may require migration to other servers for execution, leading to service interruption and extra execution costs. In this article, we propose a new weighted cost model for hierarchical fog computing environments, in terms of the response time of IoT applications and energy consumption of IoT devices, to minimize the cost of running IoT applications and potential migrations. Besides, a distributed clustering technique is proposed to enable the collaborative execution of tasks, emitted from application modules, among servers. Also, we propose an application placement technique to minimize the overall cost of executing IoT applications on multiple servers in a distributed manner. Furthermore, a distributed migration management technique is proposed for the potential migration of applications modules to other remote servers as the users move along their path. Besides, failure recovery methods are embedded in the clustering, application placement, and migration management techniques to recover from unpredicted failures. The performance results show that our technique significantly improves its counterparts in terms of placement deployment time, average execution cost of tasks, total number of migrations, total number of interrupted tasks, and cumulative migration cost.

Distributed Parallel and Cluster Computing Performance

An Automated Implementation of Hybrid Cloud for Performance Evaluation of Distributed Databases

188 - Yaser Mansouri , Victor Prokhorenko , 2020

A Hybrid cloud is an integration of resources between private and public clouds. It enables users to horizontally scale their on-premises infrastructure up to public clouds in order to improve performance and cut up-front investment cost. This model of applications deployment is called cloud bursting that allows data-intensive applications especially distributed database systems to have the benefit of both private and public clouds. In this work, we present an automated implementation of a hybrid cloud using (i) a robust and zero-cost Linux-based VPN to make a secure connection between private and public clouds, and (ii) Terraform as a software tool to deploy infrastructure resources based on the requirements of hybrid cloud. We also explore performance evaluation of cloud bursting for six modern and distributed database systems on the hybrid cloud spanning over local OpenStack and Microsoft Azure. Our results reveal that MongoDB and MySQL Cluster work efficient in terms of throughput and operations latency if they burst into a public cloud to supply their resources. In contrast, the performance of Cassandra, Riak, Redis, and Couchdb reduces if they significantly leverage their required resources via cloud bursting.

Distributed Parallel and Cluster Computing

The Impact of Distance on Performance and Scalability of Distributed Database Systems in Hybrid Clouds

167 - Yaser Mansouri , M. Ali Babar 2020

The increasing need for managing big data has led the emergence of advanced database management systems. There has been increased efforts aimed at evaluating the performance and scalability of NoSQL and Relational databases hosted by either private or public cloud datacenters. However, there has been little work on evaluating the performance and scalability of these databases in hybrid clouds, where the distance between private and public cloud datacenters can be one of the key factors that can affect their performance. Hence, in this paper, we present a detailed evaluation of throughput, scalability, and VMs size vs. VMs number for six modern databases in a hybrid cloud, consisting of a private cloud in Adelaide and Azure based datacenter in Sydney, Mumbai, and Virginia regions. Based on results, as the distance between private and public clouds increases, the throughput performance of most databases reduces. Second, MongoDB obtains the best throughput performance, followed by MySQL C luster, whilst Cassandra exposes the most fluctuation in through performance. Third, vertical scalability improves the throughput of databases more than the horizontal scalability. Forth, exploiting bigger VMs rather than more VMs with less cores can increase throughput performance for Cassandra, Riak, and Redis.

Distributed Parallel and Cluster Computing

On the performance overhead tradeoff of distributed principal component analysis via data partitioning

158 - Ni An , Steven Weber 2015

Principal component analysis (PCA) is not only a fundamental dimension reduction method, but is also a widely used network anomaly detection technique. Traditionally, PCA is performed in a centralized manner, which has poor scalability for large distributed systems, on account of the large network bandwidth cost required to gather the distributed state at a fusion center. Consequently, several recent works have proposed various distributed PCA algorithms aiming to reduce the communication overhead incurred by PCA without losing its inferential power. This paper evaluates the tradeoff between communication cost and solution quality of two distributed PCA algorithms on a real domain name system (DNS) query dataset from a large network. We also apply the distributed PCA algorithm in the area of network anomaly detection and demonstrate that the detection accuracy of both distributed PCA-based methods has little degradation in quality, yet achieves significant savings in communication bandwidth.

Distributed Parallel and Cluster Computing Networking and Internet Architecture Performance