MURS: Mitigating Memory Pressure in Service-oriented Data Processing Systems

68 0 0.0 ( 0 )

Download Cite

Added by Xiong Zhang

Publication date 2017

fields Informatics Engineering

and research's language is English

Authors Xuanhua Shi - Xiong Zhang - Ligang He

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Although a data processing system often works as a batch processing system, many enterprises deploy such a system as a service, which we call the service-oriented data processing system. It has been shown that in-memory data processing systems suffer from serious memory pressure. The situation becomes even worse for the service-oriented data processing systems due to various reasons. For example, in a service-oriented system, multiple submitted tasks are launched at the same time and executed in the same context in the resources, comparing with the batch processing mode where the tasks are processed one by one. Therefore, the memory pressure will affect all submitted tasks, including the tasks that only incur the light memory pressure when they are run alone. In this paper, we find that the reason why memory pressure arises is because the running tasks produce massive long-living data objects in the limited memory space. Our studies further reveal that the long-living data objects are generated by the API functions that are invoked by the in-memory processing frameworks. Based on these findings, we propose a method to classify the API functions based on the memory usage rate. Further, we design a scheduler called MURS to mitigate the memory pressure. We implement MURS in Spark and conduct the experiments to evaluate the performance of MURS. The results show that when comparing to Spark, MURS can 1) decrease the execution time of the submitted jobs by up to 65.8%, 2) mitigate the memory pressure in the server by decreasing the garbage collection time by up to 81%, and 3) reduce the data spilling, and hence disk I/O, by approximately 90%.

rate research

Platform Autonomous Custom Scalable Service using Service Oriented Cloud Computing Architecture

105 - B. Kamala , B. Priya , J. M. Nandhini 2016

The global economic recession and the shrinking budget of IT projects have led to the need of development of integrated information systems at a lower cost. Today, the emerging phenomenon of cloud computing aims at transforming the traditional way of computing by providing both software applications and hardware resources as a service. With the rapid evolution of Information Communication Technology (ICT) governments, organizations and businesses are looking for solutions to improve their services and integrate their IT infrastructures. In recent years advanced technologies such as SOA and Cloud computing have been evolved to address integration problems. The Clouds enormous capacity with comparable low cost makes it an ideal platform for SOA deployment. This paper deals with the combined approach of Cloud and Service Oriented Architecture along with a Case Study and a review.

Distributed Parallel and Cluster Computing

Mitigating the Performance-Efficiency Tradeoff in Resilient Memory Disaggregation

105 - Youngmoon Lee , Hasan Al Maruf , Mosharaf Chowdhury 2019

We present the design and implementation of a low-latency, low-overhead, and highly available resilient disaggregated cluster memory. Our proposed framework can access erasure-coded remote memory within a single-digit {mu}s read/write latency, significantly improving the performance-efficiency tradeoff over the state-of-the-art - it performs similar to in-memory replication with 1.6x lower memory overhead. We also propose a novel coding group placement algorithm for erasure-coded data, that provides load balancing while reducing the probability of data loss under correlated failures by an order of magnitude.

Distributed Parallel and Cluster Computing Networking and Internet Architecture

SoA-Fog: Secure Service-Oriented Edge Computing Architecture for Smart Health Big Data Analytics

73 - Rabindra K. Barik , Harishchandra Dubey , Kunal Mankodiya 2017

The smart health paradigms employ Internet-connected wearables for telemonitoring, diagnosis for providing inexpensive healthcare solutions. Fog computing reduces latency and increases throughput by processing data near the body sensor network. In this paper, we proposed a secure serviceorientated edge computing architecture that is validated on recently released public dataset. Results and discussions support the applicability of proposed architecture for smart health applications. We proposed SoA-Fog i.e. a three-tier secure framework for efficient management of health data using fog devices. It discuss the security aspects in client layer, fog layer and the cloud layer. We design the prototype by using win-win spiral model with use case and sequence diagram. Overlay analysis was performed using proposed framework on malaria vector borne disease positive maps of Maharastra state in India from 2011 to 2014. The mobile clients were taken as test case. We performed comparative analysis between proposed secure fog framework and state-of-the art cloud-based framework.

Distributed Parallel and Cluster Computing

Memory at Your Service: Fast Memory Allocation for Latency-critical Services

104 - Aidi Pi , Junxian Zhao , Shaoqi Wang 2021

Co-location and memory sharing between latency-critical services, such as key-value store and web search, and best-effort batch jobs is an appealing approach to improving memory utilization in multi-tenant datacenter systems. However, we find that the very diverse goals of job co-location and the GNU/Linux system stack can lead to severe performance degradation of latency-critical services under memory pressure in a multi-tenant system. We address memory pressure for latency-critical services via fast memory allocation and proactive reclamation. We find that memory allocation latency dominates the overall query latency, especially under memory pressure. We analyze the default memory management mechanism provided by GNU/Linux system stack and identify the reasons why it is inefficient for latency-critical services in a multi-tenant system. We present Hermes, a fast memory allocation mechanism in user space that adaptively reserves memory for latency-critical services. It advises Linux OS to proactively reclaim memory of batch jobs. We implement Hermes in GNU C Library. Experimental result shows that Hermes reduces the average and the $99^{th}$ percentile memory allocation latency by up to 54.4% and 62.4% for a micro benchmark, respectively. For two real-world latency-critical services, Hermes reduces both the average and the $99^{th}$ percentile tail query latency by up to 40.3%. Compared to the default Glibc, jemalloc and TCMalloc, Hermes reduces Service Level Objective violation by up to 84.3% under memory pressure.

Distributed Parallel and Cluster Computing

MetaFlow: a Scalable Metadata Lookup Service for Distributed File Systems in Data Centers

120 - Peng Sun , Yonggang Wen , Ta Nguyen Binh Duong 2016

In large-scale distributed file systems, efficient meta- data operations are critical since most file operations have to interact with metadata servers first. In existing distributed hash table (DHT) based metadata management systems, the lookup service could be a performance bottleneck due to its significant CPU overhead. Our investigations showed that the lookup service could reduce system throughput by up to 70%, and increase system latency by a factor of up to 8 compared to ideal scenarios. In this paper, we present MetaFlow, a scalable metadata lookup service utilizing software-defined networking (SDN) techniques to distribute lookup workload over network components. MetaFlow tackles the lookup bottleneck problem by leveraging B-tree, which is constructed over the physical topology, to manage flow tables for SDN-enabled switches. Therefore, metadata requests can be forwarded to appropriate servers using only switches. Extensive performance evaluations in both simulations and testbed showed that MetaFlow increases system throughput by a factor of up to 3.2, and reduce system latency by a factor of up to 5 compared to DHT-based systems. We also deployed MetaFlow in a distributed file system, and demonstrated significant performance improvement.

Distributed Parallel and Cluster Computing