Remove-Win: a Design Framework for Conflict-free Replicated Data Collections

113 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yu Huang

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yuqi Zhang - Yu Huang - Hengfeng Wei

النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Internet-scale distributed systems often replicate data within and across data centers to provide low latency and high availability despite node and network failures. Replicas are required to accept updates without coordination with each other, and the updates are then propagated asynchronously. This brings the issue of conflict resolution among concurrent updates, which is often challenging and error-prone. The Conflict-free Replicated Data Type (CRDT) framework provides a principled approach to address this challenge. This work focuses on a special type of CRDT, namely the Conflict-free Replicated Data Collection (CRDC), e.g. list and queue. The CRDC can have complex and compound data items, which are organized in structures of rich semantics. Complex CRDCs can greatly ease the development of upper-layer applications, but also makes the conflict resolution notoriously difficult. This explains why existing CRDC designs are tricky, and hard to be generalized to other data types. A design framework is in great need to guide the systematic design of new CRDCs. To address the challenges above, we propose the Remove-Win Design Framework. The remove-win strategy for conflict resolution is simple but powerful. The remove operation just wipes out the data item, no matter how complex the value is. The user of the CRDC only needs to specify conflict resolution for non-remove operations. This resolution is destructed to three basic cases and are left as open terms in the CRDC design skeleton. Stubs containing user-specified conflict resolution logics are plugged into the skeleton to obtain concrete CRDC designs. We demonstrate the effectiveness of our design framework via a case study of designing a conflict-free replicated priority queue. Performance measurements also show the efficiency of the design derived from our design framework.

قيم البحث

366 - Pierre Fraigniaud , Franc{c}ois Le Gall , Harumichi Nishimura 2020

The paper tackles the issue of $textit{checking}$ that all copies of a large data set replicated at several nodes of a network are identical. The fact that the replicas may be located at distant nodes prevents the system from verifying their equality locally, i.e., by having each node consult only nodes in its vicinity. On the other hand, it remains possible to assign $textit{certificates}$ to the nodes, so that verifying the consistency of the replicas can be achieved locally. However, we show that, as the data set is large, classical certification mechanisms, including distributed Merlin-Arthur protocols, cannot guarantee good completeness and soundness simultaneously, unless they use very large certificates. The main result of this paper is a distributed $textit{quantum}$ Merlin-Arthur protocol enabling the nodes to collectively check the consistency of the replicas, based on small certificates, and in a single round of message exchange between neighbors, with short messages. In particular, the certificate-size is logarithmic in the size of the data set, which gives an exponential advantage over classical certification mechanisms.

النظم الموزعة والتوازية والحوسبة العنقودية فيزياء الكم

Data Freshness in Leader-Based Replicated Storage

122 - Amir Behrouzi-Far , Emina Soljanin , Roy D. Yates 2020

Leader-based data replication improves consistency in highly available distributed storage systems via sequential writes to the leader nodes. After a write has been committed by the leaders, follower nodes are written by a multicast mechanism and are only guaranteed to be eventually consistent. With Age of Information (AoI) as the freshness metric, we characterize how the number of leaders affects the freshness of the data retrieved by an instantaneous read query. In particular, we derive the average age of a read query for a deterministic model for the leader writing time and a probabilistic model for the follower writing time. We obtain a closed-form expression for the average age for exponentially distributed follower writing time. Our numerical results show that, depending on the relative speed of the write operation to the two groups of nodes, there exists an optimal number of leaders which minimizes the average age of the retrieved data, and that this number increases as the relative speed of writing on leaders increases.

النظم الموزعة والتوازية والحوسبة العنقودية قواعد البيانات نظرية المعلومات

4C: A Computation, Communication, and Control Co-Design Framework for CAVs

76 - Liangkai Liu , Shaoshan Liu , 2021

Connected and autonomous vehicles (CAVs) are promising due to their potential safety and efficiency benefits and have attracted massive investment and interest from government agencies, industry, and academia. With more computing and communication re sources are available, both vehicles and edge servers are equipped with a set of camera-based vision sensors, also known as Visual IoT (V-IoT) techniques, for sensing and perception. Tremendous efforts have been made for achieving programmable communication, computation, and control. However, they are conducted mainly in the silo mode, limiting the responsiveness and efficiency of handling challenging scenarios in the real world. To improve the end-to-end performance, we envision that future CAVs require the co-design of communication, computation, and control. This paper presents our vision of the end-to-end design principle for CAVs, called 4C, which extends the V-IoT system by providing a unified communication, computation, and control co-design framework. With programmable communications, fine-grained heterogeneous computation, and efficient vehicle controls in 4C, CAVs can handle critical scenarios and achieve energy-efficient autonomous driving. Finally, we present several challenges to achieving the vision of the 4C framework.

النظم الموزعة والتوازية والحوسبة العنقودية الذكاء الاصطناعي علم الروبوتات

A Visual Analytics Framework for Reviewing Streaming Performance Data

99 - Suraj P. Kesavan , Takanori Fujiwara , Jianping Kelvin Li 2020

Understanding and tuning the performance of extreme-scale parallel computing systems demands a streaming approach due to the computational cost of applying offline algorithms to vast amounts of performance log data. Analyzing large streaming data is challenging because the rate of receiving data and limited time to comprehend data make it difficult for the analysts to sufficiently examine the data without missing important changes or patterns. To support streaming data analysis, we introduce a visual analytic framework comprising of three modules: data management, analysis, and interactive visualization. The data management module collects various computing and communication performance metrics from the monitored system using streaming data processing techniques and feeds the data to the other two modules. The analysis module automatically identifies important changes and patterns at the required latency. In particular, we introduce a set of online and progressive analysis methods for not only controlling the computational costs but also helping analysts better follow the critical aspects of the analysis results. Finally, the interactive visualization module provides the analysts with a coherent view of the changes and patterns in the continuously captured performance data. Through a multi-faceted case study on performance analysis of parallel discrete-event simulation, we demonstrate the effectiveness of our framework for identifying bottlenecks and locating outliers.

النظم الموزعة والتوازية والحوسبة العنقودية تفاعل الإنسان والحاسوب التعلم الآلي

BigDL: A Distributed Deep Learning Framework for Big Data

187 - Jason Dai , Yiheng Wang , Xin Qiu 2018

This paper presents BigDL (a distributed deep learning framework for Apache Spark), which has been used by a variety of users in the industry for building deep learning applications on production big data platforms. It allows deep learning applicatio ns to run on the Apache Hadoop/Spark cluster so as to directly process the production data, and as a part of the end-to-end data analysis pipeline for deployment and management. Unlike existing deep learning frameworks, BigDL implements distributed, data parallel training directly on top of the functional compute model (with copy-on-write and coarse-grained operations) of Spark. We also share real-world experience and war stories of users that have adopted BigDL to address their challenges(i.e., how to easily build end-to-end data analysis and deep learning pipelines for their production data).

النظم الموزعة والتوازية والحوسبة العنقودية الذكاء الاصطناعي التعلم الآلي

سجل دخول لتتمكن من نشر تعليقات