
Compressed Differential Erasure Codes for Efficient Archival of Versioned Data

Posted by Jagadeesh Harshan
Publication date: 2015
Research field: Informatics Engineering
Research language: English





In this paper, we study the problem of storing an archive of versioned data in a reliable and efficient manner in distributed storage systems. We propose a new storage technique called differential erasure coding (DEC) where the differences (deltas) between subsequent versions […]
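To make the idea concrete, the following minimal sketch (illustrative only, not the authors' DEC construction; the erasure-coding and compression steps are omitted) keeps the first version in full and stores only byte-wise deltas between consecutive versions, from which any version can be rebuilt:

```python
# Illustrative sketch of the delta-then-encode idea behind DEC:
# store version 1 in full, then only the deltas between consecutive
# versions; each stored object (full or delta) would then be erasure
# coded. This is NOT the authors' implementation, just the core idea.

def xor_delta(old: bytes, new: bytes) -> bytes:
    """Byte-wise XOR delta between two equal-length versions."""
    return bytes(a ^ b for a, b in zip(old, new))

def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Recover the newer version from the base and the delta."""
    return bytes(a ^ b for a, b in zip(base, delta))

versions = [b"version-1 data blob", b"version-2 data blob", b"version-3 DATA blob"]

# Archive = first version + deltas between consecutive versions.
archive = [versions[0]] + [
    xor_delta(v_prev, v_next) for v_prev, v_next in zip(versions, versions[1:])
]

# Reconstruct the latest version by replaying deltas onto the first one.
restored = archive[0]
for delta in archive[1:]:
    restored = apply_delta(restored, delta)
assert restored == versions[-1]
```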


Read also

Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice, and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability. This paper shows how to overcome this limitation. We present a novel family of erasure codes that are efficiently repairable and offer higher reliability compared to Reed-Solomon codes. We show analytically that our codes are optimal on a recently identified tradeoff between locality and minimum distance. We implement our new codes in Hadoop HDFS and compare them to a currently deployed HDFS module that uses Reed-Solomon codes. Our modified HDFS implementation shows a reduction of approximately 2x in repair disk I/O and repair network traffic. The disadvantage of the new coding scheme is that it requires 14% more storage than Reed-Solomon codes, an overhead shown to be information-theoretically optimal for obtaining locality. Because the new codes repair failures faster, they also achieve reliability that is orders of magnitude higher than that of replication.
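As a toy illustration of the locality idea described above (block sizes and group structure are assumed here; this is not the paper's code family), a local XOR parity over a small group lets a single lost block be rebuilt from that group alone instead of contacting k blocks, which is where the repair I/O savings come from:

```python
# Toy illustration of locality with assumed parameters (k = 10, r = 5):
# split the data blocks into small groups and add one XOR parity per
# group, so a single failed block is repaired from its group only.

from functools import reduce

def xor_blocks(blocks):
    """Column-wise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

data = [bytes([i] * 4) for i in range(10)]        # k = 10 data blocks
groups = [data[0:5], data[5:10]]                  # locality r = 5
local_parities = [xor_blocks(g) for g in groups]  # one local parity per group

# Repair block 7 by reading only its group (5 blocks) instead of k = 10.
lost = 7
survivors = [b for i, b in enumerate(data[5:10]) if i != lost - 5]
repaired = xor_blocks(survivors + [local_parities[1]])
assert repaired == data[lost]
```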
To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However, the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage-efficient alternative to replication, particularly suited for archival in data centers, where old datasets (rarely accessed) can be erasure encoded while replicas are maintained only for the latest data. Many recent works consider the design of new storage-centric erasure codes for improved repairability. In contrast, this paper addresses the migration from replication to encoding: traditionally, erasure coding is an atomic operation in that a single node with the whole object encodes and uploads all the encoded pieces. Although large datasets can be concurrently archived by distributing individual object encodings among different nodes, the network and computing capacity of individual nodes constrain the archival process due to this atomicity. We propose a new pipelined coding strategy that distributes the network and computing load of single-object encodings among different nodes, which also speeds up multiple-object archival. We further present RapidRAID codes, an explicit family of pipelined erasure codes that provides fast archival without compromising either data reliability or storage overheads. Finally, we provide a real implementation of RapidRAID codes and benchmark its performance using both a cluster of 50 nodes and a set of Amazon EC2 instances. Experiments show that RapidRAID codes reduce a single object's coding time by up to 90%, while when multiple objects are encoded concurrently the reduction is up to 20%.
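A simplified model of the pipelining described above (not RapidRAID itself; arithmetic is plain XOR rather than the codes actually used) shows how each node folds its local replica piece into a partial codeword and forwards it, so no single node performs the whole encoding:

```python
# Simplified sketch of pipelined encoding (not RapidRAID itself):
# each node that already holds a replica piece folds it into a partial
# codeword and forwards the result, spreading the encoding work and the
# network transfer across the pipeline of nodes.

def node_step(partial: bytes, local_piece: bytes) -> bytes:
    """One pipeline stage: fold the local piece into the partial codeword."""
    return bytes(p ^ l for p, l in zip(partial, local_piece))

pieces = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # replica pieces on 4 nodes

partial = bytes(len(pieces[0]))                 # all-zero initial codeword
for piece in pieces:                            # codeword flows node to node
    partial = node_step(partial, piece)

# The finished coded piece equals what a single encoder would produce,
# but no single node had to read every piece or do all the arithmetic.
expected = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*pieces))
assert partial == expected
```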
A Maximum Distance Separable (MDS) code over an alphabet $F$ is defined via an encoding function $C: F^k \rightarrow F^n$ that allows one to retrieve a message $m \in F^k$ from the codeword $C(m)$ even after erasing any $n-k$ of its symbols. The minimum possible alphabet size of general (non-linear) MDS codes for given parameters $n$ and $k$ is unknown and forms one of the central open problems in coding theory. The paper initiates the study of the alphabet size of codes in a generalized setting where the coding scheme is required to handle a pre-specified subset of all possible erasure patterns, naturally represented by an $n$-vertex $k$-uniform hypergraph. We relate the minimum possible alphabet size of such codes to the strong chromatic number of the hypergraph and analyze the tightness of the obtained bounds for both the linear and non-linear settings. We further consider variations of the problem which allow a small probability of decoding error.
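To ground the definition, the toy example below (an assumed Reed-Solomon-style instance over the prime field GF(257), not a construction from the paper) exhibits the MDS property: any $k$ of the $n$ codeword symbols recover the message by polynomial interpolation:

```python
# Toy Reed-Solomon-style MDS code over GF(257); an assumed illustration
# of the definition above, not taken from the paper. C(m) evaluates the
# message polynomial at points 1..n, and any k of the n symbols recover
# m by Lagrange interpolation.

P = 257  # prime, so integers mod P form a field

def encode(message, n):
    """Codeword symbol at x is the message polynomial evaluated at x (mod P)."""
    return [sum(c * pow(x, i, P) for i, c in enumerate(message)) % P
            for x in range(1, n + 1)]

def mul_by_linear(poly, xi):
    """Multiply a coefficient list (lowest degree first) by (x - xi), mod P."""
    out = [0] * (len(poly) + 1)
    for t, c in enumerate(poly):
        out[t + 1] = (out[t + 1] + c) % P
        out[t] = (out[t] - xi * c) % P
    return out

def decode(points, k):
    """Lagrange interpolation from any k surviving (x, y) pairs, mod P."""
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(points):
        basis, denom = [1], 1
        for i, (xi, _) in enumerate(points):
            if i != j:
                basis = mul_by_linear(basis, xi)
                denom = denom * (xj - xi) % P
        scale = yj * pow(denom, -1, P) % P
        coeffs = [(c + scale * b) % P for c, b in zip(coeffs, basis)]
    return coeffs

message = [7, 13, 42]                                # k = 3 message symbols
codeword = encode(message, 6)                        # n = 6, survives any 3 erasures
survivors = list(zip(range(1, 7), codeword))[1::2]   # keep only symbols at x = 2, 4, 6
assert decode(survivors, 3) == message
```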
In this paper we discuss a novel data compression technique for binary symmetric sources based on the cavity method over a Galois field of order $q$ (GF($q$)). We present a scheme of low complexity and near-optimal empirical performance. The compression step is based on a reduction of sparse low-density parity-check codes over GF($q$) and is carried out through the so-called reinforced belief-propagation equations. These reduced codes appear to have a non-trivial geometrical modification of the space of codewords which makes such compression computationally feasible. The computational complexity is $O(d \cdot n \cdot q \cdot \log q)$ per iteration, where $d$ is the average degree of the check nodes and $n$ is the number of bits. For our code ensemble, decompression can be done in time linear in the code's length by a simple leaf-removal algorithm.
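The leaf-removal decompression mentioned above can be sketched as a peeling schedule; the example below uses GF(2) for brevity, whereas the paper works over GF($q$):

```python
# Rough sketch of a leaf-removal (peeling) decoder over GF(2); the paper
# works over GF(q), but the schedule is the same idea: repeatedly find a
# parity check with exactly one unresolved variable and solve for it.

def leaf_removal(checks, known):
    """checks: list of (variable-index list, parity bit); known: dict var -> bit."""
    known = dict(known)
    progress = True
    while progress:
        progress = False
        for variables, parity in checks:
            unknown = [v for v in variables if v not in known]
            if len(unknown) == 1:                     # this check is a "leaf"
                v = unknown[0]
                others = sum(known[u] for u in variables if u != v) % 2
                known[v] = parity ^ others
                progress = True
    return known

# Tiny example: 4 variables, 3 sparse checks, variable 0 already known.
checks = [([0, 1], 1), ([1, 2], 0), ([2, 3], 1)]
print(leaf_removal(checks, {0: 0}))   # peels variable 1, then 2, then 3
```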