ﻻ يوجد ملخص باللغة العربية
In large scale distributed storage systems (DSS) deployed in cloud computing, correlated failures resulting in simultaneous failure (or, unavailability) of blocks of nodes are common. In such scenarios, the stored data or a content of a failed node can only be reconstructed from the available live nodes belonging to the available blocks. To analyze the resilience of the system against such block failures, this work introduces the framework of Block Failure Resilient (BFR) codes, wherein the data (e.g., a file in DSS) can be decoded by reading out from a same number of codeword symbols (nodes) from a subset of available blocks of the underlying codeword. Further, repairable BFR codes are introduced, wherein any codeword symbol in a failed block can be repaired by contacting a subset of remaining blocks in the system. File size bounds for repairable BFR codes are derived, and the trade-off between per node storage and repair bandwidth is analyzed, and the corresponding minimum storage regenerating (BFR-MSR) and minimum bandwidth regenerating (BFR-MBR) points are derived. Explicit codes achieving the two operating points for a special case of parameters are constructed, wherein the underlying regenerating codewords are distributed to BFR codeword symbols according to combinatorial designs. Finally, BFR locally repairable codes (BFR-LRC) are introduced, an upper bound on the resilience is derived and optimal code construction are provided by a concatenation of Gabidulin and MDS codes. Repair efficiency of BFR-LRC is further studied via the use of BFR-MSR/MBR codes as local codes. Code constructions achieving optimal resilience for BFR-MSR/MBR-LRCs are provided for certain parameter regimes. Overall, this work introduces the framework of block failures along with optimal code constructions, and the study of architecture-aware coding for distributed storage systems.
This paper aims to go beyond resilience into the study of security and local-repairability for distributed storage systems (DSS). Security and local-repairability are both important as features of an efficient storage system, and this paper aims to u
This chapter deals with the topic of designing reliable and efficient codes for the storage and retrieval of large quantities of data over storage devices that are prone to failure. For long, the traditional objective has been one of ensuring reliabi
In a distributed storage system, code symbols are dispersed across space in nodes or storage units as opposed to time. In settings such as that of a large data center, an important consideration is the efficient repair of a failed node. Efficient rep
We introduce a new family of Fountain codes that are systematic and also have sparse parities. Given an input of $k$ symbols, our codes produce an unbounded number of output symbols, generating each parity independently by linearly combining a logari
In this work it is shown that locally repairable codes (LRCs) can be list-decoded efficiently beyond the Johnson radius for a large range of parameters by utilizing the local error-correction capabilities. The corresponding decoding radius is derived