No Arabic abstract
Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, requires less redundancy than simple replication for the same level of reliability. However, since fragments must be periodically replaced as nodes fail, a key question is how to generate encoded fragments in a distributed way while transferring as little data as possible across the network. For an erasure coded system, a common practice to repair from a node failure is for a new node to download subsets of data stored at a number of surviving nodes, reconstruct a lost coded block using the downloaded data, and store it at the new node. We show that this procedure is sub-optimal. We introduce the notion of regenerating codes, which allow a new node to download emph{functions} of the stored data from the surviving nodes. We show that regenerating codes can significantly reduce the repair bandwidth. Further, we show that there is a fundamental tradeoff between storage and repair bandwidth which we theoretically characterize using flow arguments on an appropriately constructed graph. By invoking constructive results in network coding, we introduce regenerating codes that can achieve any point in this optimal tradeoff.
Physical-layer Network Coding (PNC) can significantly improve the throughput of wireless two way relay channel (TWRC) by allowing the two end nodes to transmit messages to the relay simultaneously. To achieve reliable communication, channel coding could be applied on top of PNC. This paper investigates link-by-link channel-coded PNC, in which a critical process at the relay is to transform the superimposed channel-coded packets received from the two end nodes plus noise, Y3=X1+X2+W3, to the network-coded combination of the source packets, S1 XOR S2 . This is in distinct to the traditional multiple-access problem, in which the goal is to obtain S1 and S2 separately. The transformation from Y3 to (S1 XOR S2) is referred to as the Channel-decoding-Network-Coding process (CNC) in that it involves both channel decoding and network coding operations. A contribution of this paper is the insight that in designing CNC, we should first (i) channel-decode Y3 to the superimposed source symbols S1+S2 before (ii) transforming S1+S2 to the network-coded packets (S1 XOR S2) . Compared with previously proposed strategies for CNC, this strategy reduces the channel-coding network-coding mismatch. It is not obvious, however, that an efficient decoder for step (i) exists. A second contribution of this paper is to provide an explicit construction of such a decoder based on the use of the Repeat Accumulate (RA) code. Specifically, we redesign the belief propagation algorithm of the RA code for traditional point-to-point channel to suit the need of the PNC multiple-access channel. Simulation results show that our new scheme outperforms the previously proposed schemes significantly in terms of BER without added complexity.
Erasure codes are increasingly being studied in the context of implementing atomic memory objects in large scale asynchronous distributed storage systems. When compared with the traditional replication based schemes, erasure codes have the potential of significantly lowering storage and communication costs while simultaneously guaranteeing the desired resiliency levels. In this work, we propose the Storage-Optimized Data-Atomic (SODA) algorithm for implementing atomic memory objects in the multi-writer multi-reader setting. SODA uses Maximum Distance Separable (MDS) codes, and is specifically designed to optimize the total storage cost for a given fault-tolerance requirement. For tolerating $f$ server crashes in an $n$-server system, SODA uses an $[n, k]$ MDS code with $k=n-f$, and incurs a total storage cost of $frac{n}{n-f}$. SODA is designed under the assumption of reliable point-to-point communication channels. The communication cost of a write and a read operation are respectively given by $O(f^2)$ and $frac{n}{n-f}(delta_w+1)$, where $delta_w$ denotes the number of writes that are concurrent with the particular read. In comparison with the recent CASGC algorithm, which also uses MDS codes, SODA offers lower storage cost while pays more on the communication cost. We also present a modification of SODA, called SODA$_{text{err}}$, to handle the case where some of the servers can return erroneous coded elements during a read operation. Specifically, in order to tolerate $f$ server failures and $e$ error-prone coded elements, the SODA$_{text{err}}$ algorithm uses an $[n, k]$ MDS code such that $k=n-2e-f$. SODA$_{text{err}}$ also guarantees liveness and atomicity, while maintaining an optimized total storage cost of $frac{n}{n-f-2e}$.
The growing demand for high-speed data, quality of service (QoS) assurance and energy efficiency has triggered the evolution of 4G LTE-A networks to 5G and beyond. Interference is still a major performance bottleneck. This paper studies the application of physical-layer network coding (PNC), a technique that exploits interference, in heterogeneous cellular networks. In particular, we propose a rate-maximising relay selection algorithm for a single cell with multiple relays based on the decode-and-forward strategy. With nodes transmitting at different powers, the proposed algorithm adapts the resource allocation according to the differing link rates and we prove theoretically that the optimisation problem is log-concave. The proposed technique is shown to perform significantly better than the widely studied selection-cooperation technique. We then undertake an experimental study on a software radio platform of the decoding performance of PNC with unbalanced SNRs in the multiple-access transmissions. This problem is inherent in cellular networks and it is shown that with channel coding and decoders based on multiuser detection and successive interference cancellation, the performance is better with power imbalance. This paper paves the way for further research in multi-cell PNC, resource allocation, and the implementation of PNC with higher-order modulations and advanced coding techniques.
We study the data reliability problem for a community of devices forming a mobile cloud storage system. We consider the application of regenerating codes for file maintenance within a geographically-limited area. Such codes require lower bandwidth to regenerate lost data fragments compared to file replication or reconstruction. We investigate threshold-based repair strategies where data repair is initiated after a threshold number of data fragments have been lost due to node mobility. We show that at a low departure-to-repair rate regime, a lazy repair strategy in which repairs are initiated after several nodes have left the system outperforms eager repair in which repairs are initiated after a single departure. This optimality is reversed when nodes are highly mobile. We further compare distributed and centralized repair strategies and derive the optimal repair threshold for minimizing the average repair cost per unit of time, as a function of underlying code parameters. In addition, we examine cooperative repair strategies and show performance improvements compared to non-cooperative codes. We investigate several models for the time needed for node repair including a simple fixed time model that allows for the computation of closed-form expressions and a more realistic model that takes into account the number of repaired nodes. We derive the conditions under which the former model approximates the latter. Finally, an extended model where additional failures are allowed during the repair process is investigated. Overall, our results establish the joint effect of code design and repair algorithms on the maintenance cost of distributed storage systems.
Wireless traffic attributable to machine learning (ML) inference workloads is increasing with the proliferation of applications and smart wireless devices leveraging ML inference. Owing to limited compute capabilities at these edge devices, achieving high inference accuracy often requires coordination with a remote compute node or cloud over the wireless cellular network. The accuracy of this distributed inference is, thus, impacted by the communication rate and reliability offered by the cellular network. In this paper, an analytical framework is proposed to characterize inference accuracy as a function of cellular network design. Using the developed framework, it is shown that cellular network should be provisioned with a minimum density of access points (APs) to guarantee a target inference accuracy, and the inference accuracy achievable at asymptotically high AP density is limited by the air-interface bandwidth. Furthermore, the minimum accuracy required of edge inference to deliver a target inference accuracy is shown to be inversely proportional to the density of APs and the bandwidth.