Research papers and master's and doctoral theses published by O. Ozan Koyluoglu

FastSecAgg: Scalable Secure Aggregation for Privacy-Preserving Federated Learning

153 - Swanand Kadhe , Nived Rajaraman , O. Ozan Koyluoglu 2020

Recent attacks on federated learning demonstrate that keeping the training data on clients' devices does not provide sufficient privacy, as the model parameters shared by clients can leak information about their training data. A secure aggregation protocol enables the server to aggregate clients' models in a privacy-preserving manner. However, existing secure aggregation protocols incur high computation/communication costs, especially when the number of model parameters is larger than the number of clients participating in an iteration -- a typical scenario in federated learning. In this paper, we propose a secure aggregation protocol, FastSecAgg, that is efficient in terms of computation and communication, and robust to client dropouts. The main building block of FastSecAgg is a novel multi-secret sharing scheme, FastShare, based on the Fast Fourier Transform (FFT), which may be of independent interest. FastShare is information-theoretically secure, and achieves a trade-off between the number of secrets, privacy threshold, and dropout tolerance. Riding on the capabilities of FastShare, we prove that FastSecAgg is (i) secure against the server colluding with any subset of some constant fraction (e.g. $\sim 10\%$) of the clients in the honest-but-curious setting; and (ii) tolerates dropouts of a random subset of some constant fraction (e.g. $\sim 10\%$) of the clients. FastSecAgg achieves significantly smaller computation cost than existing schemes while achieving the same (orderwise) communication cost. In addition, it guarantees security against adaptive adversaries, which can perform client corruptions dynamically during the execution of the protocol.

Cryptography and Security; Information Theory; Machine Learning
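To make the idea of secure aggregation in the FastSecAgg abstract above concrete, here is a minimal sketch. It does not implement FastShare or FastSecAgg: it swaps in plain additive secret sharing over a prime field, which has no dropout tolerance and no FFT-based structure. The function names (`share`, `aggregate`) and the modulus `PRIME` are assumptions made for illustration only.

```python
import secrets

# Field modulus (an assumption): any prime larger than the true sum works.
PRIME = 2**61 - 1


def share(vector, num_shares):
    """Split a vector into num_shares additive shares modulo PRIME."""
    shares = [[secrets.randbelow(PRIME) for _ in vector]
              for _ in range(num_shares - 1)]
    # The last share is chosen so that all shares add up to the input vector.
    last = [(x - sum(s[i] for s in shares)) % PRIME
            for i, x in enumerate(vector)]
    return shares + [last]


def aggregate(client_vectors):
    """Return the coordinate-wise sum of all client vectors modulo PRIME."""
    n = len(client_vectors)
    # inbox[j] collects the shares that client j receives from every client.
    inbox = [[] for _ in range(n)]
    for vec in client_vectors:
        for j, sh in enumerate(share(vec, n)):
            inbox[j].append(sh)
    # Each client reports only the sum of the shares it holds.
    reports = [[sum(col) % PRIME for col in zip(*received)]
               for received in inbox]
    # The server adds the reports; the random shares cancel into the true sum.
    return [sum(col) % PRIME for col in zip(*reports)]
```

In this sketch the server sees only the per-client reports, each of which is a sum of shares and reveals nothing about an individual input on its own. A single missing report makes the sum unrecoverable, which is the kind of dropout fragility the abstract says FastSecAgg is designed to avoid.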

Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning

310 - Swanand Kadhe , O. Ozan Koyluoglu 2020

Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, need to overcome two limitations: delays caused by slow running machines called stragglers, and communication overheads. Recently, Ye and Abbe [ICML 2018] proposed a coding-theoretic paradigm to characterize a fundamental trade-off between computation load per worker, communication overhead per worker, and straggler tolerance. However, their proposed coding schemes suffer from heavy decoding complexity and poor numerical stability. In this paper, we develop a communication-efficient gradient coding framework to overcome these drawbacks. Our proposed framework enables using any linear code to design the encoding and decoding functions. When a particular code is used in this framework, its block-length determines the computation load, dimension determines the communication overhead, and minimum distance determines the straggler tolerance. The flexibility of choosing a code allows us to gracefully trade off the straggler threshold and communication overhead for smaller decoding complexity and higher numerical stability. Further, we show that using a maximum distance separable (MDS) code generated by a random Gaussian matrix in our framework yields a gradient code that is optimal with respect to the trade-off and, in addition, satisfies stronger guarantees on numerical stability as compared to the previously proposed schemes. Finally, we evaluate our proposed framework on Amazon EC2 and demonstrate that it reduces the average iteration time by 16% as compared to prior gradient coding schemes.

Information Theory; Distributed, Parallel, and Cluster Computing; Machine Learning
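As background for the abstract above, the basic straggler-tolerance idea behind gradient coding can be sketched with the well-known three-worker, one-straggler example. This is not the communication-efficient framework the paper proposes: each worker here still sends a full-size message. Gradients are scalars for brevity, and the names `worker_messages`, `DECODERS`, and `recover_sum` are hypothetical.

```python
from fractions import Fraction


def worker_messages(g1, g2, g3):
    """Each worker computes two partial gradients and sends one combination."""
    half = Fraction(1, 2)
    return {
        1: half * g1 + g2,  # worker 1 holds partitions 1 and 2
        2: g2 - g3,         # worker 2 holds partitions 2 and 3
        3: half * g1 + g3,  # worker 3 holds partitions 1 and 3
    }


# Decoding coefficients for every pair of workers that may respond.
DECODERS = {
    (1, 2): {1: 2, 2: -1},
    (1, 3): {1: 1, 3: 1},
    (2, 3): {2: 1, 3: 2},
}


def recover_sum(messages, survivors):
    """Recover g1 + g2 + g3 from the messages of any two workers."""
    coeffs = DECODERS[tuple(sorted(survivors))]
    return sum(coeffs[w] * messages[w] for w in survivors)
```

Any two of the three messages suffice to recover the full gradient sum, so the server never has to wait for the slowest worker. The cost is that every worker computes two partial gradients instead of one, which is the computation-load side of the trade-off the abstract describes.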

Gradient Coding Based on Block Designs for Mitigating Adversarial Stragglers

208 - Swanand Kadhe , O. Ozan Koyluoglu , Kannan Ramchandran 2019

Distributed implementations of gradient-based methods, wherein a server distributes gradient computations across worker machines, suffer from slow running machines, called stragglers. Gradient coding is a coding-theoretic framework to mitigate stragglers by enabling the server to recover the gradient sum in the presence of stragglers. Approximate gradient codes are variants of gradient codes that reduce computation and storage overhead per worker by allowing the server to approximately reconstruct the gradient sum. In this work, our goal is to construct approximate gradient codes that are resilient to stragglers selected by a computationally unbounded adversary. Our motivation for constructing codes to mitigate adversarial stragglers stems from the challenge of tackling stragglers in massive-scale elastic and serverless systems, wherein it is difficult to statistically model stragglers. Towards this end, we propose a class of approximate gradient codes based on balanced incomplete block designs (BIBDs). We show that the approximation error for these codes depends only on the number of stragglers, and thus, adversarial straggler selection has no advantage over random selection. In addition, the proposed codes admit computationally efficient decoding at the server. Next, to characterize fundamental limits of adversarial straggling, we consider the notion of adversarial threshold -- the smallest number of workers that an adversary must straggle to inflict certain approximation error. We compute a lower bound on the adversarial threshold, and show that codes based on symmetric BIBDs maximize this lower bound among a wide class of codes, making them excellent candidates for mitigating adversarial stragglers.

Information Theory; Distributed, Parallel, and Cluster Computing; Machine Learning
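The claim in the abstract above, that the approximation error depends only on the number of stragglers, can be checked on a small example. The sketch below makes several assumptions: it uses the Fano plane, a symmetric (7, 3, 1) BIBD, as the assignment of data partitions to workers. It measures error as the squared distance between the decoded combination of partial gradients and the all-ones combination. It decodes with the uniform weight 3/(s + 2) on the s surviving workers, which is the least-squares choice for this particular design. The names are hypothetical and this is not necessarily the paper's exact construction.

```python
from fractions import Fraction
from itertools import combinations

# Fano plane: block i lists the partitions whose gradients worker i sums.
# Every pair of partitions shares exactly one block, and every pair of
# blocks shares exactly one partition.
BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
          (1, 4, 6), (2, 3, 6), (2, 4, 5)]
NUM_PARTS = 7


def decoding_error(survivors):
    """Squared error of the uniformly weighted sum of surviving workers."""
    s = len(survivors)
    weight = Fraction(3, s + 2)
    # coverage[p] counts how many surviving workers include partition p.
    coverage = [0] * NUM_PARTS
    for worker in survivors:
        for part in BLOCKS[worker]:
            coverage[part] += 1
    return sum((weight * c - 1) ** 2 for c in coverage)


def errors_for(num_stragglers):
    """Set of distinct errors over all straggler sets of the given size."""
    s = len(BLOCKS) - num_stragglers
    return {decoding_error(sub)
            for sub in combinations(range(len(BLOCKS)), s)}
```

For every number of stragglers, `errors_for` returns a set with a single value, so under these assumptions an adversary gains nothing by choosing which workers to slow down. With no stragglers the error is zero, because every partition is covered exactly three times and the weight is 1/3.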

On the organization of grid and place cells: Neural de-noising via subspace learning

206 - David M. Schwartz , O. Ozan Koyluoglu 2017

Place cells in the hippocampus are active when an animal visits a certain location (referred to as a place field) within an environment. Grid cells in the medial entorhinal cortex (MEC) respond at multiple locations, with firing fields that form a periodic and hexagonal tiling of the environment. The joint activity of grid and place cell populations, as a function of location, forms a neural code for space. An ensemble of codes is generated by varying grid and place cell population parameters. For each code in this ensemble, codewords are generated by stimulating a network with a discrete set of locations. In this manuscript, we develop an understanding of the relationships between coding theoretic properties of these combined populations and code construction parameters. These relationships are revisited by measuring the performances of biologically realizable algorithms implemented by networks of place and grid cell populations, as well as constraint neurons, which perform de-noising operations. Objectives of this work include the investigation of coding theoretic limitations of the mammalian neural code for location and how communication between grid and place cell networks may improve the accuracy of each population's representation. Simulations demonstrate that de-noising mechanisms analyzed here can significantly improve fidelity of this neural representation of space. Further, patterns observed in connectivity of each population of simulated cells suggest that inter-hippocampal-medial-entorhinal-cortical connectivity decreases downward along the dorsoventral axis.

Neurons and Cognition; Information Theory; Machine Learning

Vulnerabilities of Massive MIMO Systems Against Pilot Contamination Attacks

132 - Berk Akgun , Marwan Krunz , O. Ozan Koyluoglu 2017

We consider a single-cell massive MIMO system in which a base station (BS) with a large number of antennas transmits simultaneously to several single-antenna users in the presence of an attacker. The BS acquires the channel state information (CSI) based on uplink pilot transmissions. In this work, we demonstrate the vulnerability of CSI estimation phase to malicious attacks. For that purpose, we study two attack models. In the first model, the attacker aims at minimizing the sum-rate of downlink transmissions by contaminating the uplink pilots. In the second model, the attacker exploits its in-band full-duplex capabilities to generate jamming signals in both the CSI estimation and data transmission phases. We study these attacks under two downlink power allocation strategies when the attacker knows and does not know the locations of the BS and users. The formulated problems are solved using stochastic optimization, Lagrangian minimization, and game-theoretic methods. A closed-form solution for a special case of the problem is obtained. Furthermore, we analyze the achievable individual secrecy rates under a pilot contamination attack, and provide an upper bound on these rates. Our results indicate that the proposed attacks degrade the throughput of a massive MIMO system by more than half.

Information Theory

Repair Strategies for Storage on Mobile Clouds

133 - Gokhan Calis , Swetha Shivaramaiah , O. Ozan Koyluoglu 2017

We study the data reliability problem for a community of devices forming a mobile cloud storage system. We consider the application of regenerating codes for file maintenance within a geographically-limited area. Such codes require lower bandwidth to regenerate lost data fragments compared to file replication or reconstruction. We investigate threshold-based repair strategies where data repair is initiated after a threshold number of data fragments have been lost due to node mobility. We show that at a low departure-to-repair rate regime, a lazy repair strategy in which repairs are initiated after several nodes have left the system outperforms eager repair in which repairs are initiated after a single departure. This optimality is reversed when nodes are highly mobile. We further compare distributed and centralized repair strategies and derive the optimal repair threshold for minimizing the average repair cost per unit of time, as a function of underlying code parameters. In addition, we examine cooperative repair strategies and show performance improvements compared to non-cooperative codes. We investigate several models for the time needed for node repair including a simple fixed time model that allows for the computation of closed-form expressions and a more realistic model that takes into account the number of repaired nodes. We derive the conditions under which the former model approximates the latter. Finally, an extended model where additional failures are allowed during the repair process is investigated. Overall, our results establish the joint effect of code design and repair algorithms on the maintenance cost of distributed storage systems.

Networking and Internet Architecture; Information Theory

Joint secrecy over the K-Transmitter Multiple Access Channel

101 - Yanling Chen , O. Ozan Koyluoglu , A. J. Han Vinck 2017

This paper studies the problem of secure communication over a K-transmitter multiple access channel in the presence of an external eavesdropper, subject to a joint secrecy constraint (i.e., information leakage rate from the collection of K messages to an eavesdropper is made vanishing). As a result, we establish the joint secrecy achievable rate region. To this end, our results build upon two techniques in addition to the standard information-theoretic methods. The first is a generalization of Chia-El Gamal's lemma on entropy bound for a set of codewords given partial information. The second is to utilize a compact representation of a list of sets that, together with properties of mutual information, leads to an efficient Fourier-Motzkin elimination. These two approaches could also be of independent interest in other contexts.

Information Theory

Architecture-aware Coding for Distributed Storage: Repairable Block Failure Resilient Codes

120 - Gokhan Calis , O. Ozan Koyluoglu 2016

In large scale distributed storage systems (DSS) deployed in cloud computing, correlated failures resulting in simultaneous failure (or, unavailability) of blocks of nodes are common. In such scenarios, the stored data or a content of a failed node can only be reconstructed from the available live nodes belonging to the available blocks. To analyze the resilience of the system against such block failures, this work introduces the framework of Block Failure Resilient (BFR) codes, wherein the data (e.g., a file in DSS) can be decoded by reading out from the same number of codeword symbols (nodes) from a subset of available blocks of the underlying codeword. Further, repairable BFR codes are introduced, wherein any codeword symbol in a failed block can be repaired by contacting a subset of remaining blocks in the system. File size bounds for repairable BFR codes are derived, the trade-off between per node storage and repair bandwidth is analyzed, and the corresponding minimum storage regenerating (BFR-MSR) and minimum bandwidth regenerating (BFR-MBR) points are derived. Explicit codes achieving the two operating points for a special case of parameters are constructed, wherein the underlying regenerating codewords are distributed to BFR codeword symbols according to combinatorial designs. Finally, BFR locally repairable codes (BFR-LRC) are introduced, an upper bound on the resilience is derived, and optimal code constructions are provided by a concatenation of Gabidulin and MDS codes. Repair efficiency of BFR-LRC is further studied via the use of BFR-MSR/MBR codes as local codes. Code constructions achieving optimal resilience for BFR-MSR/MBR-LRCs are provided for certain parameter regimes. Overall, this work introduces the framework of block failures along with optimal code constructions, and the study of architecture-aware coding for distributed storage systems.

Information Theory

Exploiting Full-duplex Receivers for Achieving Secret Communications in Multiuser MISO Networks

101 - Berk Akgun , O. Ozan Koyluoglu , Marwan Krunz 2016

We consider a broadcast channel, in which a multi-antenna transmitter (Alice) sends $K$ confidential information signals to $K$ legitimate users (Bobs) in the presence of $L$ eavesdroppers (Eves). Alice uses MIMO precoding to generate the information signals along with her own (Tx-based) friendly jamming. Interference at each Bob is removed by MIMO zero-forcing. This, however, leaves a vulnerability region around each Bob, which can be exploited by a nearby Eve. We address this problem by augmenting Tx-based friendly jamming (TxFJ) with Rx-based friendly jamming (RxFJ), generated by each Bob. Specifically, each Bob uses self-interference suppression (SIS) to transmit a friendly jamming signal while simultaneously receiving an information signal over the same channel. We minimize the powers allocated to the information, TxFJ, and RxFJ signals under given guarantees on the individual secrecy rate for each Bob. The problem is solved for the cases when the eavesdroppers' channel state information is known/unknown. Simulations show the effectiveness of the proposed solution. Furthermore, we discuss how to schedule transmissions when the rate requirements need to be satisfied on average rather than instantaneously. Under special cases, a scheduling algorithm that serves only the strongest receivers is shown to outperform the one that schedules all receivers.

Information Theory

Centralized Repair of Multiple Node Failures with Applications to Communication Efficient Secret Sharing

119 - Ankit Singh Rawat , O. Ozan Koyluoglu , Sriram Vishwanath 2016

This paper considers a distributed storage system, where multiple storage nodes can be reconstructed simultaneously at a centralized location. This centralized multi-node repair (CMR) model is a generalization of regenerating codes that allow for bandwidth-efficient repair of a single failed node. This work focuses on the trade-off between the amount of data stored and repair bandwidth in this CMR model. In particular, repair bandwidth bounds are derived for the minimum storage multi-node repair (MSMR) and the minimum bandwidth multi-node repair (MBMR) operating points. The tightness of these bounds is analyzed via code constructions. The MSMR point is characterized through codes achieving this point under functional repair for a general set of CMR parameters, as well as with codes enabling exact repair for certain CMR parameters. The MBMR point, on the other hand, is characterized with exact repair codes for all CMR parameters for systems that satisfy a certain entropy accumulation property. Finally, the model proposed here is utilized for the secret sharing problem, where the codes for the multi-node repair problem are used to construct communication efficient secret sharing schemes with the property of bandwidth efficient share repair.

Information Theory
