
Reaching Data Confidentiality and Model Accountability on the CalTrain

Posted by Zhongshu Gu
Publication date: 2018
Research field: Informatics Engineering
Paper language: English





Distributed collaborative learning (DCL) paradigms enable building joint machine learning models from mutually distrusting multi-party participants. Data confidentiality is guaranteed by retaining private training data on each participant's local infrastructure. However, this approach to achieving data confidentiality makes today's DCL designs fundamentally vulnerable to data poisoning and backdoor attacks. It also limits DCL's model accountability, which is key to backtracking the responsible bad training data instances/contributors. In this paper, we introduce CALTRAIN, a Trusted Execution Environment (TEE) based centralized multi-party collaborative learning system that simultaneously achieves data confidentiality and model accountability. CALTRAIN enforces isolated computation on centrally aggregated training data to guarantee data confidentiality. To support building accountable learning models, we securely maintain the links between training instances and their corresponding contributors. Our evaluation shows that models generated by CALTRAIN achieve the same prediction accuracy as models trained in non-protected environments. We also demonstrate that when malicious participants attempt to implant backdoors during model training, CALTRAIN can accurately and precisely discover the poisoned and mislabeled training data that lead to runtime mispredictions.
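
The abstract's key accountability mechanism is the securely maintained link between each training instance and its contributor. The paper does not expose CALTRAIN's internal API, so the following Python sketch is purely illustrative: the class and method names (`ProvenanceLedger`, `ingest`, `attribute`) and the content-hash fingerprinting are hypothetical stand-ins for whatever bookkeeping the enclave actually performs.

```python
import hashlib
from collections import defaultdict

class ProvenanceLedger:
    """Hypothetical sketch of CALTRAIN-style instance/contributor linkage.

    The mapping would live inside the TEE, so it never leaves the enclave
    even though the training data is centrally aggregated.
    """

    def __init__(self):
        self._owner = {}                            # fingerprint -> contributor
        self._by_contributor = defaultdict(list)    # contributor -> fingerprints

    @staticmethod
    def _fingerprint(instance: bytes) -> str:
        # A content hash stands in for the identity of a training instance.
        return hashlib.sha256(instance).hexdigest()

    def ingest(self, instance: bytes, contributor: str) -> str:
        fp = self._fingerprint(instance)
        self._owner[fp] = contributor
        self._by_contributor[contributor].append(fp)
        return fp

    def attribute(self, suspect: bytes) -> str | None:
        # Trace a poisoned or mislabeled instance back to its contributor.
        return self._owner.get(self._fingerprint(suspect))

ledger = ProvenanceLedger()
ledger.ingest(b"image-bytes-with-backdoor-trigger", contributor="party-B")
print(ledger.attribute(b"image-bytes-with-backdoor-trigger"))  # -> party-B
```

With such a ledger, once runtime mispredictions are traced back to specific training instances, identifying the responsible contributor is a single lookup.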




Read also

Byzantine fault tolerant (BFT) consensus protocols are traditionally developed to support reliable distributed computing. For applications where the protocol participants are economic agents, recent works highlighted the importance of accountability: the ability to identify participants who provably violate the protocol. We propose to evaluate the security of an accountable protocol in terms of its liveness resilience, the minimum number of Byzantine nodes when liveness is violated, and its accountable safety resilience, the minimum number of accountable Byzantine nodes when safety is violated. We characterize the optimal tradeoffs between these two resiliences in different network environments, and identify an availability-accountability dilemma: in an environment with dynamic participation, no protocol can simultaneously be accountably-safe and live. We provide a resolution to this dilemma by constructing an optimally-resilient accountability gadget to checkpoint a longest-chain protocol, such that the full ledger is live under dynamic participation and the checkpointed prefix ledger is accountable. Our accountability gadget construction is black-box and can use any BFT protocol which is accountable under static participation. Using HotStuff as the black box, we implemented our construction as a protocol for the Ethereum 2.0 beacon chain, and our Internet-scale experiments with more than 4000 nodes show that the protocol achieves the required scalability and has better latency than the current solution, Gasper, while having the advantage of being provably secure. In contrast, we demonstrate a new attack on Gasper.
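
For intuition about the accountable safety resilience the abstract evaluates, here is a minimal Python sketch (not the paper's gadget construction): when safety is violated, i.e. two conflicting checkpoints are both finalized by quorums of size at least 2n/3, the validators in the quorum intersection have provably equivocated, and that intersection contains at least n/3 validators.

```python
def equivocators(signers_a: set[str], signers_b: set[str]) -> set[str]:
    """Validators that signed both of two conflicting finalized checkpoints.

    Two quorums of size >= 2n/3 overlap in >= n/3 validators, so at least
    n/3 nodes are *accountably* Byzantine: their conflicting signatures
    are cryptographic evidence of a protocol violation.
    """
    return signers_a & signers_b

n = 9
quorum_a = {f"v{i}" for i in range(0, 6)}   # v0..v5 finalize checkpoint A
quorum_b = {f"v{i}" for i in range(3, 9)}   # v3..v8 finalize conflicting B
print(sorted(equivocators(quorum_a, quorum_b)))  # ['v3', 'v4', 'v5'] = n/3
```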
Resource and cost constraints remain a challenge for wireless sensor network security. In this paper, we propose a new approach to protecting confidentiality against a parasitic adversary, which seeks to exploit sensor networks by obtaining measurements in an unauthorized way. Our low-complexity solution, GossiCrypt, leverages the large scale of sensor networks to protect confidentiality efficiently and effectively. GossiCrypt protects data by symmetric-key encryption at their source nodes and re-encryption at a randomly chosen subset of nodes en route to the sink. Furthermore, it employs key refreshing to mitigate the physical compromise of cryptographic keys. We validate GossiCrypt analytically and with simulations, showing that it protects data confidentiality with probability almost one. Moreover, compared with a system that uses public-key data encryption, the energy consumption of GossiCrypt is one to three orders of magnitude lower.
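
GossiCrypt's core idea, source encryption plus probabilistic re-encryption en route, can be pictured with a toy Python sketch. It assumes the third-party `cryptography` package (Fernet symmetric encryption) and pre-deployed per-node keys shared with the sink; carrying the layer order alongside the ciphertext and the fixed 0.5 re-encryption probability are simplifications for illustration, not the paper's protocol parameters.

```python
import random
from cryptography.fernet import Fernet

# Each sensor node shares a symmetric key with the sink (pre-deployed).
node_keys = {f"node{i}": Fernet.generate_key() for i in range(6)}
REENCRYPT_P = 0.5  # toy probability that a relay adds an encryption layer

def source_encrypt(node_id: str, reading: bytes):
    return [node_id], Fernet(node_keys[node_id]).encrypt(reading)

def forward(route, layers, ct):
    # GossiCrypt-style probabilistic re-encryption along the route:
    # each relay independently decides whether to wrap the ciphertext.
    for node_id in route:
        if random.random() < REENCRYPT_P:
            layers.append(node_id)
            ct = Fernet(node_keys[node_id]).encrypt(ct)
    return layers, ct

def sink_decrypt(layers, ct):
    # The sink peels the layers in reverse order using the shared keys.
    for node_id in reversed(layers):
        ct = Fernet(node_keys[node_id]).decrypt(ct)
    return ct

layers, ct = source_encrypt("node0", b"temp=21.5C")
layers, ct = forward(["node1", "node2", "node3"], layers, ct)
print(sink_decrypt(layers, ct))  # b'temp=21.5C'
```

A parasitic adversary that compromises a single relay then sees only ciphertext encrypted under keys it does not hold.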
Machine-learned models trained on organizational communication data, such as emails in an enterprise, carry unique risks of breaching confidentiality, even if the model is intended only for internal use. This work shows how confidentiality is distinct from privacy in an enterprise context, and aims to formulate an approach to preserving confidentiality while leveraging principles from differential privacy. The goal is to perform machine learning tasks, such as learning a language model or performing topic analysis, using interpersonal communications in the organization, while not learning about confidential information shared in the organization. Works that apply differential privacy techniques to natural language processing tasks usually assume independently distributed data and overlook potential correlation among the records. Ignoring this correlation results in a fictional promise of privacy. Naively extending differential privacy techniques to focus on group privacy instead of record-level privacy is a straightforward approach to mitigating this issue. This approach, although providing a more realistic privacy guarantee, is over-cautious and severely impacts model utility. We show the gap between these two extreme measures of privacy over two language tasks, and introduce a middle-ground solution. We propose a model that captures the correlation in the social network graph and incorporates this correlation in the privacy calculations through Pufferfish privacy principles.
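
The utility gap between the two extremes can be made concrete with a small numeric sketch: under the ε-DP Laplace mechanism, record-level privacy adds noise of scale 1/ε to a counting query, while naive group privacy for groups of size k scales the noise by k. The numbers below are illustrative only, not the paper's tasks or results.

```python
import numpy as np

rng = np.random.default_rng(0)
true_count, epsilon, group_size = 1000, 1.0, 50  # illustrative values

def laplace_count(count, sensitivity, eps):
    # epsilon-DP Laplace mechanism: noise scale = sensitivity / epsilon.
    return count + rng.laplace(scale=sensitivity / eps)

print(laplace_count(true_count, 1, epsilon))           # record-level DP
print(laplace_count(true_count, group_size, epsilon))  # naive group privacy: 50x the noise
```

The correlation-aware middle ground the abstract proposes would, in effect, pick a sensitivity between these two extremes based on the social-network structure.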
Cryptocurrency off-chain networks such as Lightning (e.g., Bitcoin) or Raiden (e.g., Ethereum) aim to increase the scalability of traditional on-chain transactions. To support nodes in learning about possible paths to route their transactions, these networks need to provide gossip and probing mechanisms. This paper explores whether these mechanisms may be exploited to infer sensitive information about the flow of transactions, and eventually harm privacy. In particular, we identify two threats, related to an active and a passive adversary. The first is a probing attack: here the adversary aims to detect the maximum amount that is transferable in a given direction over a target channel by actively probing it and differentiating the response messages it receives. The second is a timing attack: the adversary discovers how close the destination of a routed payment actually is by acting as a passive man-in-the-middle and analyzing the time deltas between sent messages and their corresponding responses. We then analyze the limitations of these attacks and propose remediations for scenarios in which they are able to produce accurate results.
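
The probing attack amounts to a binary search over probe amounts, distinguishing whether a deliberately unfulfillable payment failed at the target channel (amount too large) or reached the destination (amount fits). A toy sketch, with `send_probe` as a hypothetical stand-in for the actual Lightning probing machinery:

```python
def probe_channel_balance(send_probe, lo=0, hi=2**24, tol=1):
    """Binary-search the maximum amount routable over a target channel.

    send_probe(amount) models a probe with a bogus payment hash: True if
    the failure came back from the destination ('unknown payment hash',
    so the channel could forward the amount), False if it failed earlier
    with an insufficient-balance error.
    """
    while hi - lo > tol:
        mid = (lo + hi) // 2
        if send_probe(mid):
            lo = mid    # amount fits: the transferable amount is >= mid
        else:
            hi = mid    # amount too large
    return lo

# Toy stand-in for the network: a hidden channel balance of 137,000 units.
hidden_balance = 137_000
print(probe_channel_balance(lambda amt: amt <= hidden_balance))  # 137000
```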
We describe and implement a policy language. In our system, agents can distribute data along with usage policies in a decentralized architecture. Our language supports the specification of conditions and obligations, and also the possibility to refine policies. In our framework, compliance with usage policies is not actively enforced. However, agents are accountable for their actions, and may be audited by an authority requiring justifications.
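
A minimal Python sketch of this audit-based (rather than enforcement-based) model, with all names hypothetical: usage is only logged, never blocked, and an authority later replays the log against the policies to flag provable violations.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    allowed_purposes: set   # conditions on use, e.g. {"research"}
    obligations: set        # e.g. {"delete_after_use"}

@dataclass
class Agent:
    name: str
    log: list = field(default_factory=list)  # actions are recorded, not blocked

    def use(self, data_id, purpose, fulfilled=frozenset()):
        self.log.append((data_id, purpose, set(fulfilled)))

def audit(agent, policies):
    """The authority replays the log and flags provable violations."""
    return [(agent.name, data_id, purpose)
            for data_id, purpose, done in agent.log
            if purpose not in policies[data_id].allowed_purposes
            or not policies[data_id].obligations <= done]

policies = {"d1": Policy({"research"}, {"delete_after_use"})}
alice = Agent("alice")
alice.use("d1", "marketing")                       # violates the condition
alice.use("d1", "research", {"delete_after_use"})  # compliant
print(audit(alice, policies))  # [('alice', 'd1', 'marketing')]
```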
