In this paper, we study the problem of storing an archive of versioned data in a reliable and efficient manner in distributed storage systems. We propose a new storage technique called differential erasure coding (DEC), where the differences (deltas) between subsequent versions of the data are stored rather than the entire objects.
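A minimal sketch of the differential-encoding idea may help make this concrete: only the first version is kept whole, and each later version is represented by its delta (here a simple byte-wise XOR diff) before any erasure coding is applied. The function names, the XOR-based diff, and the sample data are illustrative assumptions, not the paper's implementation.

```python
def xor_delta(old: bytes, new: bytes) -> bytes:
    """Byte-wise XOR delta between two equal-length versions."""
    return bytes(a ^ b for a, b in zip(old, new))

def archive_versions(versions):
    """Return the blobs that would actually be handed to the erasure coder."""
    blobs = [versions[0]]                      # full first version
    for prev, curr in zip(versions, versions[1:]):
        blobs.append(xor_delta(prev, curr))    # sparse delta for each update
    return blobs

v1 = b"hello world....."
v2 = b"hello worlds...."
v3 = b"jello worlds...."
for blob in archive_versions([v1, v2, v3]):
    print(blob.hex())                          # deltas are mostly zero bytes
```

Because successive versions usually differ little, the deltas are sparse, which is what makes encoding them cheaper than encoding every version in full.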
To successfully complete a complex project, be it the construction of an airport or of a backbone IT system, agents (companies or individuals) must form a team that has the required competences and resources. A team can be formed either by the project issuer based on individual agents' offers (centralized formation), or by the agents themselves (decentralized formation) bidding for a project as a consortium; in that case, many feasible teams compete for the contract. We investigate the rational strategies of the agents (what salary should they ask for? with whom should they team up?). We propose concepts to characterize the stability of the winning teams and study their computational complexity.
There are different ways to realize Reed-Solomon (RS) codes. In the storage community, implementing RS codes with generator matrices is more popular, while in the coding theory community generator polynomials are typically used. Prominent exceptions include HDFS-RAID, which uses generator-polynomial-based erasure codes and extends the Apache Hadoop file system. In this paper, we evaluate the performance of an implementation of the polynomial realization of Reed-Solomon codes, along with our optimized version of it, against that of a widely used library (Jerasure) that implements the main matrix realization alternatives. Our experimental study shows that, despite significant performance gains yielded by our optimizations, the polynomial implementation's performance is consistently inferior to that of the matrix realization alternatives in general, and to that of Cauchy bit matrices in particular.
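To illustrate the contrast between the two realizations, here is a toy sketch worked over the small prime field GF(13) instead of the GF(2^8) used by production libraries such as Jerasure. The matrix realization multiplies the message vector by a generator matrix; the polynomial realization multiplies the message polynomial by a generator polynomial g(x) whose roots are consecutive powers of a primitive element. All names and parameters here are illustrative assumptions, not either library's code.

```python
P = 13          # toy field modulus (prime, so plain modular arithmetic works)
ALPHA = 2       # a primitive element of GF(13)

def matrix_encode(msg, n):
    """Matrix realization: codeword = Vandermonde generator matrix x message."""
    k = len(msg)
    G = [[pow(ALPHA, i * j, P) for j in range(k)] for i in range(n)]
    return [sum(G[i][j] * msg[j] for j in range(k)) % P for i in range(n)]

def poly_encode(msg, n):
    """Polynomial realization: codeword poly = message poly * generator poly,
    where g(x) has roots ALPHA^0 .. ALPHA^(n-k-1)."""
    g = [1]                                   # coefficients, constant term first
    for i in range(n - len(msg)):             # build g(x) one root at a time
        root = pow(ALPHA, i, P)
        g = [(a - root * b) % P for a, b in zip([0] + g, g + [0])]
    c = [0] * (len(msg) + len(g) - 1)         # polynomial product m(x) * g(x)
    for i, m in enumerate(msg):
        for j, gc in enumerate(g):
            c[i + j] = (c[i + j] + m * gc) % P
    return c

msg = [5, 1, 7]                   # k = 3 message symbols, n = 6 code symbols
print(matrix_encode(msg, 6))      # two different codewords for the same message,
print(poly_encode(msg, 6))        # computed by the two realizations
```

The paper's performance question is essentially which of these two computation patterns is faster once the field is GF(2^8) and the data volumes are large.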
As tremendous amounts of data are generated every day from human activity and from devices equipped with sensing capabilities, cloud computing emerges as a scalable and cost-effective platform to store and manage the data. While the benefits of cloud computing are numerous, security concerns arising when data and computation are outsourced to a third party still hinder a complete move to the cloud. In this paper, we focus on the problem of data privacy on the cloud, particularly on access control over stream data. The nature of stream data and the complexity of sharing it make access control a more challenging issue than in traditional archival databases. We present Streamforce, a system allowing data owners to securely outsource their data to the cloud. The owner specifies fine-grained policies, which are enforced by the cloud. The latter performs most of the heavy computation while learning nothing about the data. To this end, we employ a number of encryption schemes, including deterministic encryption, proxy-based attribute-based encryption, and sliding-window encryption. In Streamforce, access control policies are modeled as secure continuous queries, which entails minimal changes to existing stream processing engines and allows a wide range of policies to be expressed easily. In particular, Streamforce comes with a number of secure query operators, including Map, Filter, Join, and Aggregate. Finally, we implement Streamforce over an open-source stream processing engine (Esper) and evaluate its performance on a cloud platform. The results demonstrate practical performance for many real-world applications; although the security overhead is visible, Streamforce remains highly scalable.
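The following sketch illustrates, in plain (unencrypted) Python, the idea of modeling an access-control policy as a continuous query: the consumer only ever sees the output of the policy pipeline, never the raw stream. The operator names echo the Filter/Map/Aggregate operators listed in the abstract, but the field names, window size, and "restricted sensor" rule are hypothetical, and the real system evaluates such queries over encrypted data.

```python
from collections import deque

def policy_query(events, window=5):
    """Policy as a continuous query: release only 5-event averages of 'temp',
    and never anything read from a restricted sensor."""
    win = deque(maxlen=window)
    for e in events:
        if e["sensor"].startswith("restricted"):   # Filter operator
            continue
        win.append(e["temp"])                       # sliding window
        if len(win) == window:                      # Aggregate operator
            yield {"avg_temp": sum(win) / window}   # Map to a coarse view

stream = [{"sensor": "restricted" if i % 4 == 0 else f"s{i}",
           "temp": 20 + i % 7} for i in range(12)]
for out in policy_query(iter(stream)):
    print(out)
```

Expressing the policy as a query is what lets an ordinary engine like Esper enforce it with minimal modification.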
In order to accomplish complex tasks, it is often necessary to compose a team of experts with diverse competencies. However, for proper functioning, it is also preferable that a team be socially cohesive. A team recommendation system, which facilitates the search for potential team members, can be of great help both for (i) individuals who need to seek out collaborators and (ii) managers who need to build a team for a specific task. A decision support system that readily summarizes such metrics, and possibly ranks the teams in a personalized manner according to the end user's preferences, can be a great tool for navigating what would otherwise be an information avalanche. In this work, we present a general framework for composing such subsystems into a composite team recommendation system, and we instantiate it for a case study of academic teams.
Erasure codes are an integral part of many distributed storage systems aimed at Big Data, since they provide high fault tolerance for low overheads. However, traditional erasure codes are inefficient at reading stored data in degraded environments (when nodes may be unavailable) and at replenishing lost data (vital for long-term resilience). Consequently, novel codes optimized to cope with the nuances of distributed storage systems are being vigorously researched. In this paper, we take an engineering alternative, exploring the use of simple and mature techniques: juxtaposing a standard erasure code with RAID-4-like parity. We carry out an analytical study to determine the efficacy of this approach over traditional as well as some novel codes. We build upon this study to design CORE, a general storage primitive that we integrate into HDFS. We benchmark this implementation in a proprietary cluster and in EC2. Our experiments show that, compared to traditional erasure codes, CORE uses 50% less bandwidth and is up to 75% faster when recovering a single failed node, while the gains are respectively 15% and 60% for double node failures.
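A hedged sketch of the juxtaposition idea: lay a RAID-4-like XOR parity across blocks that are already erasure coded, so a single lost block can be rebuilt locally from its small parity group instead of triggering a full k-block erasure decode. The group size, block contents, and function names below are illustrative assumptions, not CORE's actual layout.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def add_parity(encoded_blocks, group=3):
    """Append one RAID-4-style XOR parity block per group of coded blocks."""
    groups = [encoded_blocks[i:i + group]
              for i in range(0, len(encoded_blocks), group)]
    return [(g, xor_blocks(g)) for g in groups]

def repair(group_blocks, parity, lost_index):
    """Rebuild one lost block from the rest of its group plus the parity."""
    survivors = [b for i, b in enumerate(group_blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])

blocks = [bytes([i] * 4) for i in range(6)]    # stand-ins for coded blocks
groups = add_parity(blocks)
g, p = groups[0]
assert repair(g, p, lost_index=1) == g[1]      # single-failure repair is cheap
```

The bandwidth savings the paper reports come from exactly this effect: the common case (one failed node) touches only a small parity group rather than k full blocks.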
To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However, the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage-efficient alternative to replication, particularly suited for archival in data centers, where old datasets (rarely accessed) can be erasure encoded while replicas are maintained only for the latest data. Many recent works consider the design of new storage-centric erasure codes for improved repairability. In contrast, this paper addresses the migration from replication to encoding: traditionally, erasure coding is an atomic operation, in that a single node holding the whole object encodes it and uploads all the encoded pieces. Although large datasets can be concurrently archived by distributing individual object encodings among different nodes, the network and computing capacity of individual nodes constrain the archival process due to this atomicity. We propose a new pipelined coding strategy that distributes the network and computing load of single-object encodings among different nodes, which also speeds up the archival of multiple objects. We further present RapidRAID codes, an explicit family of pipelined erasure codes that provides fast archival without compromising either data reliability or storage overheads. Finally, we provide a real implementation of RapidRAID codes and benchmark its performance using both a cluster of 50 nodes and a set of Amazon EC2 instances. Experiments show that RapidRAID codes reduce a single object's coding time by up to 90%, while when multiple objects are encoded concurrently, the reduction is up to 20%.
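A minimal sketch of the pipelining idea: rather than one node reading all the pieces and computing every coded block (the atomic archival contrasted above), a partial codeword is passed along a chain of nodes, each folding in the contribution of the replica piece it already holds. The modular arithmetic, coefficients, and piece sizes below are illustrative assumptions; real systems work over GF(2^8), and this is not RapidRAID's actual code construction.

```python
P = 257  # toy prime field standing in for GF(2^8)

def node_step(partial, local_piece, coeff):
    """Work done by one node in the pipeline: fold in its locally held
    piece, then forward the updated partial codeword to the next node."""
    return [(p + coeff * s) % P for p, s in zip(partial, local_piece)]

pieces = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # replica pieces, one per node
coeffs = [3, 5, 7]                           # encoding coefficients

partial = [0, 0, 0]
for piece, c in zip(pieces, coeffs):         # one network hop per node
    partial = node_step(partial, piece, c)

# Same coded block as a single-node computation would produce, but the
# compute and the network transfers were spread across the whole chain.
expected = [(3 * a + 5 * b + 7 * c) % P for a, b, c in zip(*pieces)]
assert partial == expected
```

Because every node in the chain does a small, equal amount of work, no single node's CPU or uplink becomes the bottleneck, which is where the reported coding-time reductions come from.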
Erasure coding techniques are being integrated into networked distributed storage systems as a way to provide fault tolerance at the cost of less storage overhead than traditional replication. Redundancy is maintained over time through repair mechanisms, which may entail large network resource overheads. In recent years, several novel codes tailor-made for distributed storage have been proposed to optimize storage overhead and repair, such as Regenerating Codes, which minimize the per-repair traffic, and Self-Repairing Codes, which minimize the number of nodes contacted per repair. Existing studies of these coding techniques are, however, predominantly theoretical, under the simplifying assumption that only one object is stored. They ignore many practical issues that real systems must address, such as data placement, de/correlation of multiple stored objects, and the competition for limited network resources when multiple objects are repaired simultaneously. This paper empirically studies the repair performance of these novel storage-centric codes with respect to classical erasure codes by simulating realistic scenarios and exploring the interplay of code parameters, failure characteristics, and data placement with respect to the trade-offs of bandwidth usage and speed of repairs.
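A back-of-the-envelope comparison shows the kind of trade-off the study explores: to repair one lost block, a classical (n, k) erasure code downloads k full blocks (one object's worth of data), while a minimum-storage regenerating (MSR) code downloads a small amount from each of d helpers, d·M/(k·(d-k+1)) in total. The parameters below are illustrative, not the paper's experiment settings.

```python
M = 1.0                 # object size (normalized)
n, k, d = 14, 10, 13    # code length, data blocks, helper nodes contacted

classical_traffic = M                      # k blocks of size M/k each
msr_traffic = d * M / (k * (d - k + 1))    # MSR point of the cut-set bound

print(f"classical RS repair: {classical_traffic:.2f} x object size")
print(f"MSR repair:          {msr_traffic:.3f} x object size")
# MSR moves fewer bytes (~0.325 x here) but contacts more nodes (d > k):
# exactly the bandwidth-versus-repair-speed interplay simulated above.
```

This is why single-object theory is not the whole story: when many objects compete for the same links and the same d helpers, the per-repair savings and the repair speed can pull in opposite directions.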
An increasing number of businesses are replacing their data storage and computation infrastructure with cloud services. Likewise, there is an increased emphasis on performing analytics based on multiple datasets obtained from different data sources. While ensuring the security of data and computation outsourced to a third-party cloud is in itself challenging, supporting analytics over data distributed across multiple, independent clouds is even further from trivial. In this paper, we present CloudMine, a cloud-based service which allows multiple data owners to perform privacy-preserving computation over their joint data using their clouds as delegates. CloudMine protects data privacy with respect to semi-honest data owners and semi-honest clouds. It furthermore ensures the privacy of the computation outputs from the curious clouds. It also allows data owners to reliably detect whether their cloud delegates have been lazy when carrying out the delegated computation. CloudMine can run as a centralized service on a single cloud or as a distributed service over multiple, independent clouds. It supports a set of basic computations that can be used to construct a variety of highly complex, distributed privacy-preserving data analytics. We demonstrate how a simple instance of CloudMine (a secure sum service) is used to implement three classical data mining tasks (classification, association rule mining, and clustering) in a cloud environment. We experiment with a prototype of the service, the results of which suggest its practicality for supporting privacy-preserving data analytics as a (multi-)cloud-based service.
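To make the secure sum primitive concrete, here is a minimal sketch of the standard building block such a service could rest on: additive secret sharing modulo a public prime. Each owner splits its private value into random shares, one per cloud delegate, so no single cloud sees any input, yet the shares recombine to the exact total. This is a textbook construction offered as an illustration, not necessarily CloudMine's exact protocol, and all names are hypothetical.

```python
import secrets

Q = 2**61 - 1  # public modulus, larger than any possible sum

def share(value, n_clouds):
    """Split a private value into n random shares that sum to it mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_clouds - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

owners_values = [120, 45, 300]        # private inputs, one per data owner
n_clouds = 3
all_shares = [share(v, n_clouds) for v in owners_values]

# Each cloud delegate receives one share from every owner and adds them up;
# a single partial sum reveals nothing about any individual input.
cloud_partials = [sum(s[c] for s in all_shares) % Q for c in range(n_clouds)]

total = sum(cloud_partials) % Q       # combining the partials yields the sum
assert total == sum(owners_values)
```

Classification, association rule mining, and clustering can all be reduced to repeated sums of counts, which is why a secure sum service suffices to implement the three data mining tasks in the evaluation.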