ترغب بنشر مسار تعليمي؟ اضغط هنا

BlobSeer: How to Enable Efficient Versioning for Large Object Storage under Heavy Access Concurrency

272   0   0.0 ( 0 )
 نشر من قبل Bogdan Nicolae
 تاريخ النشر 2009
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English
 تأليف Bogdan Nicolae




اسأل ChatGPT حول البحث

To accommodate the needs of large-scale distributed P2P systems, scalable data management strategies are required, allowing applications to efficiently cope with continuously growing, highly dis tributed data. This paper addresses the problem of efficiently stor ing and accessing very large binary data objects (blobs). It proposesan efficient versioning scheme allowing a large number of clients to concurrently read, write and append data to huge blobs that are fragmented and distributed at a very large scale. Scalability under heavy concurrency is achieved thanks to an original metadata scheme, based on a distributed segment tree built on top of a Distributed Hash Table (DHT). Our approach has been implemented and experimented within our BlobSeer prototype on the Grid5000 testbed, using up to 175 nodes.



قيم البحث

اقرأ أيضاً

Erasure codes are an integral part of many distributed storage systems aimed at Big Data, since they provide high fault-tolerance for low overheads. However, traditional erasure codes are inefficient on reading stored data in degraded environments (w hen nodes might be unavailable), and on replenishing lost data (vital for long term resilience). Consequently, novel codes optimized to cope with distributed storage system nuances are vigorously being researched. In this paper, we take an engineering alternative, exploring the use of simple and mature techniques -juxtaposing a standard erasure code with RAID-4 like parity. We carry out an analytical study to determine the efficacy of this approach over traditional as well as some novel codes. We build upon this study to design CORE, a general storage primitive that we integrate into HDFS. We benchmark this implementation in a proprietary cluster and in EC2. Our experiments show that compared to traditional erasure codes, CORE uses 50% less bandwidth and is up to 75% faster while recovering a single failed node, while the gains are respectively 15% and 60% for double node failures.
Access libraries such as ROOT and HDF5 allow users to interact with datasets using high level abstractions, like coordinate systems and associated slicing operations. Unfortunately, the implementations of access libraries are based on outdated assump tions about storage systems interfaces and are generally unable to fully benefit from modern fast storage devices. The situation is getting worse with rapidly evolving storage devices such as non-volatile memory and ever larger datasets. This project explores distributed dataset mapping infrastructures that can integrate and scale out existing access libraries using Cephs extensible object model, avoiding re-implementation or even modifications of these access libraries as much as possible. These programmable storage extensions coupled with our distributed dataset mapping techniques enable: 1) access library operations to be offloaded to storage system servers, 2) the independent evolution of access libraries and storage systems and 3) fully leveraging of the existing load balancing, elasticity, and failure management of distributed storage systems like Ceph. They also create more opportunities to conduct storage server-local optimizations specific to storage servers. For example, storage servers might include local key/value stores combined with chunk stores that require different optimizations than a local file system. As storage servers evolve to support new storage devices like non-volatile memory, these server-local optimizations can be implemented while minimizing disruptions to applications. We will report progress on the means by which distributed dataset mapping can be abstracted over particular access libraries, including access libraries for ROOT data, and how we address some of the challenges revolving around data partitioning and composability of access operations.
One of the major performance and scalability bottlenecks in large scientific applications is parallel reading and writing to supercomputer I/O systems. The usage of parallel file systems and consistency requirements of POSIX, that all the traditional HPC parallel I/O interfaces adhere to, pose limitations to the scalability of scientific applications. Object storage is a widely used storage technology in cloud computing and is more frequently proposed for HPC workload to address and improve the current scalability and performance of I/O in scientific applications. While object storage is a promising technology, it is still unclear how scientific applications will use object storage and what the main performance benefits will be. This work addresses these questions, by emulating an object storage used by a traditional scientific application and evaluating potential performance benefits. We show that scientific applications can benefit from the usage of object storage on large scales.
In this paper we prove lower and matching upper bounds for the number of servers required to implement a regular shared register that tolerates unsynchronized Mobile Byzantine failures. We consider the strongest model of Mobile Byzantine failures to date: agents are moved arbitrarily by an omniscient adversary from a server to another in order to deviate their computation in an unforeseen manner. When a server is infected by an Byzantine agent, it behaves arbitrarily until the adversary decides to move the agent to another server. Previous approaches considered asynchronous servers with synchronous mobile Byzantine agents (yielding impossibility results), and synchronous servers with synchronous mobile Byzantine agents (yielding optimal solutions for regular register implementation, even in the case where servers and agents periods are decoupled). We consider the remaining open case of synchronous servers with unsynchronized agents, that can move at their own pace, and change their pace during the execution of the protocol. Most of our findings relate to lower bounds, and characterizing the model parameters that make the problem solvable. It turns out that unsynchronized mobile Byzantine agent movements requires completely new proof arguments, that can be of independent interest when studying other problems in this model. Additionally, we propose a generic server-based algorithm that emulates a regular register in this model, that is tight with respect to the number of mobile Byzantine agents that can be tolerated. Our emulation spans two awareness models: servers with and without self-diagnose mechanisms. In the first case servers are aware that the mobile Byzantine agent has left and hence they can stop running the protocol until they recover a correct state while in the second case, servers are not aware of their faulty state and continue to run the protocol using an incorrect local state.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا