ترغب بنشر مسار تعليمي؟ اضغط هنا

Blockchains vs. Distributed Databases: Dichotomy and Fusion

166   0   0.0 ( 0 )
 نشر من قبل Pingcheng Ruan
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Blockchain has come a long way: a system that was initially proposed specifically for cryptocurrencies is now being adapted and adopted as a general-purpose transactional system. As blockchain evolves into another data management system, the natural question is how it compares against distributed database systems. Existing works on this comparison focus on high-level properties, such as security and throughput. They stop short of showing how the underlying design choices contribute to the overall differences. Our work fills this important gap and provides a principled framework for analyzing the emerging trend of blockchain-database fusion. We perform a twin study of blockchains and distributed database systems as two types of transactional systems. We propose a taxonomy that illustrates the dichotomy across four dimensions, namely replication, concurrency, storage, and sharding. Within each dimension, we discuss how the design choices are driven by two goals: security for blockchains, and performance for distributed databases. To expose the impact of different design choices on the overall performance, we conduct an in-depth performance analysis of two blockchains, namely Quorum and Hyperledger Fabric, and two distributed databases, namely TiDB, and etcd. Lastly, we propose a framework for back-of-the-envelope performance forecast of blockchain-database hybrids.



قيم البحث

اقرأ أيضاً

A new type of logs, the command log, is being employed to replace the traditional data log (e.g., ARIES log) in the in-memory databases. Instead of recording how the tuples are updated, a command log only tracks the transactions being executed, there by effectively reducing the size of the log and improving the performance. Command logging on the other hand increases the cost of recovery, because all the transactions in the log after the last checkpoint must be completely redone in case of a failure. In this paper, we first extend the command logging technique to a distributed environment, where all the nodes can perform recovery in parallel. We then propose an adaptive logging approach by combining data logging and command logging. The percentage of data logging versus command logging becomes an optimization between the performance of transaction processing and recovery to suit different OLTP applications. Our experimental study compares the performance of our proposed adaptive logging, ARIES-style data logging and command logging on top of H-Store. The results show that adaptive logging can achieve a 10x boost for recovery and a transaction throughput that is comparable to that of command logging.
In this paper, we propose the DN-tree that is a data structure to build lossy summaries of the frequent data access patterns of the queries in a distributed graph data management system. These compact representations allow us an efficient communicati on of the data structure in distributed systems. We exploit this data structure with a new textit{Dynamic Data Partitioning} strategy (DYDAP) that assigns the portions of the graph according to historical data access patterns, and guarantees a small network communication and a computational load balance in distributed graph queries. This method is able to adapt dynamically to new workloads and evolve when the query distribution changes. Our experiments show that DYDAP yields a throughput up to an order of magnitude higher than previous methods based on cache specialization, in a variety of scenarios, and the average response time of the system is divided by two.
A benchmark study of modern distributed databases is an important source of information to select the right technology for managing data in the cloud-edge paradigms. To make the right decision, it is required to conduct an extensive experimental stud y on a variety of hardware infrastructures. While most of the state-of-the-art studies have investigated only response time and scalability of distributed databases, focusing on other various metrics (e.g., energy, bandwidth, and storage consumption) is essential to fully understand the resources consumption of the distributed databases. Also, existing studies have explored the response time and scalability of these databases either in private or public cloud. Hence, there is a paucity of investigation into the evaluation of these databases deployed in a hybrid cloud, which is the seamless integration of public and private cloud. To address these research gaps, in this paper, we investigate energy, bandwidth and storage consumption of the most used and common distributed databases. For this purpose, we have evaluated four open-source databases (Cassandra, Mongo, Redis and MySQL) on the hybrid cloud spanning over local OpenStack and Microsoft Azure, and a variety of edge computing nodes including Raspberry Pi, a cluster of Raspberry Pi, and low and high power servers. Our extensive experimental results reveal several helpful insights for the deployment selection of modern distributed databases in edge-cloud environments.
One of the most important aspects of security organization is to establish a framework to identify security significant points where policies and procedures are declared. The (information) security infrastructure comprises entities, processes, and te chnology. All are participants in handling information, which is the item that needs to be protected. Privacy and security information technology is a critical and unmet need in the management of personal information. This paper proposes concepts and technologies for management of personal information. Two different types of information can be distinguished: personal information and nonpersonal information. Personal information can be either personal identifiable information (PII), or nonidentifiable information (NII). Security, policy, and technical requirements can be based on this distinction. At the conceptual level, PII is defined and formalized by propositions over infons (discrete pieces of information) that specify transformations in PII and NII. PII is categorized into simple infons that reflect the proprietor s aspects, relationships with objects, and relationships with other proprietors. The proprietor is the identified person about whom the information is communicated. The paper proposes a database organization that focuses on the PII spheres of proprietors. At the design level, the paper describes databases of personal identifiable information built exclusively for this type of information, with their own conceptual scheme, system management, and physical structure.
Probabilistic databases play a crucial role in the management and understanding of uncertain data. However, incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power, scalability, or restrict the class of relational algebra formula under which they are closed. We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. Our approach allows the efficient evaluation of arbitrary queries over probabilistic databases with arbitrary dependencies expressed by graphical models with structure that changes during inference. MCMC sampling provides efficiency by hypothesizing {em modifications} to possible worlds rather than generating entire worlds from scratch. Queries are then run over the portions of the world that change, avoiding the onerous cost of running full queries over each sampled world. A significant innovation of this work is the connection between MCMC sampling and materialized view maintenance techniques: we find empirically that using view maintenance techniques is several orders of magnitude faster than naively querying each sampled world. We also demonstrate our systems ability to answer relational queries with aggregation, and demonstrate additional scalability through the use of parallelization.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا