RITA: An Index-Tuning Advisor for Replicated Databases

Published by: Quoc Trung Tran

Publication date: 2013

Research field: Informatics Engineering

Paper language: English

Authors: Quoc Trung Tran - Ivo Jimenez - Rui Wang

Databases

Abstract

Given a replicated database, a divergent design tunes the indexes in each replica differently in order to specialize it for a specific subset of the workload. This specialization brings significant performance gains compared to the common practice of having the same indexes in all replicas, but requires the development of new tuning tools for database administrators. In this paper we introduce RITA (Replication-aware Index Tuning Advisor), a novel divergent-tuning advisor that offers several essential features not found in existing tools: it generates robust divergent designs that allow the system to adapt gracefully to replica failures; it computes designs that spread the load evenly among specialized replicas, both during normal operation and when replicas fail; it monitors the workload online in order to detect changes that require a recomputation of the divergent design; and it offers suggestions to elastically reconfigure the system (by adding/removing replicas or adding/dropping indexes) to respond to workload changes. The key technical innovation behind RITA is showing that the problem of selecting an optimal design can be formulated as a Binary Integer Program (BIP). The BIP has a relatively small number of variables, which makes it feasible to solve efficiently using any off-the-shelf linear-optimization software. Experimental results demonstrate that RITA computes better divergent designs than existing tools, offers more features, and has fast execution times.
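To make the idea of a divergent design concrete, the following is a minimal sketch under stated assumptions. It is not RITA's BIP formulation: it uses exhaustive search over a tiny, made-up cost table, and it ignores load balancing and replica failures. The query names, index configuration names, and cost values are all hypothetical.

```python
from itertools import product

# Hypothetical per-query costs under each candidate index configuration.
# All names and numbers are illustrative assumptions, not from the paper.
COST = {
    "q1": {"idx_a": 1.0, "idx_b": 8.0, "none": 10.0},
    "q2": {"idx_a": 9.0, "idx_b": 2.0, "none": 10.0},
    "q3": {"idx_a": 3.0, "idx_b": 3.0, "none": 6.0},
}
CONFIGS = ["idx_a", "idx_b", "none"]


def workload_cost(design):
    """Total cost when each query is routed to its cheapest replica."""
    return sum(min(COST[q][cfg] for cfg in design) for q in COST)


def best_design(num_replicas, divergent=True):
    """Exhaustive search; a uniform design forces one config on all replicas."""
    if divergent:
        candidates = product(CONFIGS, repeat=num_replicas)
    else:
        candidates = ((cfg,) * num_replicas for cfg in CONFIGS)
    return min(candidates, key=workload_cost)


uniform = best_design(2, divergent=False)
divergent = best_design(2, divergent=True)
print("uniform:  ", uniform, workload_cost(uniform))
print("divergent:", divergent, workload_cost(divergent))
```

In this toy instance the divergent design is cheaper because each replica serves the queries its index suits best. Exhaustive search grows exponentially with the number of replicas, which is the kind of scaling problem an integer-programming formulation handed to a solver is meant to avoid.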

Cristian Molinaro, Jan Chomicki, Jerzy Marcinkowski (2008)

This paper addresses the problem of representing the set of repairs of a possibly inconsistent database by means of a disjunctive database. Specifically, the class of denial constraints is considered. We show that, given a database and a set of denial constraints, there exists a (unique) disjunctive database, called canonical, which represents the repairs of the database w.r.t. the constraints and is contained in any other disjunctive database with the same set of minimal models. We propose an algorithm for computing the canonical disjunctive database. Finally, we study the size of the canonical disjunctive database in the presence of functional dependencies for both repairs and cardinality-based repairs.

Databases

Scalable Blocking for Very Large Databases

Andrew Borthwick, Stephen Ash, Bin Pang (2020)

In the field of database deduplication, the goal is to find approximately matching records within a database. Blocking is a typical stage in this process that involves cheaply finding candidate pairs of records that are potential matches for further processing. We present here Hashed Dynamic Blocking, a new approach to blocking designed to address datasets larger than those studied in most prior work. Hashed Dynamic Blocking (HDB) extends Dynamic Blocking, which leverages the insight that rare matching values and rare intersections of values are predictive of a matching relationship. We also present a novel use of Locality Sensitive Hashing (LSH) to build blocking key values for huge databases with a convenient configuration to control the trade-off between precision and recall. HDB achieves massive scale by minimizing data movement, using compact block representation, and greedily pruning ineffective candidate blocks using a Count-min Sketch approximate counting data structure. We benchmark the algorithm by focusing on real-world datasets in excess of one million rows, demonstrating that the algorithm displays linear time complexity scaling in this range. Furthermore, we execute HDB on a 530 million row industrial dataset, detecting 68 billion candidate pairs in less than three hours at a cost of $307 on a major cloud service.
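As an illustration of the approximate counting structure named above, here is a minimal Count-min Sketch in Python. How HDB integrates the sketch is not described in the abstract, so the block-size threshold and the sample keys below are hypothetical; only the data structure itself is standard.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent hash per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum across rows is the least-contaminated counter.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )


# Count how many records share each blocking key (keys are made up).
sketch = CountMinSketch()
for blocking_key in ["smith"] * 5 + ["jones"] * 2:
    sketch.add(blocking_key)

MAX_BLOCK_SIZE = 4  # hypothetical pruning threshold
oversized = [k for k in ("smith", "jones") if sketch.estimate(k) > MAX_BLOCK_SIZE]
print(oversized)
```

The sketch uses fixed memory regardless of how many distinct keys are seen, at the price of possible overestimates when keys collide in every row.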

Databases - Distributed, Parallel, and Cluster Computing - Data Structures and Algorithms

An Analysis of Concurrency Control Protocols for In-Memory Databases with CCBench (Extended Version)

Takayuki Tanabe, Takashi Hoshino, Hideyuki Kawashima (2020)

This paper presents yet another concurrency control analysis platform, CCBench. CCBench supports seven protocols (Silo, TicToc, MOCC, Cicada, SI, SI with latch-free SSN, 2PL) and seven versatile optimization methods and enables the configuration of seven workload parameters. We analyzed the protocols and optimization methods using various workload parameters and a thread count of 224. Previous studies focused on thread scalability and did not explore the space analyzed here. We classified the optimization methods on the basis of three performance factors: CPU cache, delay on conflict, and version lifetime. Analyses using CCBench and 224 threads produced six insights. (I1) The performance of optimistic concurrency control protocols for a read-only workload rapidly degrades as cardinality increases even without L3 cache misses. (I2) Silo can outperform TicToc for some write-intensive workloads by using invisible reads optimization. (I3) The effectiveness of two approaches to coping with conflict (wait and no-wait) depends on the situation. (I4) OCC reads the same record two or more times if a concurrent transaction interruption occurs, which can improve performance. (I5) Mixing different implementations is inappropriate for deep analysis. (I6) Even a state-of-the-art garbage collection method cannot improve the performance of multi-version protocols if there is a single long transaction mixed into the workload. On the basis of I4, we defined the read phase extension optimization in which an artificial delay is added to the read phase. On the basis of I6, we defined the aggressive garbage collection optimization in which even visibl

Databases

Personal Information Databases

Sabah S. Al-Fedaghi, Bernhard Thalheim (2009)

One of the most important aspects of security organization is to establish a framework to identify security significant points where policies and procedures are declared. The (information) security infrastructure comprises entities, processes, and technology. All are participants in handling information, which is the item that needs to be protected. Privacy and security information technology is a critical and unmet need in the management of personal information. This paper proposes concepts and technologies for management of personal information. Two different types of information can be distinguished: personal information and nonpersonal information. Personal information can be either personal identifiable information (PII) or nonidentifiable information (NII). Security, policy, and technical requirements can be based on this distinction. At the conceptual level, PII is defined and formalized by propositions over infons (discrete pieces of information) that specify transformations in PII and NII. PII is categorized into simple infons that reflect the proprietor's aspects, relationships with objects, and relationships with other proprietors. The proprietor is the identified person about whom the information is communicated. The paper proposes a database organization that focuses on the PII spheres of proprietors. At the design level, the paper describes databases of personal identifiable information built exclusively for this type of information, with their own conceptual scheme, system management, and physical structure.

Databases

Adaptive Logging for Distributed In-memory Databases

Chang Yao, Divyakant Agrawal, Gang Chen (2015)

A new type of log, the command log, is being employed to replace the traditional data log (e.g., ARIES log) in in-memory databases. Instead of recording how the tuples are updated, a command log only tracks the transactions being executed, thereby effectively reducing the size of the log and improving the performance. Command logging, on the other hand, increases the cost of recovery, because all the transactions in the log after the last checkpoint must be completely redone in case of a failure. In this paper, we first extend the command logging technique to a distributed environment, where all the nodes can perform recovery in parallel. We then propose an adaptive logging approach by combining data logging and command logging. The percentage of data logging versus command logging becomes an optimization between the performance of transaction processing and recovery to suit different OLTP applications. Our experimental study compares the performance of our proposed adaptive logging, ARIES-style data logging and command logging on top of H-Store. The results show that adaptive logging can achieve a 10x boost for recovery and a transaction throughput that is comparable to that of command logging.
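The difference between the two log styles can be sketched with a toy in-memory store. This is an illustrative simplification, not the paper's system: the transfer transaction, the account names, and the log formats are assumptions, and the sketch omits distribution, checkpointing mechanics, and the adaptive mix of the two styles.

```python
def apply_transfer(state, src, dst, amount):
    """The only transaction type in this toy store."""
    state[src] -= amount
    state[dst] += amount


def run_with_data_log(state, transfers):
    """Data log: record the resulting value of every tuple touched."""
    log = []
    for src, dst, amount in transfers:
        apply_transfer(state, src, dst, amount)
        log.append({src: state[src], dst: state[dst]})
    return log


def run_with_command_log(state, transfers):
    """Command log: record only which transaction ran and its arguments."""
    log = []
    for transfer in transfers:
        apply_transfer(state, *transfer)
        log.append(transfer)
    return log


def recover_from_data_log(checkpoint, log):
    """Recovery installs the logged values; no transaction logic is re-run."""
    state = dict(checkpoint)
    for entry in log:
        state.update(entry)
    return state


def recover_from_command_log(checkpoint, log):
    """Recovery must re-execute every logged transaction in order."""
    state = dict(checkpoint)
    for src, dst, amount in log:
        apply_transfer(state, src, dst, amount)
    return state


checkpoint = {"a": 100, "b": 50, "c": 0}
transfers = [("a", "b", 30), ("b", "c", 20)]

data_log = run_with_data_log(dict(checkpoint), transfers)
command_log = run_with_command_log(dict(checkpoint), transfers)

print(recover_from_data_log(checkpoint, data_log))
print(recover_from_command_log(checkpoint, command_log))
```

Both recovery paths reach the same state. The command log stores one small record per transaction however many tuples it touches, while its recovery cost grows with the work each transaction performs.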

Databases
