A Comparative Study of Consistent Snapshot Algorithms for Main-Memory Database Systems

93 0 0.0 ( 0 )

Download Cite

Added by Liang Li

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Liang Li - Guoren Wang - Gang Wu

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In-memory databases (IMDBs) are gaining increasing popularity in big data applications, where clients commit updates intensively. Specifically, it is necessary for IMDBs to have efficient snapshot performance to support certain special applications (e.g., consistent checkpoint, HTAP). Formally, the in-memory consistent snapshot problem refers to taking an in-memory consistent time-in-point snapshot with the constraints that 1) clients can read the latest data items and 2) any data item in the snapshot should not be overwritten. Various snapshot algorithms have been proposed in academia to trade off throughput and latency, but industrial IMDBs such as Redis adhere to the simple fork algorithm. To understand this phenomenon, we conduct comprehensive performance evaluations on mainstream snapshot algorithms. Surprisingly, we observe that the simple fork algorithm indeed outperforms the state-of-the-arts in update-intensive workload scenarios. On this basis, we identify the drawbacks of existing research and propose two lightweight improvements. Extensive evaluations on synthetic data and Redis show that our lightweight improvements yield better performance than fork, the current industrial standard, and the representative snapshot algorithms from academia. Finally, we have opensourced the implementation of all the above snapshot algorithms so that practitioners are able to benchmark the performance of each algorithm and select proper methods for different application scenarios.

rate research

Cracking In-Memory Database Index A Case Study for Adaptive Radix Tree Index

57 - Gang Wu , Yidong Song , Guodong Zhao 2019

Indexes provide a method to access data in databases quickly. It can improve the response speed of subsequent queries by building a complete index in advance. However, it also leads to a huge overhead of the continuous updating during creating the index. An in-memory database usually has a higher query processing performance than disk databases and is more suitable for real-time query processing. Therefore, there is an urgent need to reduce the index creation and update cost for in-memory databases. Database cracking technology is currently recognized as an effective method to reduce the index initialization time. However, conventional cracking algorithms are focused on simple column data structure rather than those complex index structure for in-memory databases. In order to show the feasibility of in-memory database index cracking and promote to future more extensive research, this paper conducted a case study on the Adaptive Radix Tree (ART), a popular tree index structure of in-memory databases. On the basis of carefully examining the ART index construction overhead, an algorithm using auxiliary data structures to crack the ART index is proposed.

Databases

Greedy Search Algorithms for Unsupervised Variable Selection: A Comparative Study

81 - Federico Zocco , Marco Maggipinto , Gian Antonio Susto 2021

Dimensionality reduction is a important step in the development of scalable and interpretable data-driven models, especially when there are a large number of candidate variables. This paper focuses on unsupervised variable selection based dimensionality reduction, and in particular on unsupervised greedy selection methods, which have been proposed by various researchers as computationally tractable approximations to optimal subset selection. These methods are largely distinguished from each other by the selection criterion adopted, which include squared correlation, variance explained, mutual information and frame potential. Motivated by the absence in the literature of a systematic comparison of these different methods, we present a critical evaluation of seven unsupervised greedy variable selection algorithms considering both simulated and real world case studies. We also review the theoretical results that provide performance guarantees and enable efficient implementations for certain classes of greedy selection function, related to the concept of submodularity. Furthermore, we introduce and evaluate for the first time, a lazy implementation of the variance explained based forward selection component analysis (FSCA) algorithm. Our experimental results show that: (1) variance explained and mutual information based selection methods yield smaller approximation errors than frame potential; (2) the lazy FSCA implementation has similar performance to FSCA, while being an order of magnitude faster to compute, making it the algorithm of choice for unsupervised variable selection.

Machine Learning

A Novel approach for Hybrid Database

442 - Rajesh Kumar Tiwari 2013

In the current world of economic crises, the cost control is one of the chief concerns for all types of industries, especially for the small venders. The small vendors are suppose to minimize their budget on Information Technology by reducing the initial investment in hardware and costly database servers like ORACLE, SQL Server, SYBASE, etc. for the purpose of data processing and storing. In other divisions, the electronic devices manufacturing companies want to increase the demand and reduce the manufacturing cost by introducing the low cost technologies. The new small devices like ipods, iphones, palm top etc. are now-a-days used as data computation and storing tools. For both the cases mentioned above, instead of going for the costly database servers which additionally requires extra hardware as well as the extra expenses in training and handling, the flat file may be considered as a candidate due to its easy handling nature, fast accessing, and of course free of cost. But the main hurdle is the security aspects which are not up to the optimum level. In this paper, we propose a methodology that combines all the merit of the flat file and with the help of a novel steganographic technique we can maintain the utmost security fence. The new proposed methodology will undoubtedly be highly beneficial for small vendors as well as for the above said electronic devices manufacturer

Databases Cryptography and Security Multimedia

Reenactment for Read-Committed Snapshot Isolation

56 - Bahareh Sadat Arab , Dieter Gawlick , Vasudha Krishnaswamy 2016

Databases

A new Watermarking Technique for Secure Database

504 - Jun Ziang Pinn , A. Fr. Zung 2013

Digital multimedia watermarking technology was suggested in the last decade to embed copyright information in digital objects such images, audio and video. However, the increasing use of relational database systems in many real-life applications created an ever increasing need for watermarking database systems. As a result, watermarking relational database systems is now merging as a research area that deals with the legal issue of copyright protection of database systems. Approach: In this study, we proposed an efficient database watermarking algorithm based on inserting binary image watermarks in non-numeric mutli-word attributes of selected database tuples. Results: The algorithm is robust as it resists attempts to remove or degrade the embedded watermark and it is blind as it does not require the original database in order to extract the embedded watermark. Conclusion: Experimental results demonstrated blindness and the robustness of the algorithm against common database attacks.

Databases Cryptography and Security Multimedia