Subscribe to the gold package and get unlimited access to Shamra Academy

LevelHeaded: Making Worst-Case Optimal Joins Work in the Common Case

74 0 0.0 ( 0 )

Download Cite

Added by Christopher Aberger

Publication date 2017

fields Informatics Engineering

and research's language is English

Authors Christopher R. Aberger - Andrew Lamb - Kunle Olukotun

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Pipelines combining SQL-style business intelligence (BI) queries and linear algebra (LA) are becoming increasingly common in industry. As a result, there is a growing need to unify these workloads in a single framework. Unfortunately, existing solutions either sacrifice the inherent benefits of exclusively using a relational database (e.g. logical and physical independence) or incur orders of magnitude performance gaps compared to specialized engines (or both). In this work we study applying a new type of query processing architecture to standard BI and LA benchmarks. To do this we present a new in-memory query processing engine called LevelHeaded. LevelHeaded uses worst-case optimal joins as its core execution mechanism for both BI and LA queries. With LevelHeaded, we show how crucial optimizations for BI and LA queries can be captured in a worst-case optimal query architecture. Using these optimizations, LevelHeaded outperforms other relational database engines (LogicBlox, MonetDB, and HyPer) by orders of magnitude on standard LA benchmarks, while performing on average within 31% of the best-of-breed BI (HyPer) and LA (Intel MKL) solutions on their own benchmarks. Our results show that such a single query processing architecture is capable of delivering competitive performance on both BI and LA queries.

rate research

Counting Triangles under Updates in Worst-Case Optimal Time

230 - Ahmet Kara , Hung Q. Ngo , Milos Nikolic 2018

We consider the problem of incrementally maintaining the triangle count query under single-tuple updates to the input relations. We introduce an approach that exhibits a space-time tradeoff such that the space-time product is quadratic in the size of the input database and the update time can be as low as the square root of this size. This lowest update time is worst-case optimal conditioned on the Online Matrix-Vector Multiplication conjecture. The classical and factorized incremental view maintenance approaches are recovered as special cases of our approach within the space-time tradeoff. In particular, they require linear-time update maintenance, which is suboptimal. Our approach also recovers the worst-case optimal time complexity for computing the triangle count in the non-incremental setting.

Databases

Entropic equality for worst-case work at any protocol speed

563 - Oscar Dahlsten , Mahn-Soo Choi , Daniel Braun 2015

We derive an equality for non-equilibrium statistical mechanics in finite-dimensional quantum systems. The equality concerns the worst-case work output of a time-dependent Hamiltonian protocol in the presence of a Markovian heat bath. It has has the form worst-case work = penalty - optimum. The equality holds for all rates of changing the Hamiltonian and can be used to derive the optimum by setting the penalty to 0. The optimum term contains the max entropy of the initial state, rather than the von Neumann entropy, thus recovering recent results from single-shot statistical mechanics. Energy coherences can arise during the protocol but are assumed not to be present initially. We apply the equality to an electron box.

Quantum Physics Statistical Mechanics

Worst-Case Optimal Covering of Rectangles by Disks

119 - Sandor P. Fekete , Utkarsh Gupta , Phillip Keldenich 2020

We provide the solution for a fundamental problem of geometric optimization by giving a complete characterization of worst-case optimal disk coverings of rectangles: For any $lambdageq 1$, the critical covering area $A^*(lambda)$ is the minimum value for which any set of disks with total area at least $A^*(lambda)$ can cover a rectangle of dimensions $lambdatimes 1$. We show that there is a threshold value $lambda_2 = sqrt{sqrt{7}/2 - 1/4} approx 1.035797ldots$, such that for $lambda<lambda_2$ the critical covering area $A^*(lambda)$ is $A^*(lambda)=3pileft(frac{lambda^2}{16} +frac{5}{32} + frac{9}{256lambda^2}right)$, and for $lambdageq lambda_2$, the critical area is $A^*(lambda)=pi(lambda^2+2)/4$; these values are tight. For the special case $lambda=1$, i.e., for covering a unit square, the critical covering area is $frac{195pi}{256}approx 2.39301ldots$. The proof uses a careful combination of manual and automatic analysis, demonstrating the power of the employed interval arithmetic technique.

Computational Geometry

Packing Disks into Disks with Optimal Worst-Case Density

57 - Sandor P. Fekete , Phillip Keldenich , Christian Scheffer 2019

We provide a tight result for a fundamental problem arising from packing disks into a circular container: The critical density of packing disks in a disk is 0.5. This implies that any set of (not necessarily equal) disks of total area $deltaleq 1/2$ can always be packed into a disk of area 1; on the other hand, for any $varepsilon>0$ there are sets of disks of area $1/2+varepsilon$ that cannot be packed. The proof uses a careful manual analysis, complemented by a minor automatic part that is based on interval arithmetic. Beyond the basic mathematical importance, our result is also useful as a blackbox lemma for the analysis of recursive packing algorithms.

Computational Geometry

Making Table Understanding Work in Practice

85 - Madelon Hulsebos , Sneha Gathani , James Gale 2021

Understanding the semantics of tables at scale is crucial for tasks like data integration, preparation, and search. Table understanding methods aim at detecting a tables topic, semantic column types, column relations, or entities. With the rise of deep learning, powerful models have been developed for these tasks with excellent accuracy on benchmarks. However, we observe that there exists a gap between the performance of these models on these benchmarks and their applicability in practice. In this paper, we address the question: what do we need for these models to work in practice? We discuss three challenges of deploying table understanding models and propose a framework to address them. These challenges include 1) difficulty in customizing models to specific domains, 2) lack of training data for typical database tables often found in enterprises, and 3) lack of confidence in the inferences made by models. We present SigmaTyper which implements this framework for the semantic column type detection task. SigmaTyper encapsulates a hybrid model trained on GitTables and integrates a lightweight human-in-the-loop approach to customize the model. Lastly, we highlight avenues for future research that further close the gap towards making table understanding effective in practice.

Databases Human-Computer Interaction Machine Learning

comments

Fetching comments

AlHawash Private University

Additional details More universities

LevelHeaded: Making Worst-Case Optimal Joins Work in the Common Case

Ask ChatGPT about the research

No Arabic abstract

Read More