Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing


Abstract in English

Sample-based approximate query processing (AQP) suffers from many pitfalls such as the inability to answer very selective queries and unreliable confidence intervals when sample sizes are small. Recent research presented an intriguing solution of combining materialized, pre-computed aggregates with sampling for accurate and more reliable AQP. We explore this solution in detail in this work and propose an AQP physical design called PASS, or Precomputation-Assisted Stratified Sampling. PASS builds a tree of partial aggregates that cover different partitions of the dataset. The leaf nodes of this tree form the strata for stratified samples. Aggregate queries whose predicates align with the partitions (or unions of partitions) are exactly answered with a depth-first search, and any partial overlaps are approximated with the stratified samples. We propose an algorithm for optimally partitioning the data into such a data structure with various practical approximation techniques.

Download