ﻻ يوجد ملخص باللغة العربية
Given a dataset $mathcal{D}$, we are interested in computing the mean of a subset of $mathcal{D}$ which matches a predicate. ABae leverages stratified sampling and proxy models to efficiently compute this statistic given a sampling budget $N$. In this document, we theoretically analyze ABae and show that the MSE of the estimate decays at rate $O(N_1^{-1} + N_2^{-1} + N_1^{1/2}N_2^{-3/2})$, where $N=K cdot N_1+N_2$ for some integer constant $K$ and $K cdot N_1$ and $N_2$ represent the number of samples used in Stage 1 and Stage 2 of ABae respectively. Hence, if a constant fraction of the total sample budget $N$ is allocated to each stage, we will achieve a mean squared error of $O(N^{-1})$ which matches the rate of mean squared error of the optimal stratified sampling algorithm given a priori knowledge of the predicate positive rate and standard deviation per stratum.
Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep neural networks (DNNs). As these DNNs are expensive and
We consider the problem of evaluating certain types of functional aggregation queries on relational data subject to additive inequalities. Such aggregation queries, with a smallish number of additive inequalities, arise naturally/commonly in many app
We introduce and study several notions of approximation for ontology-mediated queries based on the description logics ALC and ALCI. Our approximations are of two kinds: we may (1) replace the ontology with one formulated in a tractable ontology langu
Given a graph $G$, a source node $s$ and a target node $t$, the personalized PageRank (PPR) of $t$ with respect to $s$ is the probability that a random walk starting from $s$ terminates at $t$. An important variant of the PPR query is single-source P
Modern database systems are growing increasingly distributed and struggle to reduce query completion time with a large volume of data. In this paper, we leverage programmable switches in the network to partially offload query computation to the switc