ﻻ يوجد ملخص باللغة العربية
We introduce a framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, we consider a population of elements $[n]={1,ldots,n}$ with corresponding data values $x_1,ldots,x_n$. We observe the values for a sample set $A subset [n]$ and wish to estimate some statistic of the values for a target set $B subset [n]$ where $B$ could be the entire set. Crucially, we assume that the sets $A$ and $B$ are drawn according to some known distribution $P$ over pairs of subsets of $[n]$. A given estimation algorithm is evaluated based on its worst-case, expected error where the expectation is with respect to the distribution $P$ from which the sample $A$ and target sets $B$ are drawn, and the worst-case is with respect to the data values $x_1,ldots,x_n$. Within this framework, we give an efficient algorithm for estimating the target mean that returns a weighted combination of the sample values--where the weights are functions of the distribution $P$ and the sample and target sets $A$, $B$--and show that the worst-case expected error achieved by this algorithm is at most a multiplicative $pi/2$ factor worse than the optimal of such algorithms. The algorithm and proof leverage a surprising connection to the Grothendieck problem. This framework, which makes no distributional assumptions on the data values but rather relies on knowledge of the data collection process, is a significant departure from typical estimation and introduces a uniform algorithmic analysis for the many natural settings where membership in a sample may be correlated with data values, such as when sampling probabilities vary as in importance sampling, when individuals are recruited into a sample via a social network as in snowball sampling, or when samples have chronological structure as in selective prediction.
One of the primary goals of the mathematical analysis of algorithms is to provide guidance about which algorithm is the best for solving a given computational problem. Worst-case analysis summarizes the performance profile of an algorithm by its wors
This paper gives a new deterministic algorithm for the dynamic Minimum Spanning Forest (MSF) problem in the EREW PRAM model, where the goal is to maintain a MSF of a weighted graph with $n$ vertices and $m$ edges while supporting edge insertions and
In the dynamic minimum set cover problem, a challenge is to minimize the update time while guaranteeing close to the optimal $min(O(log n), f)$ approximation factor. (Throughout, $m$, $n$, $f$, and $C$ are parameters denoting the maximum number of se
In the early $20^{th}$ century, Pigou observed that imposing a marginal cost tax on the usage of a public good induces a socially efficient level of use as an equilibrium. Unfortunately, such a Pigouvian tax may also induce other, socially inefficien
Adaptive collection of data is commonplace in applications throughout science and engineering. From the point of view of statistical inference however, adaptive data collection induces memory and correlation in the samples, and poses significant chal