We introduce a framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, we consider a population of elements $[n]=\{1,\ldots,n\}$ with corresponding data values $x_1,\ldots,x_n$. We observe the values for a sample set $A \subset [n]$ and wish to estimate some statistic of the values for a target set $B \subset [n]$, where $B$ could be the entire set. Crucially, we assume that the sets $A$ and $B$ are drawn according to some known distribution $P$ over pairs of subsets of $[n]$. A given estimation algorithm is evaluated based on its worst-case expected error, where the expectation is with respect to the distribution $P$ from which the sample set $A$ and target set $B$ are drawn, and the worst case is with respect to the data values $x_1,\ldots,x_n$. Within this framework, we give an efficient algorithm for estimating the target mean that returns a weighted combination of the sample values (where the weights are functions of the distribution $P$ and the sample and target sets $A$, $B$) and show that the worst-case expected error achieved by this algorithm is at most a multiplicative $\pi/2$ factor worse than the optimum over such algorithms. The algorithm and proof leverage a surprising connection to the Grothendieck problem. This framework, which makes no distributional assumptions on the data values but rather relies on knowledge of the data collection process, is a significant departure from typical estimation and introduces a uniform algorithmic analysis for the many natural settings where membership in a sample may be correlated with data values, such as when sampling probabilities vary as in importance sampling, when individuals are recruited into a sample via a social network as in snowball sampling, or when samples have chronological structure as in selective prediction.
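As one concrete, classical instance of a weighted-combination estimator inside this framework (not the paper's optimized, Grothendieck-based weighting), consider the importance-sampling setting the abstract mentions: each element $i$ enters the sample $A$ independently with a known probability $p_i$ and the target is the full-population mean. The Horvitz-Thompson estimator then weights each observed value by $1/(n p_i)$, a weight that depends only on the sampling distribution $P$. A minimal sketch:

```python
# Illustrative Horvitz-Thompson baseline: weights depend on the known
# sampling distribution P (here, independent inclusion probabilities p_i),
# not on any distributional assumption about the data values x_i.
# This is NOT the paper's optimized weighting; it only shows the shape of
# a "weighted combination of the sample values" in this framework.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.exponential(scale=2.0, size=n)      # arbitrary, fixed data values
p = rng.uniform(0.05, 0.5, size=n)          # known inclusion probabilities

in_sample = rng.random(n) < p               # draw the sample set A
ht_estimate = np.sum(x[in_sample] / p[in_sample]) / n

print("true mean   :", x.mean())
print("HT estimate :", ht_estimate)         # unbiased over the draw of A
```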
One of the primary goals of the mathematical analysis of algorithms is to provide guidance about which algorithm is the best for solving a given computational problem. Worst-case analysis summarizes the performance profile of an algorithm by its worst performance on any input of a given size, implicitly advocating for the algorithm with the best-possible worst-case performance. Strong worst-case guarantees are the holy grail of algorithm design, providing an application-agnostic certification of an algorithm's robustly good performance. However, for many fundamental problems and performance measures, such guarantees are impossible and a more nuanced analysis approach is called for. This chapter surveys several alternatives to worst-case analysis that are discussed in detail later in the book.
This paper gives a new deterministic algorithm for the dynamic Minimum Spanning Forest (MSF) problem in the EREW PRAM model, where the goal is to maintain an MSF of a weighted graph with $n$ vertices and $m$ edges while supporting edge insertions and deletions. We show that one can solve the dynamic MSF problem using $O(\sqrt{n})$ processors and $O(\log n)$ worst-case update time, for a total of $O(\sqrt{n} \log n)$ work. This improves on the algorithm of Ferragina [IPPS 1995], which costs $O(\log n)$ worst-case update time and $O(n^{2/3} \log\frac{m}{n})$ work.
In the dynamic minimum set cover problem, a challenge is to minimize the update time while guaranteeing close to the optimal $\min(O(\log n), f)$ approximation factor. (Throughout, $m$, $n$, $f$, and $C$ are parameters denoting the maximum number of sets, number of elements, frequency, and the cost range.) In the high-frequency range, when $f=\Omega(\log n)$, this was achieved by a deterministic $O(\log n)$-approximation algorithm with $O(f \log n)$ amortized update time [Gupta et al. STOC17]. In the low-frequency range, the line of work by Gupta et al. [STOC17], Abboud et al. [STOC19], and Bhattacharya et al. [ICALP15, IPCO17, FOCS19] led to a deterministic $(1+\epsilon)f$-approximation algorithm with $O(f \log (Cn)/\epsilon^2)$ amortized update time. In this paper we improve the latter update time and provide the first bounds that subsume (and sometimes improve) the state-of-the-art dynamic vertex cover algorithms. We obtain: 1. $(1+\epsilon)f$-approximation ratio in $O(f\log^2 (Cn)/\epsilon^3)$ worst-case update time: No non-trivial worst-case update time was previously known for dynamic set cover. Our bound subsumes and improves by a logarithmic factor the $O(\log^3 n/\text{poly}(\epsilon))$ worst-case update time for unweighted dynamic vertex cover (i.e., when $f=2$ and $C=1$) by Bhattacharya et al. [SODA17]. 2. $(1+\epsilon)f$-approximation ratio in $O\left((f^2/\epsilon^3)+(f/\epsilon^2) \log C\right)$ amortized update time: This result improves the previous $O(f \log (Cn)/\epsilon^2)$ update time bound for most values of $f$ in the low-frequency range, i.e. whenever $f=o(\log n)$. It is the first that is independent of $m$ and $n$. It subsumes the constant amortized update time of Bhattacharya and Kulkarni [SODA19] for unweighted dynamic vertex cover (i.e., when $f = 2$ and $C = 1$).
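For readers less familiar with the benchmark, the following is a small sketch of the classical static primal-dual $f$-approximation for weighted set cover (the Bar-Yehuda and Even scheme), included only to make the $f$-approximation guarantee concrete. The paper's contribution is maintaining such guarantees dynamically, with small worst-case or amortized update time, which this static sketch does not address.

```python
# Static primal-dual f-approximation for weighted set cover (classical
# Bar-Yehuda--Even scheme); illustrative only, not the paper's dynamic
# algorithm. f is the maximum number of sets containing any element.
def primal_dual_set_cover(sets, costs, universe):
    """sets: list of frozensets; costs: list of positive floats."""
    residual = list(costs)
    chosen = set()
    covered = set()
    for e in universe:
        if e in covered:
            continue
        # Raise the dual of e until some set containing e is fully paid for.
        containing = [i for i, S in enumerate(sets) if e in S]
        delta = min(residual[i] for i in containing)
        for i in containing:
            residual[i] -= delta
            if residual[i] == 0 and i not in chosen:
                chosen.add(i)
                covered |= sets[i]
    return chosen

# Tiny example where f = 2 (each element lies in at most two sets).
sets = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
costs = [1.0, 10.0, 1.0]
print(primal_dual_set_cover(sets, costs, universe=range(1, 5)))
# -> {0, 2}: total cost 2.0, which happens to be optimal here
```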
In the early $20^{th}$ century, Pigou observed that imposing a marginal cost tax on the usage of a public good induces a socially efficient level of use as an equilibrium. Unfortunately, such a Pigouvian tax may also induce other, socially inefficient, equilibria. We observe that this social inefficiency may be unbounded, and study whether alternative tax structures may lead to milder losses in the worst case, i.e., to a lower price of anarchy. We show that no tax structure leads to bounded losses in the worst case. However, we do find a tax scheme that has a lower price of anarchy than the Pigouvian tax, obtaining tight lower and upper bounds in terms of a crucial parameter that we identify. We generalize our results to various scenarios, each of which offers an alternative to the use of a public road by private cars, such as ride sharing, or using a bus or a train.
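To see why Pigou's prescription supports the social optimum, and where additional equilibria can come from, consider a minimal stylized model (our illustration; the paper's model may be parameterized differently): a unit mass of users chooses between a congestible public road with per-user cost $c(x)$ when a fraction $x$ uses it, and an outside option (say, a train) with fixed cost $1$. The social cost is $S(x) = x\,c(x) + (1-x)$, so an interior social optimum satisfies $S'(x) = c(x) + x\,c'(x) - 1 = 0$. The Pigouvian tax charges each road user the marginal externality $\tau(x) = x\,c'(x)$, so a road user pays $c(x) + x\,c'(x)$ while the outside option costs $1$, and an interior equilibrium satisfies exactly the same equation $c(x) + x\,c'(x) = 1$. Hence the social optimum is always an equilibrium under the Pigouvian tax; but when $c(x) + x\,c'(x)$ is not monotone, this equation can have several solutions, each of which is also an equilibrium and may be far from socially optimal, which is the multiplicity of equilibria referred to above.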
Adaptive collection of data is commonplace in applications throughout science and engineering. From the point of view of statistical inference, however, adaptive data collection induces memory and correlation in the samples and poses significant challenges. We consider high-dimensional linear regression, where the samples are collected adaptively and the sample size $n$ can be smaller than $p$, the number of covariates. In this setting, there are two distinct sources of bias: the first due to the regularization imposed for consistent estimation, e.g. using the LASSO, and the second due to the adaptivity in collecting the samples. We propose online debiasing, a general procedure for estimators such as the LASSO, which addresses both sources of bias. In two concrete contexts, $(i)$ time series analysis and $(ii)$ batched data collection, we demonstrate that online debiasing optimally debiases the LASSO estimate when the underlying parameter $\theta_0$ has sparsity of order $o(\sqrt{n}/\log p)$. In this regime, the debiased estimator can be used to compute $p$-values and confidence intervals of optimal size.
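To make the debiasing idea concrete, below is a minimal sketch of a static debiasing step for the LASSO, using a crude ridge-regularized approximate inverse of the sample covariance as the decorrelating matrix. This is our illustration of the general recipe (a biased regularized estimate plus a decorrelated residual correction); the paper's online debiasing constructs the decorrelating matrix sequentially from past samples so that it remains valid under adaptive data collection, which this sketch does not attempt. The constants and dimensions below are illustrative.

```python
# Sketch of a static debiasing step for the LASSO (our simplification;
# the paper's "online debiasing" handles adaptively collected samples,
# which this static version does not).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 200, 300, 5                        # n < p, s-sparse signal
theta0 = np.zeros(p); theta0[:s] = 1.0
X = rng.standard_normal((n, p))
y = X @ theta0 + 0.5 * rng.standard_normal(n)

# Step 1: regularized (biased) LASSO estimate.
theta_hat = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_

# Step 2: debias by adding a decorrelated residual correction.
Sigma_hat = X.T @ X / n                      # sample covariance
M = np.linalg.inv(Sigma_hat + 0.05 * np.eye(p))   # crude approximate inverse
theta_debiased = theta_hat + M @ X.T @ (y - X @ theta_hat) / n

# Under suitable conditions, coordinates of theta_debiased are
# approximately Gaussian around theta0, which is what enables p-values
# and confidence intervals.
print(np.abs(theta_debiased[:s] - theta0[:s]).mean())
```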