No Arabic abstract
We study the role of interactivity in distributed statistical inference under information constraints, e.g., communication constraints and local differential privacy. We focus on the tasks of goodness-of-fit testing and estimation of discrete distributions. From prior work, these tasks are well understood under noninteractive protocols. Extending these approaches directly for interactive protocols is difficult due to correlations that can build due to interactivity; in fact, gaps can be found in prior claims of tight bounds of distribution estimation using interactive protocols. We propose a new approach to handle this correlation and establish a unified method to establish lower bounds for both tasks. As an application, we obtain optimal bounds for both estimation and testing under local differential privacy and communication constraints. We also provide an example of a natural testing problem where interactivity helps.
MAXCUT defines a classical NP-hard problem for graph partitioning and it serves as a typical case of the symmetric non-monotone Unconstrained Submodular Maximization (USM) problem. Applications of MAXCUT are abundant in machine learning, computer vision and statistical physics. Greedy algorithms to approximately solve MAXCUT rely on greedy vertex labelling or on an edge contraction strategy. These algorithms have been studied by measuring their approximation ratios in the worst case setting but very little is known to characterize their robustness to noise contaminations of the input data in the average case. Adapting the framework of Approximation Set Coding, we present a method to exactly measure the cardinality of the algorithmic approximation sets of five greedy MAXCUT algorithms. Their information contents are explored for graph instances generated by two different noise models: the edge reversal model and Gaussian edge weights model. The results provide insights into the robustness of different greedy heuristics and techniques for MAXCUT, which can be used for algorithm design of general USM problems.
In wireless sensor networks (WSNs), the Eschenauer-Gligor (EG) key pre-distribution scheme is a widely recognized way to secure communications. Although connectivity properties of secure WSNs with the EG scheme have been extensively investigated, few results address physical transmission constraints. These constraints reflect real-world implementations of WSNs in which two sensors have to be within a certain distance from each other to communicate. In this paper, we present zero-one laws for connectivity in WSNs employing the EG scheme under transmission constraints. These laws help specify the critical transmission ranges for connectivity. Our analytical findings are confirmed via numerical experiments. In addition to secure WSNs, our theoretical results are also applied to frequency hopping in wireless networks.
Given a separation oracle $mathsf{SO}$ for a convex function $f$ that has an integral minimizer inside a box with radius $R$, we show how to find an exact minimizer of $f$ using at most (a) $O(n (n + log(R)))$ calls to $mathsf{SO}$ and $mathsf{poly}(n, log(R))$ arithmetic operations, or (b) $O(n log(nR))$ calls to $mathsf{SO}$ and $exp(n) cdot mathsf{poly}(log(R))$ arithmetic operations. When the set of minimizers of $f$ has integral extreme points, our algorithm outputs an integral minimizer of $f$. This improves upon the previously best oracle complexity of $O(n^2 (n + log(R)))$ for polynomial time algorithms obtained by [Grotschel, Lovasz and Schrijver, Prog. Comb. Opt. 1984, Springer 1988] over thirty years ago. For the Submodular Function Minimization problem, our result immediately implies a strongly polynomial algorithm that makes at most $O(n^3)$ calls to an evaluation oracle, and an exponential time algorithm that makes at most $O(n^2 log(n))$ calls to an evaluation oracle. These improve upon the previously best $O(n^3 log^2(n))$ oracle complexity for strongly polynomial algorithms given in [Lee, Sidford and Wong, FOCS 2015] and [Dadush, Vegh and Zambelli, SODA 2018], and an exponential time algorithm with oracle complexity $O(n^3 log(n))$ given in the former work. Our result is achieved via a reduction to the Shortest Vector Problem in lattices. We show how an approximately shortest vector of certain lattice can be used to effectively reduce the dimension of the problem. Our analysis of the oracle complexity is based on a potential function that captures simultaneously the size of the search set and the density of the lattice, which we analyze via technical tools from convex geometry.
In modern settings of data analysis, we may be running our algorithms on datasets that are sensitive in nature. However, classical machine learning and statistical algorithms were not designed with these risks in mind, and it has been demonstrated that they may reveal personal information. These concerns disincentivize individuals from providing their data, or even worse, encouraging intentionally providing fake data. To assuage these concerns, we import the constraint of differential privacy to the statistical inference, considered by many to be the gold standard of data privacy. This thesis aims to quantify the cost of ensuring differential privacy, i.e., understanding how much additional data is required to perform data analysis with the constraint of differential privacy. Despite the maturity of the literature on differential privacy, there is still inadequate understanding in some of the most fundamental settings. In particular, we make progress in the following problems: $bullet$ What is the sample complexity of DP hypothesis testing? $bullet$ Can we privately estimate distribution properties with a negligible cost? $bullet$ What is the fundamental limit in private distribution estimation? $bullet$ How can we design algorithms to privately estimate random graphs? $bullet$ What is the trade-off between the sample complexity and the interactivity in private hypothesis selection?
We study the assignment problem of objects to agents with heterogeneous preferences under distributional constraints. Each agent is associated with a publicly known type and has a private ordinal ranking over objects. We are interested in assigning as many agents as possible. Our first contribution is a generalization of the well-known and widely used serial dictatorship. Our mechanism maintains several desirable properties of serial dictatorship, including strategyproofness, Pareto efficiency, and computational tractability while satisfying the distributional constraints with a small error. We also propose a generalization of the probabilistic serial algorithm, which finds an ordinally efficient and envy-free assignment, and also satisfies the distributional constraints with a small error. We show, however, that no ordinally efficient and envy-free mechanism is also weakly strategyproof. Both of our algorithms assign at least the same number of students as the optimum fractional assignment.