We initiate the study of a quantity that we call coordination complexity. In a distributed optimization problem, the information defining a problem instance is distributed among $n$ parties, who must each choose an action; jointly, these actions form a solution to the optimization problem. The coordination complexity is the minimal amount of information that a centralized coordinator, who has full knowledge of the problem instance, needs to broadcast in order to coordinate the $n$ parties to play a nearly optimal solution. We show that upper bounds on the coordination complexity of a problem imply the existence of good jointly differentially private algorithms for solving that problem, which in turn are known to upper bound the price of anarchy in certain games with dynamically changing populations. We show several results. We fully characterize the coordination complexity of computing a many-to-one matching in a bipartite graph by giving almost matching lower and upper bounds. Our upper bound in fact extends much more generally, to the problem of solving a linearly separable convex program. We also give a different upper bound technique, which we use to bound the coordination complexity of coordinating a Nash equilibrium in a routing game, and of computing a stable matching.
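To give a feel for price-based coordination in a linearly separable convex program (the setting the upper bound extends to), here is a minimal sketch, assuming a toy quadratic objective and a single shared capacity constraint. It is classic dual decomposition, not the paper's construction, and all names are hypothetical: the coordinator broadcasts a single dual price, after which each party best-responds using only its private data.

```python
# Illustrative sketch (classic dual decomposition, NOT the paper's scheme):
# party i privately holds a_i and wants to minimize (x_i - a_i)^2, subject
# to the joint constraint sum_i x_i <= C. The coordinator, who knows all
# a_i, broadcasts one number lam (a near-optimal Lagrange multiplier);
# each party then solves its local subproblem independently.

def local_best_response(a_i, lam):
    # argmin_x (x - a_i)^2 + lam * x  =>  x = a_i - lam / 2
    return a_i - lam / 2.0

def coordinator_price(a, C, iters=1000, step=0.01):
    # Dual ascent on lam >= 0 for the constraint sum_i x_i <= C.
    lam = 0.0
    for _ in range(iters):
        total = sum(local_best_response(a_i, lam) for a_i in a)
        lam = max(0.0, lam + step * (total - C))
    return lam

a = [3.0, 1.0, 2.5, 0.5]          # private data of the 4 parties
C = 4.0                            # shared capacity
lam = coordinator_price(a, C)      # the entire broadcast: one real number
x = [local_best_response(a_i, lam) for a_i in a]
print(lam, x, sum(x))              # sum(x) ~= C when the constraint binds
```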
We initiate the study of computing (near-)optimal contracts in succinctly representable principal-agent settings. Here optimality means maximizing the principal's expected payoff over all incentive-compatible contracts---known in economics as second-best solutions. We also study a natural relaxation to approximately incentive-compatible contracts. We focus on principal-agent settings with succinctly described (and exponentially large) outcome spaces. We show that the computational complexity of computing a near-optimal contract depends fundamentally on the number of agent actions. For settings with a constant number of actions, we present a fully polynomial-time approximation scheme (FPTAS) for the separation oracle of the dual of the problem of minimizing the principal's payment to the agent, and use this subroutine to efficiently compute a $\delta$-incentive-compatible ($\delta$-IC) contract whose expected payoff matches or surpasses that of the optimal IC contract. With an arbitrary number of actions, we prove that the problem is hard to approximate within any constant $c$. This inapproximability result holds even for $\delta$-IC contracts where $\delta$ is a sufficiently rapidly-decaying function of $c$. On the positive side, we show that simple linear $\delta$-IC contracts with constant $\delta$ are sufficient to achieve a constant-factor approximation of the first-best (full-welfare-extracting) solution, and that such a contract can be computed in polynomial time.
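To make the notion of a linear contract concrete, here is a toy sketch assuming a hypothetical instance with three explicitly listed actions; it grid-searches the agent's revenue share $\alpha$ rather than using the paper's FPTAS/LP machinery, and all function names are invented for illustration.

```python
# Toy sketch (hypothetical instance, NOT the paper's model): under a linear
# contract the agent is paid alpha * reward. Each action j has a cost and
# an expected reward; the agent best-responds, the principal keeps the rest.

costs = [0.0, 0.2, 0.5]                       # agent's action costs
exp_reward = [0.3, 0.7, 1.0]                  # expected reward per action

def agent_action(alpha):
    # Agent maximizes alpha * E[reward] - cost, ties broken in the
    # principal's favor (toward the higher-reward action).
    utilities = [alpha * r - c for r, c in zip(exp_reward, costs)]
    best = max(utilities)
    cands = [j for j, u in enumerate(utilities) if abs(u - best) < 1e-12]
    return max(cands, key=lambda j: exp_reward[j])

best_alpha, best_payoff = 0.0, 0.0
for alpha in [i / 1000 for i in range(1001)]:
    j = agent_action(alpha)
    payoff = (1 - alpha) * exp_reward[j]      # principal's expected payoff
    if payoff > best_payoff:
        best_alpha, best_payoff = alpha, payoff
print(best_alpha, best_payoff)                # prints 0.5 0.35 here
```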
We study the role of interactivity in distributed statistical inference under information constraints, e.g., communication constraints and local differential privacy. We focus on the tasks of goodness-of-fit testing and estimation of discrete distributions. From prior work, these tasks are well understood under noninteractive protocols. Extending those approaches directly to interactive protocols is difficult because of the correlations that interactivity can build up; in fact, prior claims of tight bounds for distribution estimation using interactive protocols contain gaps. We propose a new approach to handle this correlation and use it to develop a unified method for establishing lower bounds for both tasks. As an application, we obtain optimal bounds for both estimation and testing under local differential privacy and communication constraints. We also provide an example of a natural testing problem where interactivity helps.
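For contrast with the interactive protocols studied here, the sketch below shows the standard noninteractive baseline for distribution estimation under $\varepsilon$-local differential privacy: $k$-ary randomized response with debiased counts. This is a textbook mechanism, not the paper's protocol.

```python
# Noninteractive LDP baseline (standard k-ary randomized response, NOT the
# paper's interactive protocol): each user privatizes one sample, and the
# server debiases the empirical counts to estimate the distribution.

import math, random

def k_rr(x, k, eps):
    # Report the truth w.p. e^eps / (e^eps + k - 1), else a uniform other symbol.
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p:
        return x
    return random.choice([y for y in range(k) if y != x])

def estimate(reports, k, eps):
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = (1 - p) / (k - 1)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    # Invert E[count_x / n] = p * true_x + q * (1 - true_x).
    return [((c / n) - q) / (p - q) for c in counts]

k, eps, n = 4, 1.0, 100_000
true_p = [0.4, 0.3, 0.2, 0.1]
samples = random.choices(range(k), weights=true_p, k=n)
reports = [k_rr(x, k, eps) for x in samples]
print(estimate(reports, k, eps))   # close to true_p for large n
```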
The Chamberlin-Courant and Monroe rules are fundamental and well-studied rules in the literature on multi-winner elections. The problem of determining whether there exists a committee of size k with a Chamberlin-Courant (respectively, Monroe) score of at most r is known to be NP-complete. We consider the following natural problems in this setting: a) given a committee S of size k as input, is it an optimal k-sized committee, and b) given a candidate c and a committee size k, does there exist an optimal k-sized committee that contains c? In this work, we resolve the complexity of both problems for the Chamberlin-Courant and Monroe voting rules in the settings of rankings as well as approval ballots. We show that verifying whether a given committee is optimal is coNP-complete, whilst the latter problem is complete for $\Theta_{2}^{P}$. We also give efficient algorithms for the second problem when the input consists of single-peaked rankings. Our contribution fills an essential gap in the literature on these important multi-winner rules.
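The verification problem is easy to state even though it is coNP-complete; the brute-force sketch below (feasible only for tiny instances, with hypothetical helper names) makes the definition concrete for the Chamberlin-Courant rule with rankings, where a voter's misrepresentation is the rank of their favorite committee member.

```python
# Brute-force verification sketch for tiny instances (the general problem
# is coNP-complete, so no such enumeration scales). Rankings are lists of
# candidates, best first.

from itertools import combinations

def cc_score(committee, rankings):
    # Sum over voters of the rank (0-indexed) of their best committee member.
    cset = set(committee)
    return sum(min(i for i, c in enumerate(r) if c in cset) for r in rankings)

def is_optimal(committee, rankings, candidates):
    k, target = len(committee), cc_score(committee, rankings)
    return all(cc_score(S, rankings) >= target
               for S in combinations(candidates, k))

rankings = [['a', 'b', 'c', 'd'],
            ['b', 'a', 'd', 'c'],
            ['c', 'd', 'a', 'b']]
print(is_optimal(('a', 'c'), rankings, 'abcd'))   # True: score 1 is minimal
```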
Suppose we are given a set of $n$ elements to be clustered into $k$ (unknown) clusters, and an oracle/expert labeler that can interactively answer pairwise queries of the form "do two elements $u$ and $v$ belong to the same cluster?". The goal is to recover the optimal clustering by asking the minimum number of queries. In this paper, we initiate a rigorous theoretical study of this basic problem of the query complexity of interactive clustering, and provide strong information-theoretic lower bounds as well as nearly matching upper bounds. Most clustering problems come with a similarity matrix, which is used by an automated process to cluster similar points together. Our main contribution in this paper is to show the dramatic power of side information, aka the similarity matrix, in reducing the query complexity of clustering. A similarity matrix represents noisy pairwise relationships, such as those computed by some function on the attributes of the elements. A natural noise model is one where similarity values are drawn independently from some arbitrary probability distribution $f_+$ when the underlying pair of elements belongs to the same cluster, and from some $f_-$ otherwise. We show that, given such a similarity matrix, the query complexity reduces drastically from $\Theta(nk)$ (no similarity matrix) to $O\big(\frac{k^2 \log n}{\mathcal{H}^2(f_+ \| f_-)}\big)$, where $\mathcal{H}^2$ denotes the squared Hellinger divergence. Moreover, this is information-theoretically optimal within an $O(\log n)$ factor. Our algorithms are all efficient and parameter free, i.e., they work without any knowledge of $k$, $f_+$, and $f_-$, and their query complexity depends only logarithmically on $n$. Along the way, our work also reveals an intriguing connection to popular community detection models such as the stochastic block model, significantly generalizes them, and opens up many avenues for interesting future research.
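The $\Theta(nk)$ query-only baseline is simple to realize; the sketch below (an illustrative implementation, not the paper's side-information algorithm) keeps one representative per discovered cluster and spends at most one query per existing cluster for each new element.

```python
# Query-only baseline (no similarity matrix), using O(nk) same-cluster
# queries: compare each element against one representative per cluster.

def cluster_with_oracle(elements, same_cluster):
    # same_cluster(u, v) -> bool is the pairwise oracle.
    clusters = []                      # list of lists; clusters[i][0] is the rep
    for u in elements:
        for c in clusters:
            if same_cluster(u, c[0]):  # one query per existing cluster
                c.append(u)
                break
        else:
            clusters.append([u])       # no match: u starts a new cluster
    return clusters

# Toy ground truth for demonstration: cluster id = element mod 3.
oracle = lambda u, v: u % 3 == v % 3
print(cluster_with_oracle(range(10), oracle))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```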
MAXCUT is a classical NP-hard graph partitioning problem, and it serves as a typical case of the symmetric non-monotone Unconstrained Submodular Maximization (USM) problem. Applications of MAXCUT are abundant in machine learning, computer vision, and statistical physics. Greedy algorithms for approximately solving MAXCUT rely on greedy vertex labelling or on an edge contraction strategy. These algorithms have been studied by measuring their approximation ratios in the worst-case setting, but very little is known about their average-case robustness to noise contamination of the input data. Adapting the framework of Approximation Set Coding, we present a method to exactly measure the cardinality of the algorithmic approximation sets of five greedy MAXCUT algorithms. We explore their information content on graph instances generated by two different noise models: the edge reversal model and the Gaussian edge weight model. The results provide insights into the robustness of different greedy heuristics and techniques for MAXCUT, which can be used in algorithm design for general USM problems.
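As a concrete instance of greedy vertex labelling, the sketch below (an illustrative implementation, assuming a simple dictionary-of-edges input; the paper analyses five such heuristics, not this exact code) places each vertex, in order, on the side that currently cuts more incident weight.

```python
# Greedy vertex-labelling heuristic for MAXCUT: process vertices in a fixed
# order and assign each to the side that cuts more weight to already-placed
# neighbors. This achieves a 1/2-approximation for nonnegative weights.

def greedy_maxcut(n, weights):
    # weights[(u, v)] with u < v gives the edge weight (0 if absent).
    w = lambda u, v: weights.get((min(u, v), max(u, v)), 0.0)
    side = {}
    for v in range(n):
        # Placing v on side 0 cuts its edges to side-1 vertices, and vice versa.
        gain_if_0 = sum(w(u, v) for u in side if side[u] == 1)
        gain_if_1 = sum(w(u, v) for u in side if side[u] == 0)
        side[v] = 0 if gain_if_0 >= gain_if_1 else 1
    cut = sum(wt for (u, v), wt in weights.items() if side[u] != side[v])
    return side, cut

weights = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 1.0, (2, 3): 3.0}
print(greedy_maxcut(4, weights))   # cut value 6.0, optimal for this instance
```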