We bound the number of nearly orthogonal vectors with fixed VC-dimension over $setpm^n$. Our bounds are of interest in machine learning and empirical process theory and improve previous bounds by Haussler. The bounds are based on a simple projection
argument and the generalize to other product spaces. Along the way we derive tight bounds on the sum of binomial coefficients in terms of the entropy function.
Partition functions arise in statistical physics and probability theory as the normalizing constant of Gibbs measures and in combinatorics and graph theory as graph polynomials. For instance the partition functions of the hard-core model and monomer-
dimer model are the independence and matching polynomials respectively. We show how stability results follow naturally from the recently developed occupancy method for maximizing and minimizing physical observables over classes of regular graphs, and then show these stability results can be used to obtain tight extremal bounds on the individual coefficients of the corresponding partition functions. As applications, we prove new bounds on the number of independent sets and matchings of a given size in regular graphs. For large enough graphs and almost all sizes, the bounds are tight and confirm the Upper Matching Conjecture of Friedland, Krop, and Markstrom and a conjecture of Kahn on independent sets for a wide range of parameters. Additionally we prove tight bounds on the number of $q$-colorings of cubic graphs with a given number of monochromatic edges, and tight bounds on the number of independent sets of a given size in cubic graphs of girth at least $5$.
Random uniform sampling has been studied in various statistical tasks but few of them have covered the Q-error metric for cardinality estimation (CE). In this paper, we analyze the confidence intervals of random uniform sampling with and without repl
acement for single-table CE. Results indicate that the upper Q-error bound depends on the sample size and true cardinality. Our bound gives a rule-of-thumb for how large a sample should be kept for single-table CE.
We study the optimization version of the equal cardinality set partition problem (where the absolute difference between the equal sized partitions sums are minimized). While this problem is NP-hard and requires exponential complexity to solve in gene
ral, we have formulated a weaker version of this NP-hard problem, where the goal is to find a locally optimal solution. The local optimality considered in our work is under any swap between the opposing partitions element pairs. To this end, we designed an algorithm which can produce such a locally optimal solution in $O(N^2)$ time and $O(N)$ space. Our approach does not require positive or integer inputs and works equally well under arbitrary input precisions. Thus, it is widely applicable in different problem scenarios.
Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over those under
the setting where data is partitioned on samples, especially when the number of features is huge. Therefore, it is important to understand the inherent limitations of these optimization problems. In this paper, with certain restrictions on the communication allowed in the procedures, we develop tight lower bounds on communication rounds for a broad class of non-incremental algorithms under this setting. We also provide a lower bound on communication rounds for a class of (randomized) incremental algorithms.