No Arabic abstract
In standard rounding, we want to map each value $X$ in a large continuous space (e.g., $R$) to a nearby point $P$ from a discrete subset (e.g., $Z$). This process seems to be inherently discontinuous in the sense that two consecutive noisy measurements $X_1$ and $X_2$ of the same value may be extremely close to each other and yet they can be rounded to different points $P_1 e P_2$, which is undesirable in many applications. In this paper we show how to make the rounding process perfectly continuous in the sense that it maps any pair of sufficiently close measurements to the same point. We call such a process consistent rounding, and make it possible by allowing a small amount of information about the first measurement $X_1$ to be unidirectionally communicated to and used by the rounding process of $X_2$. The fault tolerance of a consistent rounding scheme is defined by the maximum distance between pairs of measurements which guarantees that they are always rounded to the same point, and our goal is to study the possible tradeoffs between the amount of information provided and the achievable fault tolerance for various types of spaces. When the measurements $X_i$ are arbitrary vectors in $R^d$, we show that communicating $log_2(d+1)$ bits of information is both sufficient and necessary (in the worst case) in order to achieve consistent rounding for some positive fault tolerance, and when d=3 we obtain a tight upper and lower asymptotic bound of $(0.561+o(1))k^{1/3}$ on the achievable fault tolerance when we reveal $log_2(k)$ bits of information about how $X_1$ was rounded. We analyze the problem by considering the possible colored tilings of the space with $k$ available colors, and obtain our upper and lower bounds with a variety of mathematical techniques including isoperimetric inequalities, the Brunn-Minkowski theorem, sphere packing bounds, and v{C}ech cohomology.
We consider the problem of digitalizing Euclidean segments. Specifically, we look for a constructive method to connect any two points in $mathbb{Z}^d$. The construction must be {em consistent} (that is, satisfy the natural extension of the Euclidean axioms) while resembling them as much as possible. Previous work has shown asymptotically tight results in two dimensions with $Theta(log N)$ error, where resemblance between segments is measured with the Hausdorff distance, and $N$ is the $L_1$ distance between the two points. This construction was considered tight because of a $Omega(log N)$ lower bound that applies to any consistent construction in $mathbb{Z}^2$. In this paper we observe that the lower bound does not directly extend to higher dimensions. We give an alternative argument showing that any consistent construction in $d$ dimensions must have $Omega(log^{1/(d-1)} N)$ error. We tie the error of a consistent construction in high dimensions to the error of similar {em weak} constructions in two dimensions (constructions for which some points need not satisfy all the axioms). This not only opens the possibility for having constructions with $o(log N)$ error in high dimensions, but also opens up an interesting line of research in the tradeoff between the number of axiom violations and the error of the construction. In order to show our lower bound, we also consider a colored variation of the concept of discrepancy of a set of points that we find of independent interest.
Clustering, a fundamental task in data science and machine learning, groups a set of objects in such a way that objects in the same cluster are closer to each other than to those in other clusters. In this paper, we consider a well-known structure, so-called $r$-nets, which rigorously captures the properties of clustering. We devise algorithms that improve the run-time of approximating $r$-nets in high-dimensional spaces with $ell_1$ and $ell_2$ metrics from $tilde{O}(dn^{2-Theta(sqrt{epsilon})})$ to $tilde{O}(dn + n^{2-alpha})$, where $alpha = Omega({epsilon^{1/3}}/{log(1/epsilon)})$. These algorithms are also used to improve a framework that provides approximate solutions to other high dimensional distance problems. Using this framework, several important related problems can also be solved efficiently, e.g., $(1+epsilon)$-approximate $k$th-nearest neighbor distance, $(4+epsilon)$-approximate Min-Max clustering, $(4+epsilon)$-approximate $k$-center clustering. In addition, we build an algorithm that $(1+epsilon)$-approximates greedy permutations in time $tilde{O}((dn + n^{2-alpha}) cdot log{Phi})$ where $Phi$ is the spread of the input. This algorithm is used to $(2+epsilon)$-approximate $k$-center with the same time complexity.
What is the minimum number of guesses needed on average to correctly guess a realization of a random variable? The answer to this question led to the introduction of the notion of a quantity called guesswork by Massey in 1994, which can be viewed as an alternate security criterion to entropy. In this paper, we consider the guesswork in the presence of quantum side information, and show that a general sequential guessing strategy is equivalent to performing a single measurement and choosing a guessing strategy from the outcome. We use this result to deduce entropic one-shot and asymptotic bounds on the guesswork in the presence of quantum side information, and to formulate a semi-definite program (SDP) to calculate the quantity. We evaluate the guesswork for a simple example involving the BB84 states, both numerically and analytically, and prove a continuity result that certifies the security of slightly imperfect key states when the guesswork is used as the security criterion.
We consider incidences among colored sets of lines in $mathbb{R}^d$ and examine whether the existence of certain concurrences between lines of $k$ colors force the existence of at least one concurrence between lines of $k+1$ colors. This question is relevant for problems in 3D reconstruction in computer vision.
This paper offers a characterization of fundamental limits on the classification and reconstruction of high-dimensional signals from low-dimensional features, in the presence of side information. We consider a scenario where a decoder has access both to linear features of the signal of interest and to linear features of the side information signal; while the side information may be in a compressed form, the objective is recovery or classification of the primary signal, not the side information. The signal of interest and the side information are each assumed to have (distinct) latent discrete labels; conditioned on these two labels, the signal of interest and side information are drawn from a multivariate Gaussian distribution. With joint probabilities on the latent labels, the overall signal-(side information) representation is defined by a Gaussian mixture model. We then provide sharp sufficient and/or necessary conditions for these quantities to approach zero when the covariance matrices of the Gaussians are nearly low-rank. These conditions, which are reminiscent of the well-known Slepian-Wolf and Wyner-Ziv conditions, are a function of the number of linear features extracted from the signal of interest, the number of linear features extracted from the side information signal, and the geometry of these signals and their interplay. Moreover, on assuming that the signal of interest and the side information obey such an approximately low-rank model, we derive expansions of the reconstruction error as a function of the deviation from an exactly low-rank model; such expansions also allow identification of operational regimes where the impact of side information on signal reconstruction is most relevant. Our framework, which offers a principled mechanism to integrate side information in high-dimensional data problems, is also tested in the context of imaging applications.