No Arabic abstract
The graphical structure of Probabilistic Graphical Models (PGMs) encodes the conditional independence (CI) relations that hold in the modeled distribution. Graph algorithms, such as d-separation, use this structure to infer additional conditional independencies, and to query whether a specific CI holds in the distribution. The premise of all current systems-of-inference for deriving CIs in PGMs, is that the set of CIs used for the construction of the PGM hold exactly. In practice, algorithms for extracting the structure of PGMs from data, discover approximate CIs that do not hold exactly in the distribution. In this paper, we ask how the error in this set propagates to the inferred CIs read off the graphical structure. More precisely, what guarantee can we provide on the inferred CI when the set of CIs that entailed it hold only approximately? It has recently been shown that in the general case, no such guarantee can be provided. We prove that such a guarantee exists for the set of CIs inferred in directed graphical models, making the d-separation algorithm a sound and complete system for inferring approximate CIs. We also prove an approximation guarantee for independence relations derived from marginal CIs.
In this paper we treat both forms of probabilistic inference, estimating marginal probabilities of the joint distribution and finding the most probable assignment, through a unified message-passing algorithm architecture. We generalize the Belief Propagation (BP) algorithms of sum-product and max-product and tree-rewaighted (TRW) sum and max product algorithms (TRBP) and introduce a new set of convergent algorithms based on convex-free-energy and Linear-Programming (LP) relaxation as a zero-temprature of a convex-free-energy. The main idea of this work arises from taking a general perspective on the existing BP and TRBP algorithms while observing that they all are reductions from the basic optimization formula of $f + sum_i h_i$ where the function $f$ is an extended-valued, strictly convex but non-smooth and the functions $h_i$ are extended-valued functions (not necessarily convex). We use tools from convex duality to present the primal-dual ascent algorithm which is an extension of the Bregman successive projection scheme and is designed to handle optimization of the general type $f + sum_i h_i$. Mapping the fractional-free-energy variational principle to this framework introduces the norm-product message-passing. Special cases include sum-product and max-product (BP algorithms) and the TRBP algorithms. When the fractional-free-energy is set to be convex (convex-free-energy) the norm-product is globally convergent for estimating of marginal probabilities and for approximating the LP-relaxation. We also introduce another branch of the norm-product, the convex-max-product. The convex-max-product is convergent (unlike max-product) and aims at solving the LP-relaxation.
We study the hardness of Approximate Query Processing (AQP) of various types of queries involving joins over multiple tables of possibly different sizes. In the case where the query result is a single value (e.g., COUNT, SUM, and COUNT(DISTINCT)), we prove worst-case information-theoretic lower bounds for AQP problems that are given parameters $epsilon$ and $delta$, and return estimated results within a factor of 1+$epsilon$ of the true results with error probability at most $delta$. In particular, the lower bounds for cardinality estimation over joins under various settings are contained in our results. Informally, our results show that for various database queries with joins, unless restricted to the set of queries whose results are always guaranteed to be above a very large threshold, the amount of information an AQP algorithm needs for returning an accurate approximation is at least linear in the number of rows in the largest table. Similar lower bounds even hold for some special cases where additional information such as top-K heavy hitters and all frequency vectors are available. In the case of GROUP-BY where the query result is not a single number, we study the lower bound for the amount of information used by any approximation algorithm that does not report any non-existing group and does not miss groups of large total size. Our work extends the work of Alon, Gibbons, Matias, and Szegedy [AGMS99].We compare our lower bounds with the amount of information required by Bernoulli sampling to give an accurate approximation. For COUNT queries with joins over multiple tables of the same size, the upper bound matches the lower bound, unless the problem setting is restricted to the set of queries whose results are always guaranteed to be above a very large threshold.
In this paper, we propose a method to address the problem of source estimation for Sparse Component Analysis (SCA) in the presence of additive noise. Our method is a generalization of a recently proposed method (SL0), which has the advantage of directly minimizing the L0-norm instead of L1-norm, while being very fast. SL0 is based on minimization of the smoothed L0-norm subject to As=x. In order to better estimate the source vector for noisy mixtures, we suggest then to remove the constraint As=x, by relaxing exact equality to an approximation (we call our method Smoothed L0-norm Denoising or SL0DN). The final result can then be obtained by minimization of a proper linear combination of the smoothed L0-norm and a cost function for the approximation. Experimental results emphasize on the significant enhancement of the modified method in noisy cases.
Large-scale machine learning and data mining methods routinely distribute computations across multiple agents to parallelize processing. The time required for the computations at the agents is affected by the availability of local resources and/or poor channel conditions giving rise to the straggler problem. As a remedy to this problem, we employ Unequal Error Protection (UEP) codes to obtain an approximation of the matrix product in the distributed computation setting to provide higher protection for the blocks with higher effect on the final result. We characterize the performance of the proposed approach from a theoretical perspective by bounding the expected reconstruction error for matrices with uncorrelated entries. We also apply the proposed coding strategy to the computation of the back-propagation step in the training of a Deep Neural Network (DNN) for an image classification task in the evaluation of the gradients. Our numerical experiments show that it is indeed possible to obtain significant improvements in the overall time required to achieve the DNN training convergence by producing approximation of matrix products using UEP codes in the presence of stragglers.
This book chapter is an introduction to and an overview of the information-theoretic, task independent utility function Empowerment, which is defined as the channel capacity between an agents actions and an agents sensors. It quantifies how much influence and control an agent has over the world it can perceive. This book chapter discusses the general idea behind empowerment as an intrinsic motivation and showcases several previous applications of empowerment to demonstrate how empowerment can be applied to different sensor-motor configuration, and how the same formalism can lead to different observed behaviors. Furthermore, we also present a fast approximation for empowerment in the continuous domain.