The minimum mean-square error (MMSE) achievable by optimal estimation of a random variable $Y\in\mathbb{R}$ given another random variable $X\in\mathbb{R}^{d}$ is of much interest in a variety of statistical contexts. In this paper we propose two estimators for the MMSE, one based on a two-layer neural network and the other on a special three-layer neural network. We derive lower bounds for the MMSE based on the proposed estimators and the Barron constant of an appropriate function of the conditional expectation of $Y$ given $X$. Furthermore, we derive a general upper bound for the Barron constant that, when $X\in\mathbb{R}$ is post-processed by the additive Gaussian mechanism, produces order-optimal estimates in the large noise regime.
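As a point of reference for the quantity being estimated, the following is a minimal sketch that computes a plug-in estimate of $E[(Y - E[Y|X])^2]$ from samples. It swaps the neural-network estimators of the abstract for an ordinary least-squares fit, which recovers the conditional mean only under the assumption that $(X, Y)$ is jointly Gaussian. The function names `sample_gaussian_pair` and `plug_in_mmse` and the correlation parameter `rho` are hypothetical and chosen for illustration.

```python
import math
import random


def sample_gaussian_pair(rho, n, seed=0):
    """Draw n samples of a unit-variance jointly Gaussian pair with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * z)
    return xs, ys


def plug_in_mmse(xs, ys):
    """Mean squared residual of a least-squares line fitted to (xs, ys).

    Assumption: the conditional mean E[Y|X] is affine in X, which holds for
    jointly Gaussian pairs but not in general.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / n


if __name__ == "__main__":
    rho = 0.6
    xs, ys = sample_gaussian_pair(rho, 50_000, seed=1)
    print("plug-in estimate:", plug_in_mmse(xs, ys))
    print("closed form 1 - rho^2:", 1.0 - rho ** 2)
```

For this Gaussian pair the MMSE has the closed form $1-\rho^2$, so the two printed values should be close. For non-Gaussian pairs the linear fit yields only an upper bound on the MMSE.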
Recovery of the causal structure of dynamic networks from noisy measurements has long been a problem of intense interest across many areas of science and engineering. Many algorithms have been proposed, but there is no work that compares the performance of the algorithms to converse bounds in a non-asymptotic setting. As a step to address this problem, this paper gives lower bounds on the error probability for causal network support recovery in a linear Gaussian setting. The bounds are based on the use of the Bhattacharyya coefficient for binary hypothesis testing problems with mixture probability distributions. Comparisons of the bounds with the performance achieved by two representative recovery algorithms are given for sparse random networks based on the Erdős-Rényi model.
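To illustrate how a Bhattacharyya coefficient bounds the error probability of a binary hypothesis test, here is a minimal sketch for the simplest case of two scalar Gaussians with a common variance. It uses the standard bounds $\frac{1}{2}\left(1-\sqrt{1-4\pi_0\pi_1 \mathrm{BC}^2}\right) \le P_e \le \sqrt{\pi_0\pi_1}\,\mathrm{BC}$. This is not the mixture setting treated in the paper, and the function names are hypothetical.

```python
import math


def bhattacharyya_gaussian(mu0, mu1, sigma):
    """Bhattacharyya coefficient between N(mu0, sigma^2) and N(mu1, sigma^2)."""
    return math.exp(-((mu0 - mu1) ** 2) / (8.0 * sigma ** 2))


def error_bounds(bc, prior0=0.5):
    """Lower and upper bounds on the minimum (Bayes) error probability given BC."""
    prior1 = 1.0 - prior0
    lower = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * prior0 * prior1 * bc ** 2))
    upper = math.sqrt(prior0 * prior1) * bc
    return lower, upper


def exact_error_equal_priors(mu0, mu1, sigma):
    """Exact Bayes error for equal priors: Q(|mu0 - mu1| / (2 sigma))."""
    d = abs(mu0 - mu1)
    # Q(x) = 0.5 * erfc(x / sqrt(2)) with x = d / (2 sigma)
    return 0.5 * math.erfc(d / (2.0 * sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    bc = bhattacharyya_gaussian(0.0, 2.0, 1.0)
    lower, upper = error_bounds(bc)
    exact = exact_error_equal_priors(0.0, 2.0, 1.0)
    print(f"lower={lower:.4f} exact={exact:.4f} upper={upper:.4f}")
```

In this toy case the exact error is available in closed form, so the gap between the lower bound and the true error can be inspected directly. For mixture distributions no such closed form is generally available, which is where a bound of this type becomes useful.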
The problem of estimating an arbitrary random vector from its observation corrupted by additive white Gaussian noise, where the cost function is taken to be the Minimum Mean $p$-th Error (MMPE), is considered. The classical Minimum Mean Square Error (MMSE) is a special case of the MMPE. Several bounds, properties and applications of the MMPE are derived and discussed. The optimal MMPE estimator is found for Gaussian and binary input distributions. Properties of the MMPE as a function of the input distribution, SNR and order $p$ are derived. In particular, it is shown that the MMPE is a continuous function of $p$ and SNR. These results are possible in view of interpolation and change of measure bounds on the MMPE. The 'Single-Crossing-Point Property' (SCPP) that bounds the MMSE for all SNR values {\it above} a certain value, at which the MMSE is known, together with the I-MMSE relationship is a powerful tool in deriving converse proofs in information theory. By studying the notion of conditional MMPE, a unifying proof (i.e., for any $p$) of the SCPP is shown. A complementary bound to the SCPP is then shown, which bounds the MMPE for all SNR values {\it below} a certain value, at which the MMPE is known. As a first application of the MMPE, a bound on the conditional differential entropy in terms of the MMPE is provided, which then yields a generalization of the Ozarow-Wyner lower bound on the mutual information achieved by a discrete input on a Gaussian noise channel. As a second application, the MMPE is shown to improve on previous characterizations of the phase transition phenomenon that manifests, in the limit as the length of the capacity achieving code goes to infinity, as a discontinuity of the MMSE as a function of SNR. As a final application, the MMPE is used to show bounds on the second derivative of mutual information, that tighten previously known bounds.
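For orientation, the classical $p=2$ case can be written out explicitly in the scalar setting. The block below is a sketch under the assumed model $Y=\sqrt{\mathsf{snr}}\,X+Z$ with $Z\sim\mathcal{N}(0,1)$ independent of $X$. The notation $\mathrm{mmse}(X,\mathsf{snr})$ is introduced here for illustration and is not taken from the abstract.

```latex
% Assumed scalar model: Y = \sqrt{snr} X + Z, Z ~ N(0,1) independent of X.
\begin{align*}
  \mathrm{mmse}(X,\mathsf{snr}) &= E\!\left[(X - E[X \mid Y])^2\right], \\
  X \sim \mathcal{N}(0,1):\quad
    E[X \mid Y] &= \frac{\sqrt{\mathsf{snr}}}{1+\mathsf{snr}}\, Y,
    \qquad \mathrm{mmse}(X,\mathsf{snr}) = \frac{1}{1+\mathsf{snr}}, \\
  X \in \{\pm 1\}\ \text{equiprobable}:\quad
    E[X \mid Y] &= \tanh\!\left(\sqrt{\mathsf{snr}}\, Y\right), \\
  \text{I-MMSE (in nats)}:\quad
    \frac{d}{d\,\mathsf{snr}}\, I(X;Y) &= \frac{1}{2}\,\mathrm{mmse}(X,\mathsf{snr}).
\end{align*}
```

The Gaussian case gives a quick consistency check of the I-MMSE relationship: $I(X;Y)=\frac{1}{2}\ln(1+\mathsf{snr})$, whose derivative with respect to $\mathsf{snr}$ is $\frac{1}{2}\cdot\frac{1}{1+\mathsf{snr}}$.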
A decentralized coded caching scheme has been proposed by Maddah-Ali and Niesen, and has been shown to alleviate the load of networks. Recently, the placement delivery array (PDA) was proposed to characterize the coded caching scheme. In this paper, a neural architecture is first proposed to learn the construction of PDAs. Our model solves the problem of variable-size PDAs using a neural attention mechanism and reinforcement learning. It differs from previous attempts in that, instead of using combined optimization algorithms to get PDAs, it uses a sequence-to-sequence model to learn to construct PDAs. Numerical results are given to demonstrate that the proposed method can effectively implement coded caching. We also show that the complexity of our method to construct PDAs is low.
Batch codes are a useful notion of locality for error correcting codes, originally introduced in the context of distributed storage and cryptography. Many constructions of batch codes have been given, but few lower bound (limitation) results are known, leaving gaps between the best known constructions and best known lower bounds. Towards determining the optimal redundancy of batch codes, we prove a new lower bound on the redundancy of batch codes. Specifically, we study (primitive, multiset) linear batch codes that systematically encode $n$ information symbols into $N$ codeword symbols, with the requirement that any multiset of $k$ symbol requests can be obtained in disjoint ways. We show that such batch codes need $\Omega(\sqrt{Nk})$ symbols of redundancy, improving on the previous best lower bounds of $\Omega(\sqrt{N}+k)$ at all $k=n^\varepsilon$ with $\varepsilon\in(0,1)$. Our proof follows from analyzing the dimension of the order-$O(k)$ tensor of the batch code's dual code.
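The size of the improvement can be made concrete with a short comparison of exponents. The block below is a sketch under the additional assumption, not stated in the abstract, that $N=\Theta(n)$, i.e. that the redundancy is at most linear in $n$.

```latex
% Assumption for illustration: N = \Theta(n) and k = n^{\varepsilon}, 0 < \varepsilon < 1.
\begin{align*}
  \sqrt{Nk} &= \Theta\!\left(n^{(1+\varepsilon)/2}\right), \\
  \sqrt{N} + k &= \Theta\!\left(n^{\max\{1/2,\ \varepsilon\}}\right), \\
  \frac{1+\varepsilon}{2} - \frac{1}{2} &= \frac{\varepsilon}{2} > 0,
  \qquad
  \frac{1+\varepsilon}{2} - \varepsilon = \frac{1-\varepsilon}{2} > 0 .
\end{align*}
```

Under this assumption the new bound exceeds the previous one by a polynomial factor for every $\varepsilon\in(0,1)$. The gap is largest at $\varepsilon=1/2$, where the exponents are $3/4$ and $1/2$.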
The ability to train randomly initialised deep neural networks is known to depend strongly on the variance of the weight matrices and biases as well as the choice of nonlinear activation. Here we complement the existing geometric analysis of this phenomenon with an information-theoretic alternative. Lower bounds are derived for the mutual information between an input and hidden layer outputs. Using a mean-field analysis we are able to provide analytic lower bounds as functions of network weight and bias variances as well as the choice of nonlinear activation. These results show that initialisations known to be optimal from a training point of view are also superior from a mutual information perspective.