Finding the Maximizers of the Information Divergence from an Exponential Family

Added by Johannes Rauh

Publication date 2009

Fields Informatics Engineering

Research language English

Authors Johannes Rauh

Information Theory

This paper investigates maximizers of the information divergence from an exponential family $E$. It is shown that the $rI$-projection of a maximizer $P$ to $E$ is a convex combination of $P$ and a probability measure $P_-$ with disjoint support and the same value of the sufficient statistics $A$. This observation can be used to transform the original problem of maximizing $D(\cdot\|E)$ over the set of all probability measures into the maximization of a function $\overline{D}$ over a convex subset of $\ker A$. The global maximizers of both problems correspond to each other. Furthermore, finding all local maximizers of $\overline{D}$ yields all local maximizers of $D(\cdot\|E)$. This paper also proposes two algorithms to find the maximizers of $\overline{D}$ and applies them to two examples, where the maximizers of $D(\cdot\|E)$ were not known before.
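The structure described in the abstract can be checked numerically on a small textbook case that is not taken from the paper: the independence model of two binary variables. In that model the divergence from the family equals the mutual information, the $rI$-projection is the product of the marginals, and the marginals play the role of the sufficient statistics $A$. The sketch below is only an illustration under these assumptions; the names `kl`, `marginals`, `independence_projection`, `P` and `P_minus` are hypothetical and do not come from the paper, and the code does not implement the paper's two algorithms.

```python
import math

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats; assumes q > 0 wherever p > 0."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in STATES if p[x] > 0)

def marginals(p):
    """Sufficient statistics of the independence model: the two marginal distributions."""
    m1 = [sum(p[(a, b)] for (a, b) in STATES if a == i) for i in (0, 1)]
    m2 = [sum(p[(a, b)] for (a, b) in STATES if b == j) for j in (0, 1)]
    return m1, m2

def independence_projection(p):
    """rI-projection onto the independence model: the product of the marginals of p."""
    m1, m2 = marginals(p)
    return {(i, j): m1[i] * m2[j] for (i, j) in STATES}

# Candidate maximizer: perfectly correlated bits.
P = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
# Measure with disjoint support and the same marginals: perfectly anti-correlated bits.
P_minus = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}

Q = independence_projection(P)   # uniform distribution on the four states
divergence = kl(P, Q)            # equals the mutual information of P, here log 2

# The projection is the equal-weight convex combination of P and P_minus.
mixture = {x: 0.5 * P[x] + 0.5 * P_minus[x] for x in STATES}

print("D(P||E) =", divergence, "log 2 =", math.log(2))
print("projection equals mixture:",
      all(abs(Q[x] - mixture[x]) < 1e-12 for x in STATES))
print("same sufficient statistics:", marginals(P) == marginals(P_minus))
```

Since mutual information of two binary variables is bounded by $\log 2$, `P` attains the maximal divergence from this particular family, which makes it a convenient sanity check for the convex-combination property.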

Maximizing the Bregman divergence from a Bregman family

Johannes Rauh, František Matúš 2020

The problem to maximize the information divergence from an exponential family is generalized to the setting of Bregman divergences and suitably defined Bregman families.

Information Theory

Convergence of Contrastive Divergence Algorithm in Exponential Family

Bai Jiang, Tung-Yu Wu, Yifan Jin 2016

The Contrastive Divergence (CD) algorithm has achieved notable success in training energy-based models including Restricted Boltzmann Machines and played a key role in the emergence of deep learning. The idea of this algorithm is to approximate the intractable term in the exact gradient of the log-likelihood function by using short Markov chain Monte Carlo (MCMC) runs. The approximate gradient is computationally cheap but biased. Whether and why the CD algorithm provides an asymptotically consistent estimate are still open questions. This paper studies the asymptotic properties of the CD algorithm in canonical exponential families, which are special cases of the energy-based model. Suppose the CD algorithm runs $m$ MCMC transition steps at each iteration $t$ and iteratively generates a sequence of parameter estimates $\{\theta_t\}_{t \ge 0}$ given an i.i.d. data sample $\{X_i\}_{i=1}^n \sim p_{\theta_\star}$. Under conditions which are commonly obeyed by the CD algorithm in practice, we prove the existence of some bounded $m$ such that any limit point of the time average $\left. \sum_{s=0}^{t-1} \theta_s \right/ t$ as $t \to \infty$ is a consistent estimate for the true parameter $\theta_\star$. Our proof is based on the fact that $\{\theta_t\}_{t \ge 0}$ is a homogeneous Markov chain conditional on the data sample $\{X_i\}_{i=1}^n$. This chain meets the Foster-Lyapunov drift criterion and converges to a random walk around the Maximum Likelihood Estimate. The range of the random walk shrinks to zero at rate $\mathcal{O}(1/\sqrt[3]{n})$ as the sample size $n \to \infty$.

Machine Learning

Renyi Divergence and Kullback-Leibler Divergence

Tim van Erven, Peter Harremoës 2012

Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence. We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $\sigma$-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
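The statement that the order-1 divergence equals the Kullback-Leibler divergence can be illustrated numerically for finite distributions. The sketch below uses the standard finite-alphabet formula $D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\sum_x p_x^\alpha q_x^{1-\alpha}$ for orders $\alpha \neq 1$; the function names and the example distributions `p` and `q` are illustrative assumptions, and the code assumes $q_x > 0$ for every outcome.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p||q) in nats; assumes q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1) in nats.

    Assumes a finite alphabet and q > 0 everywhere; order 1 is defined
    as a limit and is handled by kl_divergence instead.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (alpha - 1)

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl = kl_divergence(p, q)
near_one = renyi_divergence(p, q, 1.0 + 1e-6)  # order slightly above 1

print("KL divergence:         ", kl)
print("Renyi, order 1 + 1e-6: ", near_one)
print("Renyi, order 0.5:      ", renyi_divergence(p, q, 0.5))
print("Renyi, order 2:        ", renyi_divergence(p, q, 2.0))
```

Evaluating at an order very close to 1 rather than exactly 1 avoids the division by zero in the formula while still showing the agreement with the Kullback-Leibler value.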

Information Theory, Statistics Theory

Information Bottleneck for an Oblivious Relay with Channel State Information: the Vector Case

Hao Xu, Tianyu Yang, Giuseppe Caire 2021

This paper considers the information bottleneck (IB) problem of a Rayleigh fading multiple-input multiple-output (MIMO) channel. Due to the bottleneck constraint, it is impossible for the oblivious relay to inform the destination node of the perfect channel state information (CSI) in each channel realization. To evaluate the bottleneck rate, we provide an upper bound by assuming that the destination node can get the perfect CSI at no cost and two achievable schemes with simple symbol-by-symbol relay processing and compression. Numerical results show that the lower bounds obtained by the proposed achievable schemes can come close to the upper bound on a wide range of relevant system parameters.

Information Theory

The Age of Incorrect Information: an Enabler of Semantics-Empowered Communication

Ali Maatouk, Mohamad Assaad, Anthony Ephremides 2020

In this paper, we introduce the Age of Incorrect Information (AoII) as an enabler for semantics-empowered communication, a newly advocated communication paradigm centered around data's role and its usefulness to the communication's goal. First, we shed light on how the traditional communication paradigm, with its role-blind approach to data, is vulnerable to performance bottlenecks. Next, we highlight the shortcomings of several proposed performance measures destined to deal with the traditional communication paradigm's limitations, namely the Age of Information (AoI) and the error-based metrics. We also show how the AoII addresses these shortcomings and captures more meaningfully the purpose of data. Afterward, we consider the problem of minimizing the average AoII in a transmitter-receiver pair scenario where packets are sent over an unreliable channel subject to a transmission rate constraint. We prove that the optimal transmission strategy is a randomized threshold policy, and we propose a low complexity algorithm that finds both the optimal threshold and the randomization parameter. Furthermore, we provide a theoretical comparison between the AoII framework and the standard error-based metrics counterpart. Interestingly, we show that the AoII-optimal policy is also error-optimal for the adopted information source model. At the same time, the converse is not necessarily true. Finally, we implement our proposed policy in various real-life applications, such as video streaming, and we showcase its performance advantages compared to both the error-optimal and the AoI-optimal policies.

Information Theory
