
Average-Case Information Complexity of Learning

Added by Ido Nachum
Publication date: 2018
Language: English





How many bits of information are revealed by a learning algorithm for a concept class of VC-dimension $d$? Previous works have shown that even for $d=1$ the amount of information may be unbounded (it tends to $\infty$ with the universe size). Can it be that all concepts in the class require leaking a large amount of information? We show that typically concepts do not require leakage. There exists a proper learning algorithm that reveals $O(d)$ bits of information for most concepts in the class. This result is a special case of a more general phenomenon we explore. If there is a low-information learner when the algorithm \emph{knows} the underlying distribution on inputs, then there is a learner that reveals little information on an average concept \emph{without knowing} the distribution on inputs.
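To make the quantity in question concrete, here is a toy sketch (not a construction from the paper): for a deterministic learner $A$, the information revealed about the sample, $I(S; A(S))$, equals the output entropy $H(A(S))$, which can be computed exactly for a small threshold class. The uniform input distribution, the fixed target, and the "smallest consistent threshold" learner below are all illustrative assumptions.

```python
# Toy illustration (illustrative assumptions, not the paper's construction):
# for a deterministic learner A, I(S; A(S)) = H(A(S)), so the information
# revealed about the sample is the entropy of the output hypothesis. We
# compute it exactly for threshold functions on {1, ..., N} (VC dimension 1),
# a uniform input distribution, a fixed target, and a proper learner that
# outputs the smallest threshold consistent with the sample.
import itertools
import math
from collections import Counter

N = 10          # universe size
m = 5           # sample size
target_t = 6    # target concept: x -> 1 iff x >= target_t

def label(x):
    return int(x >= target_t)

def learner(sample):
    # Proper learner: smallest threshold consistent with the labeled sample.
    positives = [x for x in sample if label(x) == 1]
    return min(positives) if positives else N + 1

# Enumerate every sample of size m drawn uniformly from {1, ..., N}.
counts = Counter()
for sample in itertools.product(range(1, N + 1), repeat=m):
    counts[learner(sample)] += 1

total = N ** m
entropy_bits = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(f"information revealed H(A(S)) = {entropy_bits:.3f} bits")
```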



Related research

We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n + 2$ to $O(d)$, where $n$ is the number of i.i.d. training examples. In fact, we exhibit VC classes for which the CMI of any proper learner cannot be bounded by any real-valued function of the VC dimension only.
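As a trivial numeric illustration of the gap being discussed (taking the logarithm base 2, an assumption made only for this example), $d \log n + 2$ grows with the sample size $n$, whereas an $O(d)$ bound would not:

```python
# Numeric comparison of the two bounds (log base 2 chosen for illustration):
# d * log(n) + 2 grows with the number of training examples n, while an
# O(d) bound stays flat.
import math

d = 3
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: d*log2(n) + 2 = {d * math.log2(n) + 2:5.1f}   vs   O(d) ~ {d}")
```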
Help bits are some limited trusted information about an instance or instances of a computational problem that may reduce the computational complexity of solving that instance or instances. In this paper, we study the value of help bits in the settings of randomized and average-case complexity. Amir, Beigel, and Gasarch (1990) show that for constant $k$, if $k$ instances of a decision problem can be efficiently solved using less than $k$ bits of help, then the problem is in P/poly. We extend this result to the setting of randomized computation: We show that the decision problem is in P/poly if, using $\ell$ help bits, $k$ instances of the problem can be efficiently solved with probability greater than $2^{\ell-k}$. The same result holds if, using less than $k(1 - h(\alpha))$ help bits (where $h(\cdot)$ is the binary entropy function), we can efficiently solve a $(1-\alpha)$ fraction of the instances correctly with non-vanishing probability. We also extend these two results to non-constant but logarithmic $k$. In this case, however, instead of showing that the problem is in P/poly, we show that it satisfies $k$-membership comparability, a notion known to be related to solving $k$ instances using less than $k$ bits of help. Next we consider the setting of average-case complexity: Assume that we can solve $k$ instances of a decision problem using some help bits whose entropy is less than $k$ when the $k$ instances are drawn independently from a particular distribution. Then we can efficiently solve an instance drawn from that distribution with probability better than $1/2$. Finally, we show that in the case where $k$ is super-logarithmic, assuming $k$-membership comparability of a decision problem, one cannot prove that the problem is in P/poly by a black-box proof.
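The quantities appearing in these statements are easy to tabulate; the following snippet (with arbitrarily chosen parameter values, not ones from the paper) computes the binary entropy $h(\alpha)$, the help-bit budget $k(1 - h(\alpha))$, and the success-probability threshold $2^{\ell - k}$:

```python
# The quantities from the statements above, with arbitrary illustrative
# parameter values: the binary entropy h(alpha), the help-bit budget
# k * (1 - h(alpha)), and the success-probability threshold 2^(ell - k).
import math

def binary_entropy(alpha):
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)

k, ell, alpha = 8, 5, 0.1

print(f"h({alpha}) = {binary_entropy(alpha):.4f}")
print(f"help-bit budget k*(1 - h(alpha)) = {k * (1 - binary_entropy(alpha)):.4f} bits")
print(f"probability threshold 2^(ell - k) = {2 ** (ell - k):.6f}")
```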
What is the optimal number of independent observations from which a sparse Gaussian Graphical Model can be correctly recovered? Information-theoretic arguments provide a lower bound on the minimum number of samples necessary to perfectly identify the support of any multivariate normal distribution as a function of model parameters. For a model defined on a sparse graph with $p$ nodes, a maximum degree $d$ and minimum normalized edge strength $\kappa$, this necessary number of samples scales at least as $d \log p/\kappa^2$. The sample complexity requirements of existing methods for perfect graph reconstruction exhibit dependency on additional parameters that do not enter in the lower bound. The question of whether the lower bound is tight and achievable by a polynomial time algorithm remains open. In this paper, we constructively answer this question and propose an algorithm, termed DICE, whose sample complexity matches the information-theoretic lower bound up to a universal constant factor. We also propose a related algorithm SLICE that has a slightly higher sample complexity, but can be implemented as a mixed integer quadratic program which makes it attractive in practice. Importantly, SLICE retains a critical advantage of DICE in that its sample complexity only depends on quantities present in the information theoretic lower bound. We anticipate that this result will stimulate future search of computationally efficient sample-optimal algorithms.
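For a rough sense of scale, the lower bound $d \log p / \kappa^2$ can be evaluated directly; the parameter values below are made up for illustration and the universal constant factor is ignored:

```python
# Back-of-the-envelope scaling of the information-theoretic lower bound
# d * log(p) / kappa^2; the parameter values are made up for illustration
# and the universal constant factor is ignored.
import math

d, kappa = 4, 0.2
for p in (100, 1_000, 10_000):
    bound = d * math.log(p) / kappa ** 2
    print(f"p = {p:>6}: at least ~{bound:.0f} samples (up to a universal constant)")
```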
Carlotta Langer, Nihat Ay (2020)
Complexity measures in the context of the Integrated Information Theory of consciousness try to quantify the strength of the causal connections between different neurons. This is done by minimizing the KL-divergence between a full system and one without causal connections. Various measures have been proposed and compared in this setting. We will discuss a class of information geometric measures that aim at assessing the intrinsic causal influences in a system. One promising candidate of these measures, denoted by $\Phi_{CIS}$, is based on conditional independence statements and does satisfy all of the properties that have been postulated as desirable. Unfortunately, it does not have a graphical representation, which makes it less intuitive and difficult to analyze. We propose an alternative approach using a latent variable which models a common exterior influence. This leads to a measure $\Phi_{CII}$, Causal Information Integration, that satisfies all of the required conditions. Our measure can be calculated using an iterative information geometric algorithm, the em-algorithm. Therefore we are able to compare its behavior to existing integrated information measures.
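As a simplified illustration of the general recipe (and not of $\Phi_{CII}$ itself), one can quantify the "integration" of a two-variable joint distribution as its KL-divergence to the closest factorized model, which in this special case is simply the product of the marginals:

```python
# Simplified illustration of the general recipe, not the authors' Phi_CII:
# measure the "integration" of a joint distribution p(x, y) as its KL
# divergence to the closest product distribution, which for two variables
# is the product of the marginals (i.e., the mutual information).
import numpy as np

p = np.array([[0.30, 0.10],
              [0.05, 0.55]])          # joint distribution of two binary variables

px = p.sum(axis=1, keepdims=True)     # marginal of X, shape (2, 1)
py = p.sum(axis=0, keepdims=True)     # marginal of Y, shape (1, 2)
q = px @ py                           # closest factorized (product) model

kl_bits = np.sum(p * np.log2(p / q))  # D_KL(p || q) in bits
print(f"integration (KL to product of marginals) = {kl_bits:.4f} bits")
```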
We propose a method for learning Markov network structures for continuous data without invoking any assumptions about the distribution of the variables. The method makes use of previous work on a non-parametric estimator for mutual information which is used to create a non-parametric test for multivariate conditional independence. This independence test is then combined with an efficient constraint-based algorithm for learning the graph structure. The performance of the method is evaluated on several synthetic data sets and it is shown to learn considerably more accurate structures than competing methods when the dependencies between the variables involve non-linearities.
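A rough sketch of this general pipeline follows, using scikit-learn's k-NN mutual information estimator as a stand-in for the estimator in the paper and a permutation test for plain (unconditional) independence; the conditional test and the constraint-based search that the method actually combines are omitted here:

```python
# Rough sketch of the general pipeline (not the paper's exact estimator or
# algorithm): estimate mutual information with scikit-learn's k-NN estimator
# and turn it into a permutation test of independence. The conditional
# version needed for Markov network learning is more involved and omitted.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)

def independence_pvalue(x, y, n_perm=200):
    """Permutation p-value for H0: X is independent of Y, using k-NN MI."""
    observed = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
    null = [
        mutual_info_regression(x.reshape(-1, 1), rng.permutation(y), random_state=0)[0]
        for _ in range(n_perm)
    ]
    return (1 + sum(v >= observed for v in null)) / (1 + n_perm)

# A non-linear dependence that a correlation-based test would miss.
x = rng.normal(size=500)
y = x ** 2 + 0.1 * rng.normal(size=500)
z = rng.normal(size=500)

print("p-value for (x, y):", independence_pvalue(x, y))   # should be small
print("p-value for (x, z):", independence_pvalue(x, z))   # should be large
```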
