On the Information Complexity of Proper Learners for VC Classes in the Realizable Case

Published by Mahdi Haghifam

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Mahdi Haghifam, Gintare Karolina Dziugaite, Shay Moran

We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik-Chervonenkis (VC) classes cannot be improved from $d \log n + 2$ to $O(d)$, where $n$ is the number of i.i.d. training examples. In fact, we exhibit VC classes for which the CMI of any proper learner cannot be bounded by any real-valued function of the VC dimension only.

64 - Ido Nachum, Amir Yehudayoff 2018

How many bits of information are revealed by a learning algorithm for a concept class of VC-dimension $d$? Previous works have shown that even for $d=1$ the amount of information may be unbounded (tend to $\infty$ with the universe size). Can it be that all concepts in the class require leaking a large amount of information? We show that typically concepts do not require leakage. There exists a proper learning algorithm that reveals $O(d)$ bits of information for most concepts in the class. This result is a special case of a more general phenomenon we explore. If there is a low information learner when the algorithm knows the underlying distribution on inputs, then there is a learner that reveals little information on an average concept without knowing the distribution on inputs.

Machine Learning, Information Theory

Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

148 - Gen Li, Yuxin Chen, Yuejie Chi 2021

Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates the linear realizability of the optimal Q-function (also called the linear $Q^{\star}$ problem). While linear realizability alone does not allow for sample-efficient solutions in general, the presence of a large sub-optimality gap is a potential game changer, depending on the sampling mechanism in use. Informally, sample efficiency is achievable with a large sub-optimality gap when a generative model is available but is unfortunately infeasible when we turn to standard online RL settings. In this paper, we make progress towards understanding this linear $Q^{\star}$ problem by investigating a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner. This protocol is more flexible than the standard online RL setting, while being practically relevant and far more restrictive than the generative model. We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space. Our findings underscore the fundamental interplay between sampling protocols and low-complexity structural representation in RL.

Machine Learning, Information Theory

The Value of Help Bits in Randomized and Average-Case Complexity

143 - Salman Beigi, Omid Etesami, Amin Gohari 2014

Help bits are some limited trusted information about an instance or instances of a computational problem that may reduce the computational complexity of solving that instance or instances. In this paper, we study the value of help bits in the settings of randomized and average-case complexity. Amir, Beigel, and Gasarch (1990) show that for constant $k$, if $k$ instances of a decision problem can be efficiently solved using less than $k$ bits of help, then the problem is in P/poly. We extend this result to the setting of randomized computation: We show that the decision problem is in P/poly if using $\ell$ help bits, $k$ instances of the problem can be efficiently solved with probability greater than $2^{\ell-k}$. The same result holds if using less than $k(1 - h(\alpha))$ help bits (where $h(\cdot)$ is the binary entropy function), we can efficiently solve $(1-\alpha)$ fraction of the instances correctly with non-vanishing probability. We also extend these two results to non-constant but logarithmic $k$. In this case however, instead of showing that the problem is in P/poly we show that it satisfies $k$-membership comparability, a notion known to be related to solving $k$ instances using less than $k$ bits of help. Next we consider the setting of average-case complexity: Assume that we can solve $k$ instances of a decision problem using some help bits whose entropy is less than $k$ when the $k$ instances are drawn independently from a particular distribution. Then we can efficiently solve an instance drawn from that distribution with probability better than $1/2$. Finally, we show that in the case where $k$ is super-logarithmic, assuming $k$-membership comparability of a decision problem, one cannot prove that the problem is in P/poly by a black-box proof.

Computational Complexity, Information Theory

On Predictive Information in RNNs

91 - Zhe Dong, Deniz Oktay, Ben Poole 2019

Certain biological neurons demonstrate a remarkable capability to optimally compress the history of sensory inputs while being maximally informative about the future. In this work, we investigate if the same can be said of artificial neurons in recurrent neural networks (RNNs) trained with maximum likelihood. Empirically, we find that RNNs are suboptimal in the information plane. Instead of optimally compressing past information, they extract additional information that is not relevant for predicting the future. We show that constraining past information by injecting noise into the hidden state can improve RNNs in several ways: optimality in the predictive information plane, sample quality, heldout likelihood, and downstream classification performance.

Machine Learning, Information Theory

Complexity as Causal Information Integration

123 - Carlotta Langer, Nihat Ay 2020

Complexity measures in the context of the Integrated Information Theory of consciousness try to quantify the strength of the causal connections between different neurons. This is done by minimizing the KL-divergence between a full system and one without causal connections. Various measures have been proposed and compared in this setting. We will discuss a class of information geometric measures that aim at assessing the intrinsic causal influences in a system. One promising candidate of these measures, denoted by $\Phi_{CIS}$, is based on conditional independence statements and does satisfy all of the properties that have been postulated as desirable. Unfortunately it does not have a graphical representation which makes it less intuitive and difficult to analyze. We propose an alternative approach using a latent variable which models a common exterior influence. This leads to a measure $\Phi_{CII}$, Causal Information Integration, that satisfies all of the required conditions. Our measure can be calculated using an iterative information geometric algorithm, the em-algorithm. Therefore we are able to compare its behavior to existing integrated information measures.

Methodology, Information Theory