
On the Information Complexity of Proper Learners for VC Classes in the Realizable Case

 Added by Mahdi Haghifam
 Publication date 2020
Language: English





We provide a negative resolution to a conjecture of Steinke and Zakynthinou (2020a), by showing that their bound on the conditional mutual information (CMI) of proper learners of Vapnik--Chervonenkis (VC) classes cannot be improved from $d \log n + 2$ to $O(d)$, where $n$ is the number of i.i.d. training examples. In fact, we exhibit VC classes for which the CMI of any proper learner cannot be bounded by any real-valued function of the VC dimension only.
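As a minimal illustration of the "VC class" notion the abstract builds on (our own toy code, not from the paper): a brute-force check that the class of threshold functions on a small domain has VC dimension 1.

```python
# Hedged sketch: brute-force VC dimension of threshold functions on {0,...,9}.
# All names here are ours, chosen for illustration only.
from itertools import combinations

def shatters(hypotheses, points):
    """True if the class realizes all 2^|points| labelings of `points`."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def vc_dimension(hypotheses, domain):
    """Largest k such that some k-subset of the domain is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, pts) for pts in combinations(domain, k)):
            d = k
    return d

# threshold class: h_t(x) = 1 iff x >= t
thresholds = [lambda x, t=t: int(x >= t) for t in range(11)]
print(vc_dimension(thresholds, list(range(10))))  # 1
```

Any single point can be labeled both ways by choosing the threshold, but no pair can receive the labeling (1, 0) with the smaller point labeled 1, so the VC dimension is exactly 1.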




Read More

How many bits of information are revealed by a learning algorithm for a concept class of VC-dimension $d$? Previous works have shown that even for $d=1$ the amount of information may be unbounded (tend to $\infty$ with the universe size). Can it be that all concepts in the class require leaking a large amount of information? We show that typically concepts do not require leakage. There exists a proper learning algorithm that reveals $O(d)$ bits of information for most concepts in the class. This result is a special case of a more general phenomenon we explore. If there is a low information learner when the algorithm {\em knows} the underlying distribution on inputs, then there is a learner that reveals little information on an average concept {\em without knowing} the distribution on inputs.
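For a deterministic learner $A$, the information it reveals about the sample $S$ is $I(A(S); S) = H(A(S))$. A toy estimate of this quantity (our own construction, not the paper's): the "leftmost consistent threshold" proper learner on a small domain, with the output entropy computed exactly by enumerating all samples.

```python
# Hedged illustration: output entropy of a deterministic proper learner.
# Domain, sample size, and target threshold are arbitrary choices of ours.
import math
from itertools import product

domain = range(8)
target_t = 4  # true concept: label = int(x >= 4)

def learner(sample):
    """Proper learner: smallest threshold consistent with the labeled sample."""
    for t in range(len(domain) + 1):
        if all(int(x >= t) == y for x, y in sample):
            return t

counts = {}
for xs in product(domain, repeat=3):  # all samples of size 3
    out = learner([(x, int(x >= target_t)) for x in xs])
    counts[out] = counts.get(out, 0) + 1

n = len(domain) ** 3
entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
print(round(entropy, 3))  # a small number of bits, at most log2(5)
```

Here the learner has only five possible outputs, so it reveals at most $\log_2 5 \approx 2.32$ bits regardless of the sample size.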
Gen Li, Yuxin Chen, Yuejie Chi (2021)
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates the linear realizability of the optimal Q-function (also called the linear $Q^\star$ problem). While linear realizability alone does not allow for sample-efficient solutions in general, the presence of a large sub-optimality gap is a potential game changer, depending on the sampling mechanism in use. Informally, sample efficiency is achievable with a large sub-optimality gap when a generative model is available but is unfortunately infeasible when we turn to standard online RL settings. In this paper, we make progress towards understanding this linear $Q^\star$ problem by investigating a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner. This protocol is more flexible than the standard online RL setting, while being practically relevant and far more restrictive than the generative model. We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space. Our findings underscore the fundamental interplay between sampling protocols and low-complexity structural representation in RL.
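The linear realizability assumption at the heart of this abstract can be stated in a few lines of code (a sketch with our own toy feature map and parameter, not the paper's construction): the optimal Q-function is a linear function of a known $d$-dimensional feature map.

```python
# Hedged sketch of linear Q* realizability: Q*(s, a) = <phi(s, a), theta*>.
# The feature map and parameter below are arbitrary illustrative choices.
import numpy as np

d = 3
theta_star = np.array([1.0, -0.5, 0.2])  # unknown ground-truth parameter in R^d

def phi(state, action):
    """Toy feature map phi(s, a) in R^d."""
    return np.array([state, action, state * action], dtype=float)

def q_star(state, action):
    """Optimal Q-function, linear in the features by assumption."""
    return phi(state, action) @ theta_star

print(q_star(2.0, 1.0))  # 1.0*2 - 0.5*1 + 0.2*2 = 1.9
```

The point of the assumption is that a learner only needs to estimate the $d$ coordinates of $\theta^\star$, independent of the size of the state/action space.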
Help bits are some limited trusted information about an instance or instances of a computational problem that may reduce the computational complexity of solving that instance or instances. In this paper, we study the value of help bits in the settings of randomized and average-case complexity. Amir, Beigel, and Gasarch (1990) show that for constant $k$, if $k$ instances of a decision problem can be efficiently solved using less than $k$ bits of help, then the problem is in P/poly. We extend this result to the setting of randomized computation: We show that the decision problem is in P/poly if using $\ell$ help bits, $k$ instances of the problem can be efficiently solved with probability greater than $2^{\ell-k}$. The same result holds if using less than $k(1 - h(\alpha))$ help bits (where $h(\cdot)$ is the binary entropy function), we can efficiently solve a $(1-\alpha)$ fraction of the instances correctly with non-vanishing probability. We also extend these two results to non-constant but logarithmic $k$. In this case however, instead of showing that the problem is in P/poly we show that it satisfies $k$-membership comparability, a notion known to be related to solving $k$ instances using less than $k$ bits of help. Next we consider the setting of average-case complexity: Assume that we can solve $k$ instances of a decision problem using some help bits whose entropy is less than $k$ when the $k$ instances are drawn independently from a particular distribution. Then we can efficiently solve an instance drawn from that distribution with probability better than $1/2$. Finally, we show that in the case where $k$ is super-logarithmic, assuming $k$-membership comparability of a decision problem, one cannot prove that the problem is in P/poly by a black-box proof.
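The quantities in the $k(1 - h(\alpha))$ bound are easy to compute concretely (our own short sketch; the parameter values are arbitrary): the binary entropy function and the resulting help-bit budget.

```python
# Hedged sketch of the binary entropy function h(alpha) and the help-bit
# budget k * (1 - h(alpha)) from the abstract. Parameter values are ours.
import math

def binary_entropy(alpha):
    """h(alpha) = -alpha*log2(alpha) - (1-alpha)*log2(1-alpha), with h(0)=h(1)=0."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)

k, alpha = 10, 0.11
budget = k * (1 - binary_entropy(alpha))
print(round(binary_entropy(alpha), 4))  # ~0.4999 (h(0.11) is close to 1/2)
print(round(budget, 2))                 # roughly 5 help bits
```

Note that $h$ is maximized at $\alpha = 1/2$, where $h(1/2) = 1$ and the budget collapses to zero, matching the intuition that tolerating half the instances wrong makes the help bits worthless.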
Certain biological neurons demonstrate a remarkable capability to optimally compress the history of sensory inputs while being maximally informative about the future. In this work, we investigate if the same can be said of artificial neurons in recurrent neural networks (RNNs) trained with maximum likelihood. Empirically, we find that RNNs are suboptimal in the information plane. Instead of optimally compressing past information, they extract additional information that is not relevant for predicting the future. We show that constraining past information by injecting noise into the hidden state can improve RNNs in several ways: optimality in the predictive information plane, sample quality, heldout likelihood, and downstream classification performance.
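The intervention the abstract describes — constraining past information by injecting noise into the hidden state — can be sketched in a few lines (a minimal NumPy toy of ours, not the paper's trained models):

```python
# Hedged sketch: one vanilla RNN step with optional Gaussian noise injected
# into the hidden state. Weights and sizes are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(h, x, W_h, W_x, noise_std=0.0):
    """Vanilla RNN update; noise_std > 0 limits information kept about the past."""
    h_new = np.tanh(W_h @ h + W_x @ x)
    if noise_std > 0:
        h_new = h_new + rng.normal(0.0, noise_std, size=h_new.shape)
    return h_new

hidden_dim, input_dim = 4, 3
W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_x = rng.standard_normal((hidden_dim, input_dim)) * 0.1

h = np.zeros(hidden_dim)
for _ in range(5):
    h = rnn_step(h, rng.standard_normal(input_dim), W_h, W_x, noise_std=0.1)
print(h.shape)  # (4,)
```

The noise acts as a channel constraint on the hidden state: the larger `noise_std`, the fewer bits about past inputs can survive into future steps.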
Carlotta Langer, Nihat Ay (2020)
Complexity measures in the context of the Integrated Information Theory of consciousness try to quantify the strength of the causal connections between different neurons. This is done by minimizing the KL-divergence between a full system and one without causal connections. Various measures have been proposed and compared in this setting. We will discuss a class of information geometric measures that aim at assessing the intrinsic causal influences in a system. One promising candidate of these measures, denoted by $\Phi_{CIS}$, is based on conditional independence statements and does satisfy all of the properties that have been postulated as desirable. Unfortunately it does not have a graphical representation which makes it less intuitive and difficult to analyze. We propose an alternative approach using a latent variable which models a common exterior influence. This leads to a measure $\Phi_{CII}$, Causal Information Integration, that satisfies all of the required conditions. Our measure can be calculated using an iterative information geometric algorithm, the em-algorithm. Therefore we are able to compare its behavior to existing integrated information measures.
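The basic operation underlying such measures — the KL-divergence between a full joint distribution and a factorized one without the connections — is simple to compute directly (our own two-variable toy; for two variables this KL equals the mutual information):

```python
# Hedged illustration: KL divergence between a joint distribution over two
# binary variables and the product of its marginals. Numbers are ours.
import math

def kl(p, q):
    """KL divergence in bits between distributions given as probability lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# joint over (A, B), flattened as [p00, p01, p10, p11]
joint = [0.4, 0.1, 0.1, 0.4]
pa = [joint[0] + joint[1], joint[2] + joint[3]]  # marginal of A
pb = [joint[0] + joint[2], joint[1] + joint[3]]  # marginal of B
factored = [pa[i] * pb[j] for i in range(2) for j in range(2)]
print(round(kl(joint, factored), 4))  # 0.2781 = mutual information I(A;B)
```

Minimizing this divergence over all distributions without the causal links, as the abstract's em-algorithm does iteratively, yields the integrated information of the system.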
