No Arabic abstract
Biological and artificial neural systems are composed of many local processors, and their capabilities depend upon the transfer function that relates each local processors outputs to its inputs. This paper uses a recent advance in the foundations of information theory to study the properties of local processors that use contextual input to amplify or attenuate transmission of information about their driving inputs. This advance enables the information transmitted by processors with two distinct inputs to be decomposed into those components unique to each input, that shared between the two inputs, and that which depends on both though it is in neither, i.e. synergy. The decompositions that we report here show that contextual modulation has information processing properties that contrast with those of all four simple arithmetic operators, that it can take various forms, and that the form used in our previous studies of artificial neural nets composed of local processors with both driving and contextual inputs is particularly well-suited to provide the distinctive capabilities of contextual modulation under a wide range of conditions. We argue that the decompositions reported here could be compared with those obtained from empirical neurobiological and psychophysical data under conditions thought to reflect contextual modulation. That would then shed new light on the underlying processes involved. Finally, we suggest that such decompositions could aid the design of context-sensitive machine learning algorithms.
We develop a theoretical framework for defining and identifying flows of information in computational systems. Here, a computational system is assumed to be a directed graph, with clocked nodes that send transmissions to each other along the edges of the graph at discrete points in time. We are interested in a definition that captures the dynamic flow of information about a specific message, and which guarantees an unbroken information path between appropriately defined inputs and outputs in the directed graph. Prior measures, including those based on Granger Causality and Directed Information, fail to provide clear assumptions and guarantees about when they correctly reflect information flow about a message. We take a systematic approach---iterating through candidate definitions and counterexamples---to arrive at a definition for information flow that is based on conditional mutual information, and which satisfies desirable properties, including the existence of information paths. Finally, we describe how information flow might be detected in a noiseless setting, and provide an algorithm to identify information paths on the time-unrolled graph of a computational system.
Given a probability measure $mu$ over ${mathbb R}^n$, it is often useful to approximate it by the convex combination of a small number of probability measures, such that each component is close to a product measure. Recently, Ronen Eldan used a stochastic localization argument to prove a general decomposition result of this type. In Eldans theorem, the `number of components is characterized by the entropy of the mixture, and `closeness to product is characterized by the covariance matrix of each component. We present an elementary proof of Eldans theorem which makes use of an information theory (or estimation theory) interpretation. The proof is analogous to the one of an earlier decomposition result known as the `pinning lemma.
We introduce an axiomatic approach to entropies and relative entropies that relies only on minimal information-theoretic axioms, namely monotonicity under mixing and data-processing as well as additivity for product distributions. We find that these axioms induce sufficient structure to establish continuity in the interior of the probability simplex and meaningful upper and lower bounds, e.g., we find that every relative entropy must lie between the Renyi divergences of order $0$ and $infty$. We further show simple conditions for positive definiteness of such relative entropies and a characterisation in term of a variant of relative trumping. Our main result is a one-to-one correspondence between entropies and relative entropies.
While Shannons mutual information has widespread applications in many disciplines, for practical applications it is often difficult to calculate its value accurately for high-dimensional variables because of the curse of dimensionality. This paper is focused on effective approximation methods for evaluating mutual information in the context of neural population coding. For large but finite neural populations, we derive several information-theoretic asymptotic bounds and approximation formulas that remain valid in high-dimensional spaces. We prove that optimizing the population density distribution based on these approximation formulas is a convex optimization problem which allows efficient numerical solutions. Numerical simulation results confirmed that our asymptotic formulas were highly accurate for approximating mutual information for large neural populations. In special cases, the approximation formulas are exactly equal to the true mutual information. We also discuss techniques of variable transformation and dimensionality reduction to facilitate computation of the approximations.
The characterisation of information processing is an important task in complex systems science. Information dynamics is a quantitative methodology for modelling the intrinsic information processing conducted by a process represented as a time series, but to date has only been formulated in discrete time. Building on previous work which demonstrated how to formulate transfer entropy in continuous time, we give a total account of information processing in this setting, incorporating information storage. We find that a convergent rate of predictive capacity, comprised of the transfer entropy and active information storage, does not exist, arising through divergent rates of active information storage. We identify that active information storage can be decomposed into two separate quantities that characterise predictive capacity stored in a process: active memory utilisation and instantaneous predictive capacity. The latter involves prediction related to path regularity and so solely inherits the divergent properties of the active information storage, whilst the former permits definitions of pathwise and rate quantities. We formulate measures of memory utilisation for jump and neural spiking processes and illustrate measures of information processing in synthetic neural spiking models and coupled Ornstein-Uhlenbeck models. The application to synthetic neural spiking models demonstrates that active memory utilisation for point processes consists of discontinuous jump contributions (at spikes) interrupting a continuously varying contribution (relating to waiting times between spikes), complementing the behaviour previously demonstrated for transfer entropy in these processes.