No Arabic abstract
It is commonly believed that the hidden layers of deep neural networks (DNNs) attempt to extract informative features for learning tasks. In this paper, we formalize this intuition by showing that the features extracted by DNN coincide with the result of an optimization problem, which we call the `universal feature selection problem, in a local analysis regime. We interpret the weights training in DNN as the projection of feature functions between feature spaces, specified by the network structure. Our formulation has direct operational meaning in terms of the performance for inference tasks, and gives interpretations to the internal computation results of DNNs. Results of numerical experiments are provided to support the analysis.
While Shannons mutual information has widespread applications in many disciplines, for practical applications it is often difficult to calculate its value accurately for high-dimensional variables because of the curse of dimensionality. This paper is focused on effective approximation methods for evaluating mutual information in the context of neural population coding. For large but finite neural populations, we derive several information-theoretic asymptotic bounds and approximation formulas that remain valid in high-dimensional spaces. We prove that optimizing the population density distribution based on these approximation formulas is a convex optimization problem which allows efficient numerical solutions. Numerical simulation results confirmed that our asymptotic formulas were highly accurate for approximating mutual information for large neural populations. In special cases, the approximation formulas are exactly equal to the true mutual information. We also discuss techniques of variable transformation and dimensionality reduction to facilitate computation of the approximations.
We have developed an efficient information-maximization method for computing the optimal shapes of tuning curves of sensory neurons by optimizing the parameters of the underlying feedforward network model. When applied to the problem of population coding of visual motion with multiple directions, our method yields several types of tuning curves with both symmetric and asymmetric shapes that resemble what have been found in the visual cortex. Our result suggests that the diversity or heterogeneity of tuning curve shapes as observed in neurophysiological experiment might actually constitute an optimal population representation of visual motions with multiple components.
This paper summarizes recent contributions of the authors and their co-workers in the area of information-theoretic security.
We consider a slotted wireless network in an infrastructure setup with a base station (or an access point) and N users. The wireless channel gain between the base station and the users is assumed to be i.i.d., and the base station seeks to schedule the user with the highest channel gain in every slot (opportunistic scheduling). We assume that the identity of the user with the highest channel gain is resolved using a series of contention slots and with feedback from the base station. In this setup, we formulate the contention resolution problem for opportunistic scheduling as identifying a random threshold (channel gain) that separates the best channel from the other samples. We show that the average delay to resolve contention is related to the entropy of the random threshold. We illustrate our formulation by studying the opportunistic splitting algorithm (OSA) for i.i.d. wireless channel [9]. We note that the thresholds of OSA correspond to a maximal probability allocation scheme. We conjecture that maximal probability allocation is an entropy minimizing strategy and a delay minimizing strategy for i.i.d. wireless channel. Finally, we discuss the applicability of this framework for few other network scenarios.
As network research becomes more sophisticated, it is more common than ever for researchers to find themselves not studying a single network but needing to analyze sets of networks. An important task when working with sets of networks is network comparison, developing a similarity or distance measure between networks so that meaningful comparisons can be drawn. The best means to accomplish this task remains an open area of research. Here we introduce a new measure to compare networks, the Network Portrait Divergence, that is mathematically principled, incorporates the topological characteristics of networks at all structural scales, and is general-purpose and applicable to all types of networks. An important feature of our measure that enables many of its useful properties is that it is based on a graph invariant, the network portrait. We test our measure on both synthetic graphs and real world networks taken from protein interaction data, neuroscience, and computational social science applications. The Network Portrait Divergence reveals important characteristics of multilayer and temporal networks extracted from data.