We introduce a uniform representation of general objects that captures the regularities in their structure. It represents a general class of objects, including geometric patterns and images, in a sparse, modular, hierarchical, and recursive manner. The representation can exploit any computable regularity in objects to describe them compactly, while remaining capable of representing random objects as raw data. A set of rules uniformly dictates how the representation is interpreted into a raw signal, which makes it possible to ask what pattern a given raw signal contains. The representation also allows a simple separation of the information we wish to ignore from the information we measure: a set of maps delineates the a priori parts of the objects, leaving only the information in the structure. Using the representation, we introduce a measure of information in general objects relative to structures defined by the set of maps. We point out that the common prescription of encoding objects as strings in order to apply Kolmogorov complexity is meaningless when, as is often the case, the encoding is not specified in any way beyond the assertion that it exists. Noting this, we define the measure directly in terms of the structures of the spaces in which the objects reside. As a result, the measure is defined relative to a set of maps that characterize those structures. The measure turns out to be equivalent to Kolmogorov complexity when it is defined relative to the maps characterizing the structure of the natural numbers. Thus, the formulation gives this larger class of objects a meaningful measure of information that generalizes Kolmogorov complexity.
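For reference, a minimal statement of the classical baseline this abstract generalizes; the definitions below are the standard ones (universal prefix machine $U$, program length $\ell$), not the paper's relative measure:

```latex
% Plain Kolmogorov complexity with respect to a fixed universal prefix
% machine U: the length of a shortest program that prints x.
\begin{align*}
  K_U(x) &= \min\{\,\ell(p) : U(p) = x\,\}, \\
  K_U(x) &\le K_V(x) + c_{UV}
  \quad\text{(invariance: } c_{UV} \text{ depends on } U, V \text{ but not on } x\text{)}.
\end{align*}
```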
Normalized information distance (NID) uses the theoretical notion of Kolmogorov complexity, which for practical purposes is approximated by the length of the compressed version of the file involved, using a real-world compression program. This practical analog is called the normalized compression distance (NCD), and it is trivially computable. It is a parameter-free similarity measure based on compression, used in pattern recognition, data mining, phylogeny, clustering, and classification. The complexity properties of its theoretical precursor, the NID, have been open. We show that the NID is neither upper semicomputable nor lower semicomputable.
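For concreteness, a minimal sketch of the NCD computation the abstract refers to, using zlib as the real-world compressor; the compressor choice and the test strings are mine, not the paper's:

```python
# Normalized compression distance:
#   NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
# where C(.) is the compressed length under a real-world compressor.
import zlib

def C(data: bytes) -> int:
    """Approximate Kolmogorov complexity by compressed length."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog" * 20
    b = b"the quick brown fox jumps over the lazy cat" * 20
    c = bytes(range(256)) * 4
    print(ncd(a, b))  # small: near-duplicate strings compress well together
    print(ncd(a, c))  # larger: unrelated data shares little structure
```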
In the present paper, we propose the model of \textit{structural information learning machines} (SiLeM for short), leading to a mathematical definition of learning by merging the theories of computation and information. Our model shows that the essence of learning is \textit{to gain information}, that to gain information is \textit{to eliminate uncertainty} embedded in a data space, and that eliminating the uncertainty of a data space can be reduced to an optimization problem, an \textit{information optimization problem}, which can be solved by a general \textit{encoding tree method}. The principle of structural information learning machines is maximization of the \textit{decoding information} from the observed data points together with the relationships among them; the criterion is semantical \textit{interpretation} of the syntactical \textit{essential structure}. A SiLeM machine learns the laws or rules of nature as follows. It observes the data points of the real world and builds the \textit{connections} among them to construct a \textit{data space}, choosing the connections so that the \textit{decoding information} of the data space is maximized. It then finds the \textit{encoding tree} of the data space that minimizes the dynamical uncertainty of the data space; this encoding tree is referred to as a \textit{decoder}, since it eliminates the maximum amount of uncertainty embedded in the data space. Finally, it interprets the \textit{semantics} of the decoder to form a \textit{knowledge tree}, and extracts the \textit{remarkable common features}, both semantical and syntactical, of the modules decoded by the decoder to construct \textit{trees of abstractions}, providing the foundations for \textit{intuitive reasoning} when new data are observed.
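As a concrete anchor for the encoding-tree machinery, here is a minimal sketch of the one-dimensional structural information of a graph (the height-one encoding tree) from the underlying structural information theory; the toy graphs are my own, and the full SiLeM pipeline is not reproduced here:

```python
# One-dimensional structural entropy of a graph G with degrees d_i and
# m edges: H1(G) = -sum_i (d_i / 2m) * log2(d_i / 2m), i.e. the entropy of
# the stationary distribution of a random walk on G. Higher encoding trees
# partition nodes into modules; this is the flat base case.
from math import log2

def structural_entropy_1d(edges):
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    two_m = 2 * len(edges)
    return -sum(d / two_m * log2(d / two_m) for d in degree.values())

if __name__ == "__main__":
    cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]  # regular: uniform p=1/4, H = 2 bits
    star4 = [(0, 1), (0, 2), (0, 3)]           # hub concentrates probability: H < 2
    print(structural_entropy_1d(cycle4))
    print(structural_entropy_1d(star4))
```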
Today, even the most compute- and power-constrained robots can measure complex, high data-rate video and LIDAR sensory streams. Often, such robots, ranging from low-power drones to space and subterranean rovers, need to transmit high-bitrate sensory data to a remote compute server if they are uncertain or cannot scalably run complex perception or mapping tasks locally. However, today's representations for sensory data are mostly designed for human, not robotic, perception and thus often waste precious compute or wireless network resources to transmit unimportant parts of a scene that are unnecessary for a high-level robotic task. This paper presents an algorithm to learn task-relevant representations of sensory data that are co-designed with a pre-trained robotic perception model's ultimate objective. Our algorithm aggressively compresses robotic sensory data by up to 11x more than competing methods. Further, it achieves high accuracy and robust generalization on diverse tasks including Mars terrain classification with low-power deep learning accelerators, neural motion planning, and environmental timeseries classification.
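A minimal sketch of the co-design idea as I read it: a lightweight encoder is trained against a frozen, pre-trained task model's loss plus a rate penalty, so the bottleneck keeps only task-relevant bits. The architecture, dimensions, and the weight LAMBDA below are illustrative assumptions, not the paper's actual pipeline:

```python
# Task-aware compression sketch: train an encoder/decoder bottleneck so a
# frozen, pre-trained task model still succeeds on the decoded features.
import torch
import torch.nn as nn

LATENT, LAMBDA = 16, 1e-2                  # bottleneck width, rate weight (assumed)

encoder = nn.Linear(512, LATENT)           # runs on the robot
decoder = nn.Linear(LATENT, 512)           # runs on the remote server
task_model = nn.Linear(512, 10)            # stand-in for a pre-trained perception net
task_model.requires_grad_(False)           # frozen: only the codec is trained

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):                       # toy loop on random data
    x = torch.randn(32, 512)               # sensory features
    y = torch.randint(0, 10, (32,))        # task labels
    z = encoder(x)                         # low-bitrate representation
    logits = task_model(decoder(z))
    # The task loss preserves the bits the downstream model needs; the L1
    # rate proxy discourages spending capacity on task-irrelevant content.
    loss = loss_fn(logits, y) + LAMBDA * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```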
Mutual Information (MI) plays an important role in representation learning. However, MI is unfortunately intractable in continuous and high-dimensional settings. Recent advances establish tractable and scalable MI estimators to discover useful representations. However, most of the existing methods are not capable of providing an accurate estimation of MI with low variance when the MI is large. We argue that directly estimating the gradients of MI is more appealing for representation learning than estimating MI itself. To this end, we propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions. MIGE exhibits a tight and smooth gradient estimation of MI in the high-dimensional and large-MI settings. We expand the applications of MIGE to both unsupervised learning of deep representations based on InfoMax and the Information Bottleneck method. Experimental results indicate significant performance improvements in learning useful representations.
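A minimal sketch of the score-estimation ingredient such gradient estimators rest on: a spectral Stein-type estimator of $\nabla_x \log q(x)$ from samples alone, sanity-checked on a 1-D Gaussian where the true score is known. The kernel bandwidth, the number of eigenfunctions, and the test distribution are my choices, not the paper's configuration:

```python
# Spectral Stein-type score estimator: expand grad log q in Nystrom
# eigenfunctions of an RBF kernel; Stein's identity E[g psi_j] = -E[grad psi_j]
# gives the expansion coefficients from samples of q alone.
import numpy as np

def score_estimate(samples, sigma=1.0, J=6):
    """Estimate grad log q at the sample points (samples: shape (M, d))."""
    M, d = samples.shape
    diff = samples[:, None, :] - samples[None, :, :]        # (M, M, d)
    K = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma**2))  # RBF Gram matrix
    grad_K = -diff / sigma**2 * K[..., None]                # d k(x_i, x_m) / d x_i
    eigval, eigvec = np.linalg.eigh(K)
    eigval, eigvec = eigval[::-1][:J], eigvec[:, ::-1][:, :J]  # top-J pairs
    # Nystrom: psi_j(x_i) = sqrt(M) u_j[i];
    # grad psi_j(x) = sqrt(M)/lambda_j * sum_m u_j[m] grad k(x, x_m)
    grad_psi = np.sqrt(M) / eigval[None, :, None] * np.einsum(
        "imd,mj->ijd", grad_K, eigvec)                      # (M, J, d)
    beta = -grad_psi.mean(axis=0)                           # Stein identity, (J, d)
    psi = np.sqrt(M) * eigvec                               # (M, J)
    return psi @ beta                                       # (M, d)

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=(500, 1))
est = score_estimate(x, sigma=1.5)
true = -(x - 2.0) / 1.5**2                                  # exact Gaussian score
print(np.mean((est - true)**2))                             # small MSE expected
```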
We introduce a simple logical inference structure we call a $\textsf{spanoid}$ (generalizing the notion of a matroid), which captures well-studied problems in several areas. These include combinatorial geometry, algebra (arrangements of hypersurfaces and ideals), statistical physics (bootstrap percolation) and coding theory. We initiate a thorough investigation of spanoids, from computational and structural viewpoints, focusing on parameters relevant to the application areas above and, in particular, to questions regarding Locally Correctable Codes (LCCs). One central parameter we study is the $\textsf{rank}$ of a spanoid, extending the rank of a matroid and related to the dimension of codes. This leads to one main application of our work, establishing the first known barrier to improving the nearly 20-year-old bound of Katz-Trevisan (KT) on the dimension of LCCs. On the one hand, we prove that the KT bound (and its more recent refinements) holds for the much more general setting of spanoid rank. On the other hand, we show that there exist (random) spanoids whose rank matches these bounds. Thus, to significantly improve the known bounds one must step out of the spanoid framework. Another parameter we explore is the $\textsf{functional rank}$ of a spanoid, which captures the possibility of turning a given spanoid into an actual code. The question of the relationship between rank and functional rank is one of the main questions we raise, as it may reveal new avenues for constructing new LCCs (perhaps even matching the KT bound). As a first step, we develop an entropy relaxation of functional rank to create a small constant gap and amplify it by tensoring to construct a spanoid whose functional rank is smaller than its rank by a polynomial factor. This is evidence that the entropy method we develop can prove polynomially better bounds than KT-type methods on the dimension of LCCs.
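Under the natural reading of the abstract's definitions, a minimal sketch of the two basic spanoid primitives, span (closure under inference rules) and rank, with a brute-force rank computation that is only feasible on toy instances; the example rules are mine, not from the paper:

```python
# A spanoid is given by inference rules (S, e): "if every element of S is
# derived, derive e". span(T) closes T under the rules; rank is the size of
# a smallest set whose span is the whole ground set.
from itertools import combinations

def span(rules, seed):
    """Close `seed` under rules given as [(frozenset S, e), ...]."""
    spanned = set(seed)
    changed = True
    while changed:
        changed = False
        for S, e in rules:
            if e not in spanned and S <= spanned:
                spanned.add(e)
                changed = True
    return spanned

def rank(rules, ground):
    """Smallest k such that some k-subset spans the whole ground set."""
    for k in range(len(ground) + 1):
        for T in combinations(ground, k):
            if span(rules, T) == set(ground):
                return k

# Toy example: any two of {0, 1, 2} infer the third (mimicking three
# collinear points), so rank is 2 despite 3 ground elements.
rules = [(frozenset({0, 1}), 2), (frozenset({0, 2}), 1), (frozenset({1, 2}), 0)]
print(rank(rules, {0, 1, 2}))  # -> 2
```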