Quantifying the information content in a neural network model is essentially estimating the model's Kolmogorov complexity. The recent success of prequential coding on neural networks points to a promising path toward deriving an efficient description length of a model. We propose a practical measure of the generalizable information in a neural network model based on prequential coding, which we term Information Transfer ($L_{IT}$). Theoretically, $L_{IT}$ is an estimate of the generalizable part of a model's information content. In experiments, we show that $L_{IT}$ is consistently correlated with generalizable information and can be used as a measure of patterns or knowledge in a model or a dataset. Consequently, $L_{IT}$ can serve as a useful analysis tool in deep learning. In this paper, we apply $L_{IT}$ to compare and dissect information in datasets, evaluate representation models in transfer learning, and analyze catastrophic forgetting and continual learning algorithms. $L_{IT}$ provides an information perspective that helps us discover new insights into neural network learning.
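As a rough illustration of the prequential coding that $L_{IT}$ builds on, the sketch below computes an online (prequential) codelength for a labeled dataset: each chunk of data is first encoded with the model trained on all previous chunks, then added to the training set. The `predict_proba`/`partial_fit` interface and the uniform coding of the first chunk are assumptions made for this sketch, not the paper's exact procedure.

```python
import numpy as np

def prequential_codelength(model, X, y, chunk_sizes):
    """Prequential (online) codelength of the labels y given inputs X, in bits.

    Data is revealed chunk by chunk: each new chunk is first encoded with the
    model trained on all previous chunks (cost = -log2 p(y | x) summed over the
    chunk), and only then used to update the model. The first chunk is encoded
    with a uniform code over the classes, since no trained model exists yet.
    """
    classes = np.unique(y)
    total_bits, start = 0.0, 0
    for size in chunk_sizes:
        xs, ys = X[start:start + size], y[start:start + size]
        if start == 0:
            # No trained model yet: uniform code over the label set.
            total_bits += len(ys) * np.log2(len(classes))
        else:
            probs = model.predict_proba(xs)                  # (size, n_classes)
            idx = np.searchsorted(classes, ys)               # label -> column
            total_bits += -np.sum(np.log2(probs[np.arange(len(ys)), idx] + 1e-12))
        # Update the model on the chunk it has just paid to encode.
        model.partial_fit(xs, ys, classes=classes)
        start += size
    return total_bits
```

A transfer-style comparison in this spirit would contrast the prequential codelength of a downstream dataset when the model starts from a pretrained representation versus from scratch; the paper's definition of $L_{IT}$ should be consulted for its precise form.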