We examine the influence of input data representations on learning complexity. For learning, we posit that each model implicitly uses a candidate model distribution for unexplained variations in the data, its noise model. If the model distribution is not well aligned with the true distribution, then even relevant variations will be treated as noise. Crucially, however, the alignment of model and true distribution can be changed, albeit implicitly, by changing data representations. Better representations can better align the model with the true distribution, making it easier to approximate the input-output relationship in the data without discarding useful data variations. To quantify this alignment effect of data representations on the difficulty of a learning task, we make use of an existing task complexity score and show its connection to the representation-dependent information coding length of the input. Empirically, we extract the necessary statistics from a linear regression approximation and show that these suffice to predict the relative learning performance of different data representations and neural network types, as obtained from an extensive neural network architecture search. We conclude that to ensure better learning outcomes, representations may need to be tailored to both task and model to align with the implicit distribution of model and task.
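The alignment idea above can be illustrated with a toy sketch (not the paper's actual procedure): fit a least-squares probe on two hypothetical representations of the same task and compare residual variance, the variation the linear model is forced to treat as noise.

```python
# Hypothetical sketch: a representation aligned with the task leaves less
# residual variance for a linear model, i.e. less signal treated as noise.
import numpy as np

rng = np.random.default_rng(0)

def residual_variance(X, y):
    # Least-squares fit with a bias column; residual variance is a crude
    # proxy for the variation the linear model fails to explain.
    X1 = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return (y - X1 @ coef).var()

# Toy task: the target depends quadratically on a latent variable z.
z = rng.uniform(-1, 1, size=(500, 1))
y = (z ** 2).ravel() + 0.05 * rng.normal(size=500)

raw_rep = z        # raw representation: misaligned with a linear model
squared_rep = z**2 # tailored representation: aligned with the task

print(residual_variance(raw_rep, y))      # high: variation treated as noise
print(residual_variance(squared_rep, y))  # low: representation aligned
```

The squared representation makes the same data linearly explainable, mirroring the claim that representation choice implicitly shifts model-distribution alignment.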
We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement
Recent studies on catastrophic forgetting during sequential learning typically focus on fixing the accuracy of the predictions for a previously learned task. In this paper we argue that the outputs of neural networks are subject to rapid changes when
We consider the problem of evaluating representations of data for use in solving a downstream task. We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on
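One hedged way to make this notion of "complexity of learning a predictor" concrete (a sketch under assumed details, not the paper's exact metric) is to train a simple probe at several training-set sizes and average the resulting test losses: a representation on which low loss is reached with little data scores as easier.

```python
# Hedged sketch: score a representation by how quickly a simple ridge
# probe reaches low loss as training data grows (a loss-vs-samples curve).
import numpy as np

rng = np.random.default_rng(1)

def ridge_mse(X_tr, y_tr, X_te, y_te, lam=1e-3):
    # Closed-form ridge regression probe on top of the representation.
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    return float(np.mean((X_te @ w - y_te) ** 2))

def learning_curve_score(X, y, sizes=(10, 20, 40, 80, 160)):
    # Average held-out loss across training sizes: lower = easier to
    # learn a good predictor on top of this representation.
    X_te, y_te = X[200:], y[200:]
    return float(np.mean([ridge_mse(X[:n], y[:n], X_te, y_te) for n in sizes]))

z = rng.normal(size=(400, 1))
y = np.sin(3 * z).ravel()
good_rep = np.sin(3 * z)  # hypothetical representation aligned with the target
poor_rep = z              # raw representation

print(learning_curve_score(good_rep, y) < learning_curve_score(poor_rep, y))
```

The score rewards representations that make the downstream predictor cheap to learn, which is the evaluation criterion the abstract proposes.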
Knowledge Transfer (KT) techniques tackle the problem of transferring knowledge from a large and complex neural network into a smaller and faster one. However, existing KT methods are tailored towards classification tasks and they cannot be used
A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks. Despite the ubiquity of catastrophic forgetting, there i