Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice for regularizing network complexity. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU and max-pooling functions. Because of this homogeneity, the functions specified by the networks are invariant to shifting weight scales between layers. The ineffective regularizers are sensitive to such shifting and thus poorly regularize the model capacity, leading to overfitting. To address this shortcoming, we propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network. The derived regularizer is an upper bound on the input gradient of the network, so minimizing it also benefits adversarial robustness. We further consider residual connections and show that our regularizer remains an upper bound on the input gradients of such residual networks. We demonstrate the efficacy of the proposed regularizer on various datasets and neural network architectures, improving both generalization and adversarial robustness.
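To make the scale-shifting argument concrete, here is a minimal NumPy sketch (not the paper's code): for a two-layer ReLU network, rescaling the first layer by c and the second by 1/c leaves the computed function unchanged while the weight-decay penalty changes arbitrarily. The product of per-layer norms shown at the end is included only as one example of a scale-shift-invariant quantity, not necessarily the paper's exact regularizer.

```python
import numpy as np

# Sketch: f(x) = W2 @ relu(W1 @ x). ReLU is positively homogeneous,
# so scaling W1 by c > 0 and W2 by 1/c leaves f unchanged.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(2, 8))
x = rng.normal(size=4)

def relu(z):
    return np.maximum(z, 0.0)

def f(W1, W2, x):
    return W2 @ relu(W1 @ x)

c = 10.0
W1_shifted, W2_shifted = c * W1, W2 / c

# The represented function is identical...
assert np.allclose(f(W1, W2, x), f(W1_shifted, W2_shifted, x))

# ...but the weight-decay penalty (sum of squared L2 norms) is not.
def l2_penalty(*Ws):
    return sum((W ** 2).sum() for W in Ws)

print(l2_penalty(W1, W2))                  # original penalty
print(l2_penalty(W1_shifted, W2_shifted))  # blows up after the shift

# A product of layer norms is invariant to this particular shift
# (illustrative only; not claimed to be the paper's regularizer).
print(np.linalg.norm(W1) * np.linalg.norm(W2))
print(np.linalg.norm(W1_shifted) * np.linalg.norm(W2_shifted))
```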
Existing generalization measures that aim to capture a model's simplicity based on parameter counts or norms fail to explain generalization in overparameterized deep neural networks. In this paper, we introduce a new, theoretically motivated measure o
We propose MaxUp, an embarrassingly simple, highly effective technique for improving the generalization performance of machine learning models, especially deep neural networks. The idea is to generate a set of augmented data with some random pe
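The abstract above is truncated; on the common reading that MaxUp draws several random augmentations per example and minimizes the worst-case loss among them, a minimal PyTorch-style sketch might look as follows. The `augment` callable and `m=4` are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def maxup_loss(model, x, y, augment, m=4):
    """Sketch of a MaxUp-style objective (our reading of the abstract):
    draw m random augmentations of each example and back-propagate only
    the largest per-example loss among them."""
    losses = []
    for _ in range(m):
        logits = model(augment(x))                       # random perturbation
        losses.append(F.cross_entropy(logits, y, reduction="none"))
    per_aug = torch.stack(losses, dim=0)                 # shape: (m, batch)
    worst_case, _ = per_aug.max(dim=0)                   # max over augmentations
    return worst_case.mean()

# Hypothetical usage with Gaussian-noise augmentation:
# loss = maxup_loss(model, x, y, augment=lambda t: t + 0.1 * torch.randn_like(t))
# loss.backward()
```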
We investigate the power of censoring techniques, first developed for learning fair representations, to address domain generalization. We examine adversarial censoring techniques for learning invariant representations from multiple studies
Adversarial robustness has emerged as a desirable property for neural networks. Prior work shows that robust networks perform well in some out-of-distribution generalization tasks, such as transfer learning and outlier detection. We uncover a differe
Recent research has highlighted the role of relational inductive biases in building learning agents that can generalize and reason in a compositional manner. However, while relational learning algorithms such as graph neural networks (GNNs) show prom