Despite their renowned predictive power on i.i.d. data, convolutional neural networks are known to rely more on high-frequency patterns that humans deem superficial than on the low-frequency patterns that agree better with intuitions about what constitutes category membership. This paper proposes a method for training robust convolutional networks by penalizing the predictive power of the local representations learned by earlier layers. Intuitively, our networks are forced to discard predictive signals, such as color and texture, that can be gleaned from local receptive fields and to rely instead on the global structure of the image. Across a battery of synthetic and benchmark domain adaptation tasks, our method confers improved out-of-domain generalization. Additionally, to evaluate cross-domain transfer, we introduce ImageNet-Sketch, a new dataset consisting of sketch-like images that matches the ImageNet classification validation set in categories and scale.
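The abstract does not spell out the penalty mechanism, so below is a minimal sketch of one plausible reading: a patch-wise classifier reads an early layer's feature map through 1x1 convolutions, and a gradient-reversal layer flips the gradient flowing back into the backbone so that the local features are pushed to become less predictive of the label. The class names (`GradReverse`, `PatchClassifier`) and the `lambda_reverse` weight are illustrative assumptions, not identifiers from the paper.

```python
# Hedged sketch: penalizing the predictive power of local (early-layer) features
# via a patch-wise classifier trained through a gradient-reversal layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated, scaled gradient on the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class PatchClassifier(nn.Module):
    """Predicts the image label independently from every spatial position (local patch)."""

    def __init__(self, in_channels, num_classes, lambda_reverse=1.0):
        super().__init__()
        self.lambda_reverse = lambda_reverse
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, kernel_size=1),
        )

    def forward(self, feat, labels):
        # The 1x1 head still learns to predict the label from each patch,
        # but the reversed gradient drives the backbone to make those
        # local features LESS predictive.
        feat = GradReverse.apply(feat, self.lambda_reverse)
        logits = self.head(feat)                      # (B, C, H, W): per-patch logits
        b, c, h, w = logits.shape
        logits = logits.permute(0, 2, 3, 1).reshape(-1, c)
        labels = labels.view(-1, 1).expand(-1, h * w).reshape(-1)
        return F.cross_entropy(logits, labels)
```

In training, this patch-level loss would be added to the ordinary image-level classification loss, so the backbone simultaneously maximizes global predictive power and minimizes local predictive power; the exact layer tapped and the weighting are assumptions of this sketch.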
Despite impressive performance as evaluated on i.i.d. holdout data, deep neural networks depend heavily on superficial statistics of the training data and are liable to break under distribution shift. For example, subtle changes to the background or
Self-supervised speech representations have been shown to be effective in a variety of speech applications. However, existing representation learning methods generally rely on the autoregressive model and/or observed global dependencies while generat
Contrastive learning has delivered impressive results in many audio-visual representation learning scenarios. However, existing approaches optimize for learning either global representations useful for tasks such as classification, or
The fully convolutional network (FCN) has dominated salient object detection for a long period. However, the locality of CNNs requires the model to be deep enough to have a global receptive field, and such a deep model always leads to the loss of local detai
We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of computations similar to an autoencoder. The likelihood under