ترغب بنشر مسار تعليمي؟ اضغط هنا

How Much Position Information Do Convolutional Neural Networks Encode?

122   0   0.0 ( 0 )
 نشر من قبل Md Amirul Islam
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. An implication of this is that a filter may know what it is looking at, but not where it is positioned in the image. Information concerning absolute position is inherently useful, and it is reasonable to assume that deep CNNs may implicitly learn to encode this information if there is a means to do so. In this paper, we test this hypothesis revealing the surprising degree of absolute position information that is encoded in commonly used neural networks. A comprehensive set of experiments show the validity of this hypothesis and shed light on how and where this information is represented while offering clues to where positional information is derived from in deep CNNs.

قيم البحث

اقرأ أيضاً

104 - Eddie Yan , Yanping Huang 2020
Data augmentations are important ingredients in the recipe for training robust neural networks, especially in computer vision. A fundamental question is whether neural network features encode data augmentation transformations. To answer this question , we introduce a systematic approach to investigate which layers of neural networks are the most predictive of augmentation transformations. Our approach uses features in pre-trained vision models with minimal additional processing to predict common properties transformed by augmentation (scale, aspect ratio, hue, saturation, contrast, and brightness). Surprisingly, neural network features not only predict data augmentation transformations, but they predict many transformations with high accuracy. After validating that neural networks encode features corresponding to augmentation transformations, we show that these features are encoded in the early layers of modern CNNs, though the augmentation signal fades in deeper layers.
Convolutional neural networks (CNNs) have so far been the de-facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a ce ntral question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual representations? Analyzing the internal representation structure of ViTs and CNNs on image classification benchmarks, we find striking differences between the two architectures, such as ViT having more uniform representations across all layers. We explore how these differences arise, finding crucial roles played by self-attention, which enables early aggregation of global information, and ViT residual connections, which strongly propagate features from lower to higher layers. We study the ramifications for spatial localization, demonstrating ViTs successfully preserve input spatial information, with noticeable effects from different classification methods. Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.
We propose contextual convolution (CoConv) for visual recognition. CoConv is a direct replacement of the standard convolution, which is the core component of convolutional neural networks. CoConv is implicitly equipped with the capability of incorpor ating contextual information while maintaining a similar number of parameters and computational cost compared to the standard convolution. CoConv is inspired by neuroscience studies indicating that (i) neurons, even from the primary visual cortex (V1 area), are involved in detection of contextual cues and that (ii) the activity of a visual neuron can be influenced by the stimuli placed entirely outside of its theoretical receptive field. On the one hand, we integrate CoConv in the widely-used residual networks and show improved recognition performance over baselines on the core tasks and benchmarks for visual recognition, namely image classification on the ImageNet data set and object detection on the MS COCO data set. On the other hand, we introduce CoConv in the generator of a state-of-the-art Generative Adversarial Network, showing improved generative results on CIFAR-10 and CelebA. Our code is available at https://github.com/iduta/coconv.
Convolutional Neural Networks (CNNs) have been proven to be extremely successful at solving computer vision tasks. State-of-the-art methods favor such deep network architectures for its accuracy performance, with the cost of having massive number of parameters and high weights redundancy. Previous works have studied how to prune such CNNs weights. In this paper, we go to another extreme and analyze the performance of a network stacked with a single convolution kernel across layers, as well as other weights sharing techniques. We name it Deep Anchored Convolutional Neural Network (DACNN). Sharing the same kernel weights across layers allows to reduce the model size tremendously, more precisely, the network is compressed in memory by a factor of L, where L is the desired depth of the network, disregarding the fully connected layer for prediction. The number of parameters in DACNN barely increases as the network grows deeper, which allows us to build deep DACNNs without any concern about memory costs. We also introduce a partial shared weights network (DACNN-mix) as well as an easy-plug-in module, coined regulators, to boost the performance of our architecture. We validated our idea on 3 datasets: CIFAR-10, CIFAR-100 and SVHN. Our results show that we can save massive amounts of memory with our model, while maintaining a high accuracy performance.
In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networ ks have been most extensively studied. Leveraging on the rapid growth in the amount of the annotated data and the great improvements in the strengths of graphics processor units, the research on convolutional neural networks has been emerged swiftly and achieved state-of-the-art results on various tasks. In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detailize the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. Besides, we also introduce various applications of convolutional neural networks in computer vision, speech and natural language processing.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا