ﻻ يوجد ملخص باللغة العربية
Deep convolutional neural networks have achieved remarkable success in computer vision. However, deep neural networks require large computing resources to achieve high performance. Although depthwise separable convolution can be an efficient module to approximate a standard convolution, it often leads to reduced representational power of networks. In this paper, under budget constraints such as computational cost (MAdds) and the parameter count, we propose a novel basic architectural block, ANTBlock. It boosts the representational power by modeling, in a high dimensional space, interdependency of channels between a depthwise convolution layer and a projection layer in the ANTBlocks. Our experiments show that ANTNet built by a sequence of ANTBlocks, consistently outperforms state-of-the-art low-cost mobile convolutional neural networks across multiple datasets. On CIFAR100, our model achieves 75.7% top-1 accuracy, which is 1.5% higher than MobileNetV2 with 8.3% fewer parameters and 19.6% less computational cost. On ImageNet, our model achieves 72.8% top-1 accuracy, which is 0.8% improvement, with 157.7ms (20% faster) on iPhone 5s over MobileNetV2.
Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of images. Intuitively, with sufficient
Despite the effectiveness of Convolutional Neural Networks (CNNs) for image classification, our understanding of the relationship between shape of convolution kernels and learned representations is limited. In this work, we explore and employ the rel
Deep learning Convolutional Neural Network (CNN) models are powerful classification models but require a large amount of training data. In niche domains such as bird acoustics, it is expensive and difficult to obtain a large number of training sample
We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise gro
Deep Neural Networks (DNNs) have become the de-facto standard in computer vision, as well as in many other pattern recognition tasks. A key drawback of DNNs is that the training phase can be very computationally expensive. Organizations or individual