Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), and various pruning criteria have been proposed to remove redundant filters. From our comprehensive experiments, we found two blind spots in the study of pruning criteria: (1) Similarity: Several primary pruning criteria that are widely cited and compared are strongly similar. Under these criteria, the ranks of the filters' Importance Scores are almost identical, resulting in similar pruned structures. (2) Applicability: The filters' Importance Scores measured by some pruning criteria are too close to one another to distinguish network redundancy well. In this paper, we analyze these two blind spots for different types of pruning criteria under layer-wise or global pruning. The analyses are based on empirical experiments and on our assumption (the Convolutional Weight Distribution Assumption) that the well-trained convolutional filters in each layer approximately follow a Gaussian-like distribution. This assumption has been verified through systematic and extensive statistical tests.
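The similarity blind spot can be illustrated with a small experiment. The following is a minimal sketch, not the authors' protocol: it assumes filters drawn i.i.d. from a zero-mean Gaussian (a simplification of the Convolutional Weight Distribution Assumption), takes the L1 and L2 norms as two example criteria, and measures how closely their filter rankings agree using Spearman's rank correlation. The layer shape, standard deviation, and all function names are illustrative choices.

```python
import math
import random


def l1_norm(filt):
    """L1-norm criterion: sum of absolute weights."""
    return sum(abs(w) for w in filt)


def l2_norm(filt):
    """L2-norm criterion: Euclidean norm of the weights."""
    return math.sqrt(sum(w * w for w in filt))


def ranks(scores):
    """Return the rank of each score (0 = smallest). Assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two tie-free score lists."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def make_layer(num_filters, filter_size, std=0.05, seed=0):
    """Hypothetical layer: every weight is an i.i.d. Gaussian sample."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(filter_size)]
            for _ in range(num_filters)]


layer = make_layer(num_filters=256, filter_size=3 * 3 * 16)
l1_scores = [l1_norm(f) for f in layer]
l2_scores = [l2_norm(f) for f in layer]
rho = spearman(l1_scores, l2_scores)
print(f"Spearman correlation between L1 and L2 rankings: {rho:.3f}")
```

A correlation close to 1 means the two criteria would remove nearly the same filters. The same script also hints at the applicability issue: under this assumption the scores within one layer differ only by sampling noise, so their spread is small relative to their mean.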
Channel pruning has demonstrated its effectiveness in compressing ConvNets. In many related works, the importance of an output feature map is determined only by its associated filter. However, these methods ignore the small part of the next layer's weights that disappears when the feature map is removed; that is, they ignore the phenomenon of weight dependency. Besides, many pruning methods use only one criterion for evaluation and find a sweet spot between pruned structure and accuracy by trial and error, which can be time-consuming. In this paper, we propose CPMC, a channel pruning algorithm via multi-criteria based on weight dependency, which can compress a pre-trained model directly. CPMC defines channel importance in three aspects: the associated weight value, the computational cost, and the parameter quantity. To account for weight dependency, CPMC obtains channel importance by assessing both the associated filter and the corresponding partial weights in the next layer. CPMC then uses global normalization to enable cross-layer comparison. Finally, CPMC removes the less important channels by global ranking. CPMC can compress various CNN models, including VGGNet, ResNet, and DenseNet, on various image classification datasets. Extensive experiments show that CPMC significantly outperforms other methods.
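The pipeline described in the abstract above (score each channel from its filter and the dependent next-layer weights, normalize, then rank globally) can be sketched as follows. This is an illustrative sketch only, not CPMC's actual formulas: the L1-based score, the per-layer max normalization, and every name here are assumptions, and the computational-cost and parameter-quantity criteria are omitted.

```python
def channel_scores(filters, next_layer_slices):
    """Score each channel from its own filter plus the next-layer weights
    that read that channel (the weight-dependency part).

    filters[c]: flat list of weights of filter c in this layer.
    next_layer_slices[c]: flat list of next-layer weights applied to channel c.
    The L1 sum used here is an assumed stand-in for the paper's criterion.
    """
    return [sum(abs(w) for w in f) + sum(abs(w) for w in s)
            for f, s in zip(filters, next_layer_slices)]


def normalize(scores):
    """Assumed normalization: divide by the layer maximum so that scores
    from layers with different weight scales become comparable."""
    top = max(scores)
    if top == 0:
        return [0.0] * len(scores)
    return [s / top for s in scores]


def global_prune(layers, num_prune):
    """Rank all channels of all layers together and return the
    `num_prune` least important ones as (layer_name, channel_index)."""
    ranked = []
    for name, (filters, next_slices) in layers.items():
        scores = normalize(channel_scores(filters, next_slices))
        for c, s in enumerate(scores):
            ranked.append((s, name, c))
    ranked.sort()
    return [(name, c) for _, name, c in ranked[:num_prune]]


# Toy model: conv2 has much larger weights than conv1.
layers = {
    "conv1": ([[1.0, -1.0], [0.1, 0.1]], [[0.5], [0.1]]),
    "conv2": ([[10.0, 10.0], [9.0, 9.0], [0.5, 0.5]], [[1.0], [1.0], [0.2]]),
}
pruned = global_prune(layers, num_prune=3)
print(pruned)
```

In this toy model, ranking the raw scores would remove both channels of `conv1` simply because its weights are on a smaller scale; with normalization, the strongest `conv1` channel is kept and a weaker `conv2` channel is removed instead.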
We propose a simple but effective data-driven channel pruning algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations. The proposed approach jointly considers batch normalization (BN) and the rectified linear unit (ReLU) for channel pruning; it estimates how likely the two successive operations are to deactivate each feature map and prunes the channels with high deactivation probabilities. To this end, we learn differentiable masks for individual channels and make soft decisions throughout the optimization procedure, which facilitates exploring a larger search space and training more stable networks. The proposed framework enables us to identify compressed models via joint learning of model parameters and channel pruning, without an extra fine-tuning procedure. We perform extensive experiments and achieve outstanding performance in terms of the accuracy of the output networks, given the same amount of resources, when compared with state-of-the-art methods.
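One way to estimate how likely BN followed by ReLU is to deactivate a channel is sketched below. It assumes that the normalized pre-BN activation is approximately standard normal, so the BN output with scale `gamma` and shift `beta` is approximately distributed as N(beta, gamma^2), and ReLU zeroes it whenever it is non-positive. This Gaussian model and the function names are assumptions for illustration; the learned differentiable masks from the abstract are not shown.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def deactivation_probability(gamma, beta):
    """P(gamma * x + beta <= 0) for x ~ N(0, 1), i.e. the chance that
    ReLU outputs zero for a BN channel with scale gamma and shift beta."""
    if gamma == 0.0:
        # Constant output: ReLU zeroes it exactly when beta <= 0.
        return 1.0 if beta <= 0.0 else 0.0
    return normal_cdf(-beta / abs(gamma))


# Hypothetical (gamma, beta) pairs for three channels.
channels = {"a": (1.0, 0.0), "b": (0.5, -1.5), "c": (0.2, 1.0)}
for name, (gamma, beta) in channels.items():
    p = deactivation_probability(gamma, beta)
    print(f"channel {name}: P(deactivated) = {p:.4f}")
```

Channels whose probability is close to 1 contribute almost nothing after the ReLU and are natural pruning candidates under this model.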
Deep neural networks (DNNs), although achieving human-level performance in many domains, have very large model sizes that hinder their broader application on edge computing devices. Extensive research has been conducted on DNN model compression and pruning. However, most previous work took heuristic approaches. This work proposes a progressive weight pruning approach based on ADMM (Alternating Direction Method of Multipliers), a powerful technique for dealing with non-convex optimization problems with potentially combinatorial constraints. Motivated by dynamic programming, the proposed method reaches an extremely high pruning rate by applying a sequence of partial prunings with moderate pruning rates. It thereby resolves the accuracy degradation and long convergence time that arise when pursuing extremely high pruning ratios. It achieves a pruning rate of up to 34 times on the ImageNet dataset and 167 times on the MNIST dataset, significantly higher than the rates reported in the literature. Under the same number of epochs, the proposed method also achieves faster convergence and higher compression rates. The code and pruned DNN models are released at bit.ly/2zxdlss
We present a systematic weight pruning framework of deep neural networks (DNNs) using the alternating direction method of multipliers (ADMM). We first formulate the weight pruning problem of DNNs as a constrained nonconvex optimization problem, and then adopt the ADMM framework for systematic weight pruning. We show that ADMM is highly suitable for weight pruning due to the computational efficiency it offers. We achieve a much higher compression ratio compared with prior work while maintaining the same test accuracy, together with a faster convergence rate. Our models are released at https://github.com/KaiqiZhang/admm-pruning
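The ADMM iteration for a sparsity constraint can be sketched on a toy problem. The sketch below is a simplified illustration, not the released implementation: it assumes a quadratic loss 0.5 * ||W - target||^2 so that the W-update has a closed form (a real DNN would take gradient steps here), and it enforces "at most k nonzero weights" by a top-k magnitude projection. The names `target`, `rho`, and `iters` are illustrative.

```python
def project_top_k(v, k):
    """Euclidean projection onto vectors with at most k nonzeros:
    keep the k largest-magnitude entries and zero the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def admm_prune(target, k, rho=1.0, iters=50):
    """ADMM for: minimize 0.5*||W - target||^2  subject to  W = Z, ||Z||_0 <= k.

    W: unconstrained weights, Z: sparse copy, U: scaled dual variable.
    """
    w = list(target)
    z = project_top_k(w, k)
    u = [0.0] * len(w)
    for _ in range(iters):
        # W-update: minimize loss + (rho/2)*||W - Z + U||^2 (closed form here).
        w = [(t + rho * (zi - ui)) / (1.0 + rho)
             for t, zi, ui in zip(target, z, u)]
        # Z-update: project W + U onto the sparsity constraint set.
        z = project_top_k([wi + ui for wi, ui in zip(w, u)], k)
        # Dual update: accumulate the remaining disagreement between W and Z.
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return w, z


target = [0.9, -0.05, 0.4, 0.02, -0.7]
w, z = admm_prune(target, k=2)
print("W:", [round(x, 6) for x in w])
print("Z:", z)
```

As the iterations proceed, W is pulled toward the sparse Z, so the final hard pruning step changes the weights very little; this gradual agreement is what distinguishes the approach from one-shot magnitude pruning.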
The sophisticated structure of the Convolutional Neural Network (CNN) allows for outstanding performance, but at the cost of intensive computation. As significant redundancies are inevitably present in such a structure, many works have proposed pruning the convolutional filters to reduce computation cost. Although extremely effective, most works are based only on quantitative characteristics of the convolutional filters and largely overlook the qualitative interpretation of each individual filter's specific functionality. In this work, we interpret the functionality and redundancy of the convolutional filters from different perspectives and propose a functionality-oriented filter pruning method. With extensive experimental results, we show that the convolutional filters' qualitative significance is independent of their magnitude, demonstrate significant neural network redundancy due to repetitive filter functions, and analyze the degradation of filter functionality under an inappropriate retraining process. Such an interpretable pruning approach not only offers outstanding computation cost optimization over previous filter pruning methods, but also interprets the filter pruning process.