In real applications, devices with different computational resources require networks of different depths (e.g., ResNet-18/34/50) that still deliver high accuracy. Existing methods typically either design multiple networks and train them independently, or construct depth-level/width-level dynamic neural networks for which the accuracy of each sub-net is hard to guarantee. In this article, we propose an elegant Depth-Level Dynamic Neural Network (DDNN) that integrates sub-nets of different depths with similar architectures. To improve the generalization of the sub-nets, we design an Embedded-Knowledge-Distillation (EKD) training mechanism for the DDNN that transfers knowledge from the teacher (full-net) to multiple students (sub-nets). Specifically, the Kullback-Leibler (KL) divergence is introduced to constrain the consistency of the posterior class probabilities between the full-net and the sub-nets, and self-attention distillation on same-resolution features at different depths is applied to drive richer feature representations in the sub-nets. Thus, multiple high-accuracy sub-nets can be obtained simultaneously in a DDNN via online knowledge distillation in each training iteration, without extra computational cost. Extensive experiments on the CIFAR-10/100 and ImageNet datasets demonstrate that sub-nets in a DDNN trained with EKD achieve better performance than individually trained networks while preserving the original performance of the full-net.
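To make the EKD objective concrete, the following is a minimal PyTorch sketch of the two constraints this abstract describes: a KL-divergence term aligning a sub-net's class posterior with the full-net's, and a self-attention distillation term on same-resolution features. The function names, temperature T, and weights alpha/beta are illustrative assumptions, not the authors' released implementation.

    # Minimal sketch (assumed names/hyperparameters, not the authors' code).
    import torch
    import torch.nn.functional as F

    def kl_posterior_loss(sub_logits, full_logits, T=4.0):
        """KL divergence pulling a sub-net's class posterior toward the full-net's.

        The full-net output is detached so gradients flow only into the sub-net.
        """
        p_sub = F.log_softmax(sub_logits / T, dim=1)
        p_full = F.softmax(full_logits.detach() / T, dim=1)
        return F.kl_div(p_sub, p_full, reduction="batchmean") * (T * T)

    def attention_map(feat):
        """Spatial attention map: channel-wise squared mean, L2-normalized."""
        att = feat.pow(2).mean(dim=1).flatten(1)  # (N, C, H, W) -> (N, H*W)
        return F.normalize(att, p=2, dim=1)

    def attention_distill_loss(sub_feat, full_feat):
        """MSE between same-resolution attention maps of sub-net and full-net."""
        return F.mse_loss(attention_map(sub_feat),
                          attention_map(full_feat.detach()))

    def ekd_loss(sub_logits, full_logits, sub_feat, full_feat, labels,
                 alpha=1.0, beta=0.5):
        """One sub-net's loss for one iteration (alpha/beta are assumptions)."""
        ce = F.cross_entropy(sub_logits, labels)
        return (ce + alpha * kl_posterior_loss(sub_logits, full_logits)
                   + beta * attention_distill_loss(sub_feat, full_feat))

Because the full-net and all sub-nets are evaluated in the same forward pass of the shared network, summing ekd_loss over the sub-nets yields the online distillation step without an extra teacher forward pass.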
Neural Architecture Search (NAS), which aims to have machines design network architectures automatically, is expected to bring about a new revolution in machine learning. Despite these high expectations, the effectiveness and efficiency of exis
State-of-the-art depth estimation performance is achieved by employing large and complex neural networks. While performance is still being improved continuously, we argue that depth estimation must be both accurate and efficient. It's a pr
Recently, distillation approaches have been proposed to extract general knowledge from a teacher network to guide a student network. Most existing methods transfer knowledge from the teacher network to the student by feeding the sequence of random
Knowledge distillation (KD) is an effective learning paradigm that improves the performance of lightweight student networks by utilizing additional supervision distilled from teacher networks. Most pioneering studies either learn from only
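As a reference point for the paradigm described above, the following is a minimal sketch of a standard KD training step (soft targets from a frozen teacher combined with hard labels) in PyTorch; the model and optimizer names, temperature T, and mixing weight lam are assumptions, not this paper's specific method.

    # Minimal sketch of a vanilla KD step (assumed names/hyperparameters).
    import torch
    import torch.nn.functional as F

    def kd_step(student, teacher, x, labels, optimizer, T=4.0, lam=0.7):
        teacher.eval()
        with torch.no_grad():  # the teacher only supplies soft targets
            soft_targets = F.softmax(teacher(x) / T, dim=1)
        logits = student(x)
        soft_loss = F.kl_div(F.log_softmax(logits / T, dim=1),
                             soft_targets, reduction="batchmean") * (T * T)
        hard_loss = F.cross_entropy(logits, labels)
        loss = lam * soft_loss + (1.0 - lam) * hard_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()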
Multi-lingual script identification is a difficult task involving different languages with complex backgrounds in scene text images. In the current research scenario, deep neural networks are employed as teacher models to train a smaller