Do you want to publish a course? Click here

Embedded Knowledge Distillation in Depth-Level Dynamic Neural Network

83   0   0.0 ( 0 )
 Added by Shuchang Lyu
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

In real applications, different computation-resource devices need different-depth networks (e.g., ResNet-18/34/50) with high-accuracy. Usually, existing methods either design multiple networks and train them independently, or construct depth-level/width-level dynamic neural networks which is hard to prove the accuracy of each sub-net. In this article, we propose an elegant Depth-Level Dynamic Neural Network (DDNN) integrated different-depth sub-nets of similar architectures. To improve the generalization of sub-nets, we design the Embedded-Knowledge-Distillation (EKD) training mechanism for the DDNN to implement knowledge transfer from the teacher (full-net) to multiple students (sub-nets). Specifically, the Kullback-Leibler (KL) divergence is introduced to constrain the posterior class probability consistency between full-net and sub-nets, and self-attention distillation on the same resolution feature of different depth is addressed to drive more abundant feature representations of sub-nets. Thus, we can obtain multiple high-accuracy sub-nets simultaneously in a DDNN via the online knowledge distillation in each training iteration without extra computation cost. Extensive experiments on CIFAR-10/100, and ImageNet datasets demonstrate that sub-nets in DDNN with EKD training achieve better performance than individually training networks while preserving the original performance of full-nets.



rate research

Read More

Neural Architecture Search (NAS), aiming at automatically designing network architectures by machines, is hoped and expected to bring about a new revolution in machine learning. Despite these high expectation, the effectiveness and efficiency of existing NAS solutions are unclear, with some recent works going so far as to suggest that many existing NAS solutions are no better than random architecture selection. The inefficiency of NAS solutions may be attributed to inaccurate architecture evaluation. Specifically, to speed up NAS, recent works have proposed under-training different candidate architectures in a large search space concurrently by using shared network parameters; however, this has resulted in incorrect architecture ratings and furthered the ineffectiveness of NAS. In this work, we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates. Thanks to the block-wise search, we can also evaluate all of the candidate architectures within a block. Moreover, we find that the knowledge of a network model lies not only in the network parameters but also in the network architecture. Therefore, we propose to distill the neural architecture (DNA) knowledge from a teacher model as the supervision to guide our block-wise architecture search, which significantly improves the effectiveness of NAS. Remarkably, the capacity of our searched architecture has exceeded the teacher model, demonstrating the practicability and scalability of our method. Finally, our method achieves a state-of-the-art 78.4% top-1 accuracy on ImageNet in a mobile setting, which is about a 2.1% gain over EfficientNet-B0. All of our searched models along with the evaluation code are available online.
The advanced performance of depth estimation is achieved by the employment of large and complex neural networks. While the performance has still been continuously improved, we argue that the depth estimation has to be accurate and efficient. Its a preliminary requirement for real-world applications. However, fast depth estimation tends to lower the performance as the trade-off between the models capacity and accuracy. In this paper, we attempt to archive highly accurate depth estimation with a light-weight network. To this end, we first introduce a compact network that can estimate a depth map in real-time. We then technically show two complementary and necessary strategies to improve the performance of the light-weight network. As the number of real-world scenes is infinite, the first is the employment of auxiliary data that increases the diversity of training data. The second is the use of knowledge distillation to further boost the performance. Through extensive and rigorous experiments, we show that our method outperforms previous light-weight methods in terms of inference accuracy, computational efficiency and generalization. We can achieve comparable performance compared to state-of-the-of-art methods with only 1% parameters, on the other hand, our method outperforms other light-weight methods by a significant margin.
110 - Haoran Zhao , Xin Sun , Junyu Dong 2021
Recently, distillation approaches are suggested to extract general knowledge from a teacher network to guide a student network. Most of the existing methods transfer knowledge from the teacher network to the student via feeding the sequence of random mini-batches sampled uniformly from the data. Instead, we argue that the compact student network should be guided gradually using samples ordered in a meaningful sequence. Thus, it can bridge the gap of feature representation between the teacher and student network step by step. In this work, we provide a curriculum learning knowledge distillation framework via instance-level sequence learning. It employs the student network of the early epoch as a snapshot to create a curriculum for the student networks next training phase. We carry out extensive experiments on CIFAR-10, CIFAR-100, SVHN and CINIC-10 datasets. Compared with several state-of-the-art methods, our framework achieves the best performance with fewer iterations.
175 - Yuang Liu , Wei Zhang , Jun Wang 2021
Knowledge distillation~(KD) is an effective learning paradigm for improving the performance of lightweight student networks by utilizing additional supervision knowledge distilled from teacher networks. Most pioneering studies either learn from only a single teacher in their distillation learning methods, neglecting the potential that a student can learn from multiple teachers simultaneously, or simply treat each teacher to be equally important, unable to reveal the different importance of teachers for specific examples. To bridge this gap, we propose a novel adaptive multi-teacher multi-level knowledge distillation learning framework~(AMTML-KD), which consists two novel insights: (i) associating each teacher with a latent representation to adaptively learn instance-level teacher importance weights which are leveraged for acquiring integrated soft-targets~(high-level knowledge) and (ii) enabling the intermediate-level hints~(intermediate-level knowledge) to be gathered from multiple teachers by the proposed multi-group hint strategy. As such, a student model can learn multi-level knowledge from multiple teachers through AMTML-KD. Extensive results on publicly available datasets demonstrate the proposed learning framework ensures student to achieve improved performance than strong competitors.
Multi-lingual script identification is a difficult task consisting of different language with complex backgrounds in scene text images. According to the current research scenario, deep neural networks are employed as teacher models to train a smaller student network by utilizing the teacher models predictions. This process is known as dark knowledge transfer. It has been quite successful in many domains where the final result obtained is unachievable through directly training the student network with a simple architecture. In this paper, we explore dark knowledge transfer approach using long short-term memory(LSTM) and CNN based assistant model and various deep neural networks as the teacher model, with a simple CNN based student network, in this domain of multi-script identification from natural scene text images. We explore the performance of different teacher models and their ability to transfer knowledge to a student network. Although the small student networks limited size, our approach obtains satisfactory results on a well-known script identification dataset CVSI-2015.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا