In this work, we consider transferring the structure information from large networks to compact ones for dense prediction tasks in computer vision. Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately, leading to sub-optimal performance. Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem. Specifically, we study two structured distillation schemes: i) pair-wise distillation that distills the pair-wise similarities by building a static graph; and ii) holistic distillation that uses adversarial training to distill holistic knowledge. The effectiveness of our knowledge distillation approaches is demonstrated by experiments on three dense prediction tasks: semantic segmentation, depth estimation and object detection. Code is available at: https://git.io/StructKD
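To make the pair-wise scheme above concrete, here is a minimal PyTorch sketch of such a loss: build a static affinity graph of cosine similarities between every pair of spatial positions in a feature map, for teacher and student alike, and penalize the squared difference between the two graphs. The function names, tensor shapes, and the choice of cosine similarity are illustrative assumptions, not the authors' exact formulation; see https://git.io/StructKD for their implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity(feat: torch.Tensor) -> torch.Tensor:
    """Static affinity graph over spatial locations.

    feat: (B, C, H, W) feature map. Returns (B, N, N) cosine
    similarities between all N = H * W spatial positions.
    """
    b, c, h, w = feat.shape
    feat = feat.reshape(b, c, h * w)              # (B, C, N)
    feat = F.normalize(feat, p=2, dim=1)          # unit-norm feature vectors
    return torch.bmm(feat.transpose(1, 2), feat)  # (B, N, N)

def pairwise_distillation_loss(student_feat: torch.Tensor,
                               teacher_feat: torch.Tensor) -> torch.Tensor:
    """Squared error between student and teacher affinity graphs."""
    return F.mse_loss(pairwise_similarity(student_feat),
                      pairwise_similarity(teacher_feat.detach()))
```

In practice a term like this would be added to the usual per-pixel task loss; the holistic scheme additionally trains a discriminator on whole score maps, which is omitted here for brevity.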
Large convolutional neural network models have recently demonstrated impressive performance on video attention prediction. However, these models conventionally come with intensive computation and a large memory footprint. To address these issues, we design an extremely l
Most teacher-student frameworks based on knowledge distillation (KD) depend on a strong congruence constraint at the instance level. However, they usually ignore the correlation between multiple instances, which is also valuable for knowledge transfer. In
Knowledge distillation is a popular paradigm for learning portable neural networks by transferring knowledge from a large model into a smaller one. Most existing approaches enhance the student model by utilizing the similarity information between
Knowledge distillation is a critical technique for transferring knowledge between models, typically from a large model (the teacher) to a more compact one (the student). The objective function of knowledge distillation is typically the cross-entropy
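For reference, the conventional objective alluded to here is the cross-entropy (equivalently, KL divergence) between temperature-softened teacher and student class distributions, as in Hinton et al.'s original formulation. A minimal sketch, assuming PyTorch and an illustrative temperature of 4:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 4.0) -> torch.Tensor:
    """KL divergence between temperature-softened class distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes match the hard-label loss.
    return F.kl_div(log_p_student, p_teacher,
                    reduction="batchmean") * temperature ** 2
```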
Although recent works on knowledge distillation (KD) have achieved further improvements by elaborately modeling the decision boundary as posterior knowledge, their performance still depends on the hypothesis that the target network