ﻻ يوجد ملخص باللغة العربية
Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing distillation formulations summarize the ensemble by capturing its average predictions. As a result, the diversity of the ensemble predictions, stemming from each member, is lost. Thus, the distilled model cannot provide a measure of uncertainty comparable to that of the original ensemble. To retain more faithfully the diversity of the ensemble, we propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra. The shared body network learns a joint feature representation that enables each head to capture the predictive behavior of each ensemble member. We demonstrate that with a slight increase in parameter count, Hydra improves distillation performance on classification and regression settings while capturing the uncertainty behavior of the original ensemble over both in-domain and out-of-distribution tasks.
Though deep neural networks have achieved significant progress on various tasks, often enhanced by model ensemble, existing high-performance models can be vulnerable to adversarial attacks. Many efforts have been devoted to enhancing the robustness o
Ensemble learning is a methodology that integrates multiple DNN learners for improving prediction performance of individual learners. Diversity is greater when the errors of the ensemble prediction is more uniformly distributed. Greater diversity is
Predicting user positive response (e.g., purchases and clicks) probability is a critical task in Web applications. To identify predictive features from raw data, the state-of-the-art extreme deep factorization machine (xDeepFM) model introduces a com
Adversarial Transferability is an intriguing property of adversarial examples -- a perturbation that is crafted against one model is also effective against another model, which may arise from a different model family or training process. To better pr
We formally study how ensemble of deep learning models can improve test accuracy, and how the superior performance of ensemble can be distilled into a single model using knowledge distillation. We consider the challenging case where the ensemble is s