Towards Improved and Interpretable Deep Metric Learning via Attentive Grouping


Abstract in English

Grouping is commonly used in deep metric learning to compute diverse features. However, current methods are prone to overfitting and lack interpretability. In this work, we propose an improved and interpretable grouping method that can be integrated flexibly with any metric learning framework. Our method is based on the attention mechanism with a learnable query for each group. The query is fully trainable and can capture group-specific information when combined with the diversity loss. An appealing property of our method is that it naturally lends itself to interpretability: the attention scores between the learnable query and each spatial position can be interpreted as the importance of that position. We formally show that our proposed grouping method is invariant to spatial permutations of features. When used as a module in convolutional neural networks, our method leads to translational invariance. We conduct comprehensive experiments to evaluate our method. Our quantitative results indicate that the proposed method outperforms prior methods consistently and significantly across different datasets, evaluation metrics, base models, and loss functions. To the best of our knowledge, our interpretation results are the first to clearly demonstrate that the proposed method enables the learning of distinct and diverse features across groups. The code is available at https://github.com/XinyiXuXD/DGML-master.
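
The grouping idea described above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function name `attentive_grouping`, the linear key/value projections `w_key` and `w_value`, the scaled dot-product scoring, and all shapes are assumptions made for illustration only. Each group holds one learnable query, the softmax over spatial positions gives the per-position importance, and the final lines check numerically that permuting the spatial positions leaves the group features unchanged.

```python
import numpy as np

def attentive_grouping(features, queries, w_key, w_value):
    """Hypothetical sketch of attention-based grouping.

    features: (N, C) array, one row per spatial position
    queries:  (G, D) array, one learnable query per group (assumed shape)
    w_key, w_value: (C, D) projection matrices (assumed linear projections)
    Returns group features (G, D) and attention scores (G, N).
    """
    keys = features @ w_key        # (N, D)
    values = features @ w_value    # (N, D)
    # Scaled dot-product scores between each group query and each position.
    scores = queries @ keys.T / np.sqrt(keys.shape[1])   # (G, N)
    # Numerically stable softmax over the spatial positions.
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    # Each group feature is an attention-weighted sum of the values.
    group_features = attn @ values  # (G, D)
    return group_features, attn

rng = np.random.default_rng(0)
N, C, D, G = 49, 32, 16, 4  # e.g. a 7x7 feature map, 4 groups (illustrative sizes)
features = rng.normal(size=(N, C))
queries = rng.normal(size=(G, D))
w_key = rng.normal(size=(C, D))
w_value = rng.normal(size=(C, D))

out, attn = attentive_grouping(features, queries, w_key, w_value)

# Shuffle the spatial positions and recompute.
perm = rng.permutation(N)
out_perm, attn_perm = attentive_grouping(features[perm], queries, w_key, w_value)

# Group features are unchanged; attention scores are permuted the same way.
print(np.allclose(out, out_perm))
print(np.allclose(attn[:, perm], attn_perm))
```

In this sketch each row of `attn` sums to one, so a row can be reshaped to the spatial grid and read as an importance map for its group, which is the interpretation the abstract refers to.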
