ﻻ يوجد ملخص باللغة العربية
Recent works in deep-learning have shown that second-order information is beneficial in many computer-vision tasks. Second-order information can be enforced both in the spatial context and the abstract feature dimensions. In this work, we explore two second-order components. One is focused on second-order spatial information to increase the performance of image descriptors, both local and global. It is used to re-weight feature maps, and thus emphasise salient image locations that are subsequently used for description. The second component is concerned with a second-order similarity (SOS) loss, that we extend to global descriptors for image retrieval, and is used to enhance the triplet loss with hard-negative mining. We validate our approach on two different tasks and datasets for image retrieval and image matching. The results show that our two second-order components complement each other, bringing significant performance improvements in both tasks and lead to state-of-the-art results across the public benchmarks. Code available at: http://github.com/tonyngjichun/SOLAR
Deep hashing methods have been proved to be effective for the large-scale medical image search assisting reference-based diagnosis for clinicians. However, when the salient region plays a maximal discriminative role in ophthalmic image, existing deep
Automated crowd counting from images/videos has attracted more attention in recent years because of its wide application in smart cities. But modelling the dense crowd heads is challenging and most of the existing works become less reliable. To obtai
Cross-modal retrieval aims to enable flexible retrieval experience by combining multimedia data such as image, video, text, and audio. One core of unsupervised approaches is to dig the correlations among different object representations to complete s
We address representation learning for large-scale instance-level image retrieval. Apart from backbone, training pipelines and loss functions, popular approaches have focused on different spatial pooling and attention mechanisms, which are at the cor
Image reconstruction and synthesis have witnessed remarkable progress thanks to the development of generative models. Nonetheless, gaps could still exist between the real and generated images, especially in the frequency domain. In this study, we sho