ترغب بنشر مسار تعليمي؟ اضغط هنا

Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval

152   0   0.0 ( 0 )
 نشر من قبل Tao Yao
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Supervised cross-modal hashing has gained increasing research interest on large-scale retrieval task owning to its satisfactory performance and efficiency. However, it still has some challenging issues to be further studied: 1) most of them fail to well preserve the semantic correlations in hash codes because of the large heterogenous gap; 2) most of them relax the discrete constraint on hash codes, leading to large quantization error and consequent low performance; 3) most of them suffer from relatively high memory cost and computational complexity during training procedure, which makes them unscalable. In this paper, to address above issues, we propose a supervised cross-modal hashing method based on matrix factorization dubbed Efficient Discrete Supervised Hashing (EDSH). Specifically, collective matrix factorization on heterogenous features and semantic embedding with class labels are seamlessly integrated to learn hash codes. Therefore, the feature based similarities and semantic correlations can be both preserved in hash codes, which makes the learned hash codes more discriminative. Then an efficient discrete optimal algorithm is proposed to handle the scalable issue. Instead of learning hash codes bit-by-bit, hash codes matrix can be obtained directly which is more efficient. Extensive experimental results on three public real-world datasets demonstrate that EDSH produces a superior performance in both accuracy and scalability over some existing cross-modal hashing methods.



قيم البحث

اقرأ أيضاً

Due to its low storage cost and fast query speed, hashing has been widely used in large-scale image retrieval tasks. Hash bucket search returns data points within a given Hamming radius to each query, which can enable search at a constant or sub-line ar time cost. However, existing hashing methods cannot achieve satisfactory retrieval performance for hash bucket search in complex scenarios, since they learn only one hash code for each image. More specifically, by using one hash code to represent one image, existing methods might fail to put similar image pairs to the buckets with a small Hamming distance to the query when the semantic information of images is complex. As a result, a large number of hash buckets need to be visited for retrieving similar images, based on the learned codes. This will deteriorate the efficiency of hash bucket search. In this paper, we propose a novel hashing framework, called multiple code hashing (MCH), to improve the performance of hash bucket search. The main idea of MCH is to learn multiple hash codes for each image, with each code representing a different region of the image. Furthermore, we propose a deep reinforcement learning algorithm to learn the parameters in MCH. To the best of our knowledge, this is the first work that proposes to learn multiple hash codes for each image in image retrieval. Experiments demonstrate that MCH can achieve a significant improvement in hash bucket search, compared with existing methods that learn only one hash code for each image.
123 - Qing-Yuan Jiang , Wu-Jun Li 2017
Due to its storage and retrieval efficiency, cross-modal hashing~(CMH) has been widely used for cross-modal similarity search in multimedia applications. According to the training strategy, existing CMH methods can be mainly divided into two categori es: relaxation-based continuous methods and discrete methods. In general, the training of relaxation-based continuous methods is faster than discrete methods, but the accuracy of relaxation-based continuous methods is not satisfactory. On the contrary, the accuracy of discrete methods is typically better than relaxation-based continuous methods, but the training of discrete methods is time-consuming. In this paper, we propose a novel CMH method, called discrete latent factor model based cross-modal hashing~(DLFH), for cross modal similarity search. DLFH is a discrete method which can directly learn the binary hash codes for CMH. At the same time, the training of DLFH is efficient. Experiments on real datasets show that DLFH can achieve significantly better accuracy than existing methods, and the training time of DLFH is comparable to that of relaxation-based continuous methods which are much faster than existing discrete methods.
175 - Xuanwu Liu , Jun Wang , Guoxian Yu 2019
Hashing has been widely adopted for large-scale data retrieval in many domains, due to its low storage cost and high retrieval speed. Existing cross-modal hashing methods optimistically assume that the correspondence between training samples across m odalities are readily available. This assumption is unrealistic in practical applications. In addition, these methods generally require the same number of samples across different modalities, which restricts their flexibility. We propose a flexible cross-modal hashing approach (Flex-CMH) to learn effective hashing codes from weakly-paired data, whose correspondence across modalities are partially (or even totally) unknown. FlexCMH first introduces a clustering-based matching strategy to explore the local structure of each cluster, and thus to find the potential correspondence between clusters (and samples therein) across modalities. To reduce the impact of an incomplete correspondence, it jointly optimizes in a unified objective function the potential correspondence, the cross-modal hashing functions derived from the correspondence, and a hashing quantitative loss. An alternative optimization technique is also proposed to coordinate the correspondence and hash functions, and to reinforce the reciprocal effects of the two objectives. Experiments on publicly multi-modal datasets show that FlexCMH achieves significantly better results than state-of-the-art methods, and it indeed offers a high degree of flexibility for practical cross-modal hashing tasks.
70 - Xuanwu Liu , Zhao Li , Jun Wang 2019
Hashing has been widely studied for big data retrieval due to its low storage cost and fast query speed. Zero-shot hashing (ZSH) aims to learn a hashing model that is trained using only samples from seen categories, but can generalize well to samples of unseen categories. ZSH generally uses category attributes to seek a semantic embedding space to transfer knowledge from seen categories to unseen ones. As a result, it may perform poorly when labeled data are insufficient. ZSH methods are mainly designed for single-modality data, which prevents their application to the widely spread multi-modal data. On the other hand, existing cross-modal hashing solutions assume that all the modalities share the same category labels, while in practice the labels of different data modalities may be different. To address these issues, we propose a general Cross-modal Zero-shot Hashing (CZHash) solution to effectively leverage unlabeled and labeled multi-modality data with different label spaces. CZHash first quantifies the composite similarity between instances using label and feature information. It then defines an objective function to achieve deep feature learning compatible with the composite similarity preserving, category attribute space learning, and hashing coding function learning. CZHash further introduces an alternative optimization procedure to jointly optimize these learning objectives. Experiments on benchmark multi-modal datasets show that CZHash significantly outperforms related representative hashing approaches both on effectiveness and adaptability.
133 - Chunbin Gu , Jiajun Bu , Xixi Zhou 2021
In this paper, we study the cross-modal image retrieval, where the inputs contain a source image plus some text that describes certain modifications to this image and the desired image. Prior work usually uses a three-stage strategy to tackle this ta sk: 1) extract the features of the inputs; 2) fuse the feature of the source image and its modified text to obtain fusion feature; 3) learn a similarity metric between the desired image and the source image + modified text by using deep metric learning. Since classical image/text encoders can learn the useful representation and common pair-based loss functions of distance metric learning are enough for cross-modal retrieval, people usually improve retrieval accuracy by designing new fusion networks. However, these methods do not successfully handle the modality gap caused by the inconsistent distribution and representation of the features of different modalities, which greatly influences the feature fusion and similarity learning. To alleviate this problem, we adopt the contrastive self-supervised learning method Deep InforMax (DIM) to our approach to bridge this gap by enhancing the dependence between the text, the image, and their fusion. Specifically, our method narrows the modality gap between the text modality and the image modality by maximizing mutual information between their not exactly semantically identical representation. Moreover, we seek an effective common subspace for the semantically same fusion feature and desired images feature by utilizing Deep InforMax between the low-level layer of the image encoder and the high-level layer of the fusion network. Extensive experiments on three large-scale benchmark datasets show that we have bridged the modality gap between different modalities and achieve state-of-the-art retrieval performance.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا