In this paper, we investigate cross-media retrieval between images and text, i.e., using an image to search for text (I2T) and using text to search for images (T2I). Existing cross-media retrieval methods usually learn a single pair of projections, by which the original features of images and text can be projected into a common latent space to measure content similarity. However, using the same projections for the two different retrieval tasks (I2T and T2I) may lead to a tradeoff between their respective performances, rather than the best performance of each. Departing from previous work, we propose a modality-dependent cross-media retrieval (MDCR) model, in which two pairs of projections are learned, one for each cross-media retrieval task, instead of a single pair. Specifically, by jointly optimizing the correlation between images and text and the linear regression from one modal space (image or text) to the semantic space, two pairs of mappings are learned to project images and text from their original feature spaces into two common latent subspaces (one for I2T and the other for T2I). Extensive experiments show the superiority of the proposed MDCR over other methods. In particular, based on the 4,096-dimensional convolutional neural network (CNN) visual feature and the 100-dimensional LDA textual feature, the proposed method achieves a mAP of 41.5%, a new state-of-the-art result on the Wikipedia dataset.
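To make the learning objective concrete, here is a minimal NumPy sketch of the kind of joint objective the abstract describes for one task: a correlation term between the projected modalities plus a linear regression from one modality onto the semantic space. The gradient-descent solver, the hyper-parameters, and all names below are illustrative assumptions, not the paper's actual formulation or values.

import numpy as np

def learn_projection_pair(X, Y, S, lam=0.5, reg=1e-3, lr=1e-2, iters=300):
    """Learn one pair of projections (U, V) for a single retrieval task.

    Minimizes (sketch):  ||X U - Y V||^2 + lam * ||X U - S||^2
                         + reg * (||U||^2 + ||V||^2),
    i.e., correlation between the projected modalities plus a linear
    regression from the first modality onto the semantic space S.
    All hyper-parameters here are assumed for illustration.

    X : (n, dx) features of one modality
    Y : (n, dy) features of the other modality
    S : (n, c)  one-hot semantic (class) labels
    """
    n = X.shape[0]
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.01, size=(X.shape[1], S.shape[1]))
    V = rng.normal(scale=0.01, size=(Y.shape[1], S.shape[1]))
    for _ in range(iters):
        diff = X @ U - Y @ V          # correlation-term residual
        sem = X @ U - S               # regression-to-semantics residual
        gU = (X.T @ diff + lam * (X.T @ sem)) / n + reg * U
        gV = (-Y.T @ diff) / n + reg * V
        U -= lr * gU
        V -= lr * gV
    return U, V

# Toy usage: 4,096-d CNN visual features, 100-d LDA text features, 10 classes.
rng = np.random.default_rng(1)
n, dx, dy, c = 200, 4096, 100, 10
X = rng.normal(size=(n, dx))                    # image features
Y = rng.normal(size=(n, dy))                    # text features
S = np.eye(c)[rng.integers(0, c, size=n)]       # semantic labels
U_i2t, V_i2t = learn_projection_pair(X, Y, S)   # projection pair for I2T
V_t2i, U_t2i = learn_projection_pair(Y, X, S)   # projection pair for T2I (roles swapped)

Swapping the roles of the two feature matrices yields the second pair of projections, which reflects the modality-dependent idea: each retrieval direction gets projections tied to its own regression target.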
This paper presents a three-tier modality alignment approach to learning a text-image joint embedding, coined JEMA, for cross-modal retrieval of cooking recipes and food images. The first tier improves recipe text embedding by optimizing the LSTM networks …
In recent years, cross-media hashing techniques have attracted increasing attention for their high computation efficiency and low storage cost. However, existing approaches still have some limitations that need to be addressed: 1) a fixed hash length …
With the advancement of technology and the expansion of broadcasting, cross-media retrieval has gained much attention. It plays a significant role in big data applications and consists in searching for and finding data across different types of media. …
It is widely acknowledged that learning joint embeddings of recipes and images is challenging due to the diverse composition and deformation of ingredients in cooking procedures. We present a Multi-modal Semantics enhanced Joint Embedding approach (MSJE) …
Dense retrieval has shown great success in passage ranking for English. However, its effectiveness in document retrieval for non-English languages remains unexplored due to limited training resources. In this work, we explore different transfer …
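As context for the snippet above, dense retrieval scores each document by the similarity of its dense embedding to the query embedding. Below is a minimal generic sketch; the encoder that produces the embeddings (e.g., a multilingual transformer) and the 768-d dimension are assumptions for illustration, not details from the paper.

import numpy as np

def dense_retrieve(query_vec, doc_vecs, k=5):
    """Rank documents by dot-product similarity to the query embedding.

    query_vec : (d,)   dense query embedding
    doc_vecs  : (n, d) dense document embeddings
    Returns the indices of the top-k scoring documents.
    """
    scores = doc_vecs @ query_vec       # one similarity score per document
    return np.argsort(-scores)[:k]      # highest scores first

# Toy usage with random 768-d embeddings (an assumed dimension).
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 768))
query = rng.normal(size=768)
print(dense_retrieve(query, docs))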