Due to the rapid development of mobile Internet techniques and cloud computing, and the popularity of online social networking and location-based services, a massive amount of multimedia data with geographical information is generated and uploaded to the Internet. In this paper, we propose a novel type of cross-modal multimedia retrieval, called geo-multimedia cross-modal retrieval, which aims to find a set of geo-multimedia objects based on geographical distance proximity and semantic similarity between different modalities. Previous studies on cross-modal retrieval and spatial keyword search cannot address this problem effectively because they neither consider multimedia data with geo-tags nor target this type of query. To address this problem efficiently, we define the $k$NN geo-multimedia cross-modal query for the first time and introduce related concepts such as the cross-modal semantic representation space. To bridge the semantic gap between different modalities, we propose a method named cross-modal semantic matching, which contains two important components, CorrProj and LogsTran, and aims to construct a common semantic representation space for measuring cross-modal semantic similarity. In addition, we design a framework based on deep learning techniques to construct this common semantic representation space. Furthermore, we present a novel hybrid indexing structure named GMR-Tree, which combines geo-multimedia data with the R-Tree, and design an efficient $k$NN search algorithm called $k$GMCMS. Comprehensive experimental evaluation on real and synthetic datasets demonstrates that our solution outperforms state-of-the-art methods.
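To make the shape of a $k$NN geo-multimedia cross-modal query concrete, the following is a minimal brute-force sketch, not the authors' $k$GMCMS algorithm or GMR-Tree index. It assumes that every object already has an embedding in a common semantic space (the abstract attributes this to CorrProj and LogsTran, which are not reproduced here), and it ranks objects by a hypothetical linear combination of normalized spatial proximity and cosine similarity. The names `GeoObject`, `knn_geo_cross_modal`, `alpha`, and `max_dist` are illustrative assumptions only.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class GeoObject:
    """Hypothetical geo-multimedia object: an id, a location, and an embedding."""
    obj_id: str
    x: float
    y: float
    embedding: Sequence[float]  # assumed vector in the common semantic space


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def knn_geo_cross_modal(
    query_xy: Tuple[float, float],
    query_emb: Sequence[float],
    objects: List[GeoObject],
    k: int,
    alpha: float = 0.5,     # assumed weight of spatial proximity vs. semantics
    max_dist: float = 1.0,  # assumed distance at which proximity drops to 0
) -> List[Tuple[float, str]]:
    """Linear scan returning the k highest-scoring (score, obj_id) pairs."""
    scored = []
    for obj in objects:
        # Euclidean distance, mapped to a proximity score in [0, 1].
        d = math.hypot(obj.x - query_xy[0], obj.y - query_xy[1])
        spatial = 1.0 - min(d / max_dist, 1.0)
        # Cosine similarity rescaled from [-1, 1] to [0, 1].
        semantic = (cosine_similarity(query_emb, obj.embedding) + 1.0) / 2.0
        score = alpha * spatial + (1.0 - alpha) * semantic
        scored.append((score, obj.obj_id))
    # heapq.nlargest returns results sorted by descending score.
    return heapq.nlargest(k, scored)
```

For example, with a query at `(0.0, 0.0)` whose embedding is `[1.0, 0.0]`, an object at the same location with an identical embedding ranks above a nearby object with an orthogonal embedding, and both rank above a semantically identical object that lies beyond `max_dist`. A linear scan costs one score computation per object; an index such as the paper's GMR-Tree exists precisely to prune most of these computations.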
With the proliferation of online social networking services and mobile smart devices equipped with mobile communication modules and position sensors, a massive amount of multimedia data has been collected, stored and shared. This trend has put fo
This paper aims to solve the problem of large-scale video retrieval using a query image. First, we define the top-$k$ image-to-video query problem. Then, we combine the merits of convolutional neural networks (CNN for short) and Bag of Visual Words (B
This paper proposes a novel energy-efficient multimedia delivery system called EStreamer. First, we study the relationship among the client-side buffer size, burst-shaped TCP-based multimedia traffic, and the energy consumption of wireless network interf
With the vigorous development of multimedia equipment and applications, efficient retrieval of large-scale multi-modal data has become a popular research topic. Among the available techniques, hashing has become a prevalent choice due to its retrieval efficiency and low s
We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated e