In the big data era, massive amounts of geo-tagged multimedia data have been generated and collected by mobile smart devices equipped with mobile communication and positioning sensor modules. This trend places higher demands on large-scale geo-multimedia data retrieval. Spatial similarity join is one of the important problems in spatial databases. Previous works focused on geo-tagged textual documents rather than geo-multimedia data such as geo-images. In this paper, we study a novel search problem named spatial visual similarity join (SVS-JOIN for short), which aims to find geo-image pairs that are similar in both geo-location and visual content. We define SVS-JOIN for the first time and describe how to measure geographical similarity and visual similarity. We then introduce a baseline inspired by methods for textual similarity join, and an extension named SVS-JOIN$_G$ that applies a spatial grid strategy to improve efficiency. To further improve search performance, we develop a novel approach called SVS-JOIN$_Q$, which utilizes a quadtree and a global inverted index. Experimental evaluations on real geo-image datasets demonstrate the high performance of our solution.
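To illustrate why a spatial grid can prune a spatial visual join, the following is a minimal sketch under a simplified, hypothetical definition that is not the paper's exact similarity model: a pair of geo-images qualifies if their Euclidean distance is at most `eps` and the cosine similarity of their visual feature vectors is at least `tau`. The function names `cosine` and `grid_svs_join` and the `(id, x, y, features)` tuple layout are illustrative assumptions. With grid cells of side `eps`, any qualifying partner of an image must lie in the same cell or one of the eight neighbouring cells, so only those cells are scanned instead of all pairs.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def grid_svs_join(images, eps, tau):
    """Hypothetical grid-based spatial visual join.

    images: list of (id, x, y, feature_vector) with comparable ids.
    Returns sorted (id_a, id_b) pairs, id_a < id_b, whose Euclidean
    distance is <= eps and whose visual cosine similarity is >= tau.
    """
    # Bucket every image into a square grid cell of side eps.
    grid = defaultdict(list)
    for img in images:
        _, x, y, _ = img
        grid[(math.floor(x / eps), math.floor(y / eps))].append(img)

    result = set()
    for (cx, cy), cell in grid.items():
        # Points within distance eps differ by at most one cell index per axis,
        # so the 3x3 neighbourhood is enough to find every candidate.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # grid.get avoids inserting new keys while iterating the dict.
                for b in grid.get((cx + dx, cy + dy), []):
                    for a in cell:
                        if a[0] >= b[0]:
                            continue  # report each unordered pair once
                        close = math.hypot(a[1] - b[1], a[2] - b[2]) <= eps
                        if close and cosine(a[3], b[3]) >= tau:
                            result.add((a[0], b[0]))
    return sorted(result)


if __name__ == "__main__":
    images = [
        (1, 0.0, 0.0, [1.0, 0.0]),
        (2, 0.5, 0.0, [0.9, 0.1]),
        (3, 0.6, 0.2, [0.0, 1.0]),
        (4, 5.0, 5.0, [1.0, 0.0]),
    ]
    print(grid_svs_join(images, eps=1.0, tau=0.9))  # [(1, 2)]
```

Image 3 is spatially close to images 1 and 2 but visually dissimilar, and image 4 is visually identical to image 1 but far away, so only the pair (1, 2) satisfies both conditions. A quadtree, as used by SVS-JOIN$_Q$, adapts the cell size to data density, which a fixed grid like this one cannot do.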