TS-Net: Combining modality specific and common features for multimodal patch matching


Abstract in English

Multimodal patch matching addresses the problem of finding correspondences between image patches from two different modalities, e.g. RGB vs. sketch or RGB vs. near-infrared. Patches from different modalities can be compared either by discovering the information common to both modalities (Siamese-like approaches) or by extracting the modality-specific information (Pseudo-Siamese-like approaches). We observed that neither of these two approaches is optimal on its own. This motivates us to propose a three-stream architecture, dubbed TS-Net, that combines the benefits of the two. In addition, we show that adding extra constraints in the intermediate layers of such networks further boosts performance. Experiments on three multimodal datasets show significant performance gains in comparison with Siamese and Pseudo-Siamese networks.
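To make the combination concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each stream can be stood in for by a single linear layer with ReLU (the actual networks would be convolutional), that the Siamese and Pseudo-Siamese features are simply concatenated, and that a third, fusion stream maps them to a match score. The names `ThreeStreamMatcher`, `shared`, `tower_a`, `tower_b` and `fusion` are hypothetical, the weights are random and untrained, and the intermediate-layer constraints mentioned above are not modelled.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Return a random weight matrix with n_out rows of n_in weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def linear_relu(weights, x):
    """Linear map followed by ReLU; a toy stand-in for a CNN tower."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ThreeStreamMatcher:
    """Toy three-stream matcher (hypothetical, untrained weights)."""

    def __init__(self, patch_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.feat_dim = feat_dim
        # Siamese stream: ONE set of weights applied to both modalities,
        # which pushes it towards information common to both.
        self.shared = make_linear(patch_dim, feat_dim, rng)
        # Pseudo-Siamese stream: separate weights per modality,
        # which lets it keep modality-specific information.
        self.tower_a = make_linear(patch_dim, feat_dim, rng)
        self.tower_b = make_linear(patch_dim, feat_dim, rng)
        # Fusion stream: scores the concatenation of all four feature vectors.
        self.fusion = [rng.uniform(-1.0, 1.0) for _ in range(4 * feat_dim)]

    def features(self, patch_a, patch_b):
        """Concatenate Siamese and Pseudo-Siamese features for a patch pair."""
        siamese = linear_relu(self.shared, patch_a) + linear_relu(self.shared, patch_b)
        pseudo = linear_relu(self.tower_a, patch_a) + linear_relu(self.tower_b, patch_b)
        return siamese + pseudo

    def score(self, patch_a, patch_b):
        """Return a match score in [0, 1] for the patch pair."""
        feats = self.features(patch_a, patch_b)
        logit = sum(w * f for w, f in zip(self.fusion, feats))
        return sigmoid(logit)


# Usage: two flattened toy "patches", one per modality.
matcher = ThreeStreamMatcher(patch_dim=8, feat_dim=4)
rgb_patch = [0.1 * i for i in range(8)]
nir_patch = [0.8 - 0.1 * i for i in range(8)]
match_score = matcher.score(rgb_patch, nir_patch)
```

In this sketch the only structural difference between the two feature streams is weight sharing: feeding the same patch to both inputs yields identical Siamese features but, in general, different Pseudo-Siamese features.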
