ﻻ يوجد ملخص باللغة العربية
Recently, the methods based on Convolutional Neural Networks (CNNs) have gained popularity in the field of visual place recognition (VPR). In particular, the features from the middle layers of CNNs are more robust to drastic appearance changes than handcrafted features and high-layer features. Unfortunately, the holistic mid-layer features lack robustness to large viewpoint changes. Here we split the holistic mid-layer features into local features, and propose an adaptive dynamic time warping (DTW) algorithm to align local features from the spatial domain while measuring the distance between two images. This realizes viewpoint-invariant and condition-invariant place recognition. Meanwhile, a local matching DTW (LM-DTW) algorithm is applied to perform image sequence matching based on temporal alignment, which achieves further improvements and ensures linear time complexity. We perform extensive experiments on five representative VPR datasets. The results show that the proposed method significantly improves the CNN-based methods. Moreover, our method outperforms several state-of-the-art methods while maintaining good run-time performance. This work provides a novel way to boost the performance of CNN methods without any re-training for VPR. The code is available at https://github.com/Lu-Feng/STA-VPR.
We address the problem of visual place recognition with perceptual changes. The fundamental problem of visual place recognition is generating robust image representations which are not only insensitive to environmental changes but also distinguishabl
This paper focuses on two key problems for audio-visual emotion recognition in the video. One is the audio and visual streams temporal alignment for feature level fusion. The other one is locating and re-weighting the perception attentions in the who
In this paper, we present a new tracking architecture with an encoder-decoder transformer as the key component. The encoder models the global spatio-temporal feature dependencies between target objects and search regions, while the decoder learns a q
Place recognition is indispensable for a drift-free localization system. Due to the variations of the environment, place recognition using single-modality has limitations. In this paper, we propose a bi-modal place recognition method, which can extra
We propose a methodology for robust, real-time place recognition using an imaging lidar, which yields image-quality high-resolution 3D point clouds. Utilizing the intensity readings of an imaging lidar, we project the point cloud and obtain an intens