مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Learning Local Feature Descriptor with Motion Attribute for Vision-based Localization

93 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yafei Song

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yafei Song - Di Zhu - Jia Li

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In recent years, camera-based localization has been widely used for robotic applications, and most proposed algorithms rely on local features extracted from recorded images. For better performance, the features used for open-loop localization are required to be short-term globally static, and the ones used for re-localization or loop closure detection need to be long-term static. Therefore, the motion attribute of a local feature point could be exploited to improve localization performance, e.g., the feature points extracted from moving persons or vehicles can be excluded from these systems due to their unsteadiness. In this paper, we design a fully convolutional network (FCN), named MD-Net, to perform motion attribute estimation and feature description simultaneously. MD-Net has a shared backbone network to extract features from the input image and two network branches to complete each sub-task. With MD-Net, we can obtain the motion attribute while avoiding increasing much more computation. Experimental results demonstrate that the proposed method can learn distinct local feature descriptor along with motion attribute only using an FCN, by outperforming competing methods by a wide margin. We also show that the proposed algorithm can be integrated into a vision-based localization algorithm to improve estimation accuracy significantly.

قيم البحث

223 - Hengli Wang , Peide Cai , Yuxiang Sun 2021

Recently, deep-learning based approaches have achieved impressive performance for autonomous driving. However, end-to-end vision-based methods typically have limited interpretability, making the behaviors of the deep networks difficult to explain. He nce, their potential applications could be limited in practice. To address this problem, we propose an interpretable end-to-end vision-based motion planning approach for autonomous driving, referred to as IVMP. Given a set of past surrounding-view images, our IVMP first predicts future egocentric semantic maps in birds-eye-view space, which are then employed to plan trajectories for self-driving vehicles. The predicted future semantic maps not only provide useful interpretable information, but also allow our motion planning module to handle objects with low probability, thus improving the safety of autonomous driving. Moreover, we also develop an optical flow distillation paradigm, which can effectively enhance the network while still maintaining its real-time performance. Extensive experiments on the nuScenes dataset and closed-loop simulation show that our IVMP significantly outperforms the state-of-the-art approaches in imitating human drivers with a much higher success rate. Our project page is available at https://sites.google.com/view/ivmp.

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

Attention-based Vehicle Self-Localization with HD Feature Maps

206 - Nico Engel , Vasileios Belagiannis , Klaus Dietmayer 2021

We present a vehicle self-localization method using point-based deep neural networks. Our approach processes measurements and point features, i.e. landmarks, from a high-definition digital map to infer the vehicles pose. To learn the best association and incorporate local information between the point sets, we propose an attention mechanism that matches the measurements to the corresponding landmarks. Finally, we use this representation for the point-cloud registration and the subsequent pose regression task. Furthermore, we introduce a training simulation framework that artificially generates measurements and landmarks to facilitate the deployment process and reduce the cost of creating extensive datasets from real-world data. We evaluate our method on our dataset, as well as an adapted version of the Kitti odometry dataset, where we achieve superior performance compared to related approaches; and additionally show dominant generalization capabilities.

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

Spatiotemporal Feature Learning for Event-Based Vision

135 - Rohan Ghosh , Anupam Gupta , Siyi Tang 2019

Unlike conventional frame-based sensors, event-based visual sensors output information through spikes at a high temporal resolution. By only encoding changes in pixel intensity, they showcase a low-power consuming, low-latency approach to visual info rmation sensing. To use this information for higher sensory tasks like object recognition and tracking, an essential simplification step is the extraction and learning of features. An ideal feature descriptor must be robust to changes involving (i) local transformations and (ii) re-appearances of a local event pattern. To that end, we propose a novel spatiotemporal feature representation learning algorithm based on slow feature analysis (SFA). Using SFA, smoothly changing linear projections are learnt which are robust to local visual transformations. In order to determine if the features can learn to be invariant to various visual transformations, feature point tracking tasks are used for evaluation. Extensive experiments across two datasets demonstrate the adaptability of the spatiotemporal feature learner to translation, scaling and rotational transformations of the feature points. More importantly, we find that the obtained feature representations are able to exploit the high temporal resolution of such event-based cameras in generating better feature tracks.

الرؤية الحاسوبية وتمييز الأنماط

SDGMNet: Statistic-based Dynamic Gradient Modulation for Local Descriptor Learning

164 - Jiayi Ma , Yuxin Deng 2021

Modifications on triplet loss that rescale the back-propagated gradients of special pairs have made significant progress on local descriptor learning. However, current gradient modulation strategies are mainly static so that they would suffer from ch anges of training phases or datasets. In this paper, we propose a dynamic gradient modulation, named SDGMNet, to improve triplet loss for local descriptor learning. The core of our method is formulating modulation functions with statistical characteristics which are estimated dynamically. Firstly, we perform deep analysis on back propagation of general triplet-based loss and introduce included angle for distance measure. On this basis, auto-focus modulation is employed to moderate the impact of statistically uncommon individual pairs in stochastic gradient descent optimization; probabilistic margin cuts off the gradients of proportional Siamese pairs that are believed to reach the optimum; power adjustment balances the total weights of negative pairs and positive pairs. Extensive experiments demonstrate that our novel descriptor surpasses previous state-of-the-arts on standard benchmarks including patch verification, matching and retrieval tasks.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Revisiting Local Descriptor based Image-to-Class Measure for Few-shot Learning

250 - Wenbin Li , Lei Wang , Jinglin Xu 2019

Few-shot learning in image classification aims to learn a classifier to classify images when only few training examples are available for each class. Recent work has achieved promising classification performance, where an image-level feature based me asure is usually used. In this paper, we argue that a measure at such a level may not be effective enough in light of the scarcity of examples in few-shot learning. Instead, we think a local descriptor based image-to-class measure should be taken, inspired by its surprising success in the heydays of local invariant features. Specifically, building upon the recent episodic training mechanism, we propose a Deep Nearest Neighbor Neural Network (DN4 in short) and train it in an end-to-end manner. Its key difference from the literature is the replacement of the image-level feature based measure in the final layer by a local descriptor based image-to-class measure. This measure is conducted online via a $k$-nearest neighbor search over the deep local descriptors of convolutional feature maps. The proposed DN4 not only learns the optimal deep local descriptors for the image-to-class measure, but also utilizes the higher efficiency of such a measure in the case of example scarcity, thanks to the exchangeability of visual patterns across the images in the same class. Our work leads to a simple, effective, and computationally efficient framework for few-shot learning. Experimental study on benchmark datasets consistently shows its superiority over the related state-of-the-art, with the largest absolute improvement of $17%$ over the next best. The source code can be available from UrlFont{https://github.com/WenbinLee/DN4.git}.

الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة أسيوط

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Learning Local Feature Descriptor with Motion Attribute for Vision-based Localization

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً