Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation

96 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Lin Wang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Lin Wang - Yujeong Chae - Kuk-Jin Yoon

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Event cameras are novel sensors that perceive the per-pixel intensity changes and output asynchronous event streams with high dynamic range and less motion blur. It has been shown that events alone can be used for end-task learning, eg, semantic segmentation, based on encoder-decoder-like networks. However, as events are sparse and mostly reflect edge information, it is difficult to recover original details merely relying on the decoder. Moreover, most methods resort to pixel-wise loss alone for supervision, which might be insufficient to fully exploit the visual details from sparse events, thus leading to less optimal performance. In this paper, we propose a simple yet flexible two-stream framework named Dual Transfer Learning (DTL) to effectively enhance the performance on the end-tasks without adding extra inference cost. The proposed approach consists of three parts: event to end-task learning (EEL) branch, event to image translation (EIT) branch, and transfer learning (TL) module that simultaneously explores the feature-level affinity information and pixel-level knowledge from the EIT branch to improve the EEL branch. This simple yet novel method leads to strong representation learning from events and is evidenced by the significant performance boost on the end-tasks such as semantic segmentation and depth estimation.

قيم البحث

246 - Haosheng Chen , David Suter , Qiangqiang Wu 2020

Event cameras, which are asynchronous bio-inspired vision sensors, have shown great potential in computer vision and artificial intelligence. However, the application of event cameras to object-level motion estimation or tracking is still in its infa ncy. The main idea behind this work is to propose a novel deep neural network to learn and regress a parametric object-level motion/transform model for event-based object tracking. To achieve this goal, we propose a synchronous Time-Surface with Linear Time Decay (TSLTD) representation, which effectively encodes the spatio-temporal information of asynchronous retinal events into TSLTD frames with clear motion patterns. We feed the sequence of TSLTD frames to a novel Retinal Motion Regression Network (RMRNet) to perform an end-to-end 5-DoF object motion regression. Our method is compared with state-of-the-art object tracking methods, that are based on conventional cameras or event cameras. The experimental results show the superiority of our method in handling various challenging environments such as fast motion and low illumination conditions.

الرؤية الحاسوبية وتمييز الأنماط

Multi-mapping Image-to-Image Translation via Learning Disentanglement

136 - Xiaoming Yu , Yuanqi Chen , Thomas Li 2019

Recent advances of image-to-image translation focus on learning the one-to-many mapping from two aspects: multi-modal translation and multi-domain translation. However, the existing methods only consider one of the two perspectives, which makes them unable to solve each others problem. To address this issue, we propose a novel unified model, which bridges these two objectives. First, we disentangle the input images into the latent representations by an encoder-decoder architecture with a conditional adversarial training in the feature space. Then, we encourage the generator to learn multi-mappings by a random cross-domain translation. As a result, we can manipulate different parts of the latent representations to perform multi-modal and multi-domain translations simultaneously. Experiments demonstrate that our method outperforms state-of-the-art methods.

الرؤية الحاسوبية وتمييز الأنماط

Event-LSTM: An Unsupervised and Asynchronous Learning-based Representation for Event-based Data

239 - Lakshmi Annamalai , Vignesh Ramanathan , Chetan Singh Thakur 2021

Event cameras are activity-driven bio-inspired vision sensors, thereby resulting in advantages such as sparsity,high temporal resolution, low latency, and power consumption. Given the different sensing modality of event camera and high quality of con ventional vision paradigm, event processing is predominantly solved by transforming the sparse and asynchronous events into 2D grid and subsequently applying standard vision pipelines. Despite the promising results displayed by supervised learning approaches in 2D grid generation, these approaches treat the task in supervised manner. Labeled task specific ground truth event data is challenging to acquire. To overcome this limitation, we propose Event-LSTM, an unsupervised Auto-Encoder architecture made up of LSTM layers as a promising alternative to learn 2D grid representation from event sequence. Compared to competing supervised approaches, ours is a task-agnostic approach ideally suited for the event domain, where task specific labeled data is scarce. We also tailor the proposed solution to exploit asynchronous nature of event stream, which gives it desirable charateristics such as speed invariant and energy-efficient 2D grid generation. Besides, we also push state-of-the-art event de-noising forward by introducing memory into the de-noising process. Evaluations on activity recognition and gesture recognition demonstrate that our approach yields improvement over state-of-the-art approaches, while providing the flexibilty to learn from unlabelled data.

الرؤية الحاسوبية وتمييز الأنماط

Simulating time to event prediction with spatiotemporal echocardiography deep learning

76 - Rohan Shad , Nicolas Quach , Robyn Fong 2021

Integrating methods for time-to-event prediction with diagnostic imaging modalities is of considerable interest, as accurate estimates of survival requires accounting for censoring of individuals within the observation period. New methods for time-to -event prediction have been developed by extending the cox-proportional hazards model with neural networks. In this paper, to explore the feasibility of these methods when applied to deep learning with echocardiography videos, we utilize the Stanford EchoNet-Dynamic dataset with over 10,000 echocardiograms, and generate simulated survival datasets based on the expert annotated ejection fraction readings. By training on just the simulated survival outcomes, we show that spatiotemporal convolutional neural networks yield accurate survival estimates.

الرؤية الحاسوبية وتمييز الأنماط

Spatiotemporal Feature Learning for Event-Based Vision

135 - Rohan Ghosh , Anupam Gupta , Siyi Tang 2019

Unlike conventional frame-based sensors, event-based visual sensors output information through spikes at a high temporal resolution. By only encoding changes in pixel intensity, they showcase a low-power consuming, low-latency approach to visual info rmation sensing. To use this information for higher sensory tasks like object recognition and tracking, an essential simplification step is the extraction and learning of features. An ideal feature descriptor must be robust to changes involving (i) local transformations and (ii) re-appearances of a local event pattern. To that end, we propose a novel spatiotemporal feature representation learning algorithm based on slow feature analysis (SFA). Using SFA, smoothly changing linear projections are learnt which are robust to local visual transformations. In order to determine if the features can learn to be invariant to various visual transformations, feature point tracking tasks are used for evaluation. Extensive experiments across two datasets demonstrate the adaptability of the spatiotemporal feature learner to translation, scaling and rotational transformations of the feature points. More importantly, we find that the obtained feature representations are able to exploit the high temporal resolution of such event-based cameras in generating better feature tracks.

الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الشھباء الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Dual Transfer Learning for Event-based End-task Prediction via Pluggable Event to Image Translation

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً