Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos

98 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xinwei Chen

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xinwei Chen - Ali Taleb Zadeh Kasgari - Walid Saad

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, the problem of head movement prediction for virtual reality videos is studied. In the considered model, a deep learning network is introduced to leverage position data as well as video frame content to predict future head movement. For optimizing data input into this neural network, data sample rate, reduced data, and long-period prediction length are also explored for this model. Simulation results show that the proposed approach yields 16.1% improvement in terms of prediction accuracy compared to a baseline approach that relies only on the position data.

قيم البحث

اقرأ أيضاً

Visual Distortions in 360-degree Videos

70 - Roberto G. de A. Azevedo , Neil Birkbeck , Francesca De Simone 2019

Omnidirectional (or 360-degree) images and videos are emergent signals in many areas such as robotics and virtual/augmented reality. In particular, for virtual reality, they allow an immersive experience in which the user is provided with a 360-degre e field of view and can navigate throughout a scene, e.g., through the use of Head Mounted Displays. Since it represents the full 360-degree field of view from one point of the scene, omnidirectional content is naturally represented as spherical visual signals. Current approaches for capturing, processing, delivering, and displaying 360-degree content, however, present many open technical challenges and introduce several types of distortions in these visual signals. Some of the distortions are specific to the nature of 360-degree images, and often different from those encountered in the classical image communication framework. This paper provides a first comprehensive review of the most common visual distortions that alter 360-degree signals undergoing state of the art processing in common applications. While their impact on viewers visual perception and on the immersive experience at large is still unknown ---thus, it stays an open research topic--- this review serves the purpose of identifying the main causes of visual distortions in the end-to-end 360-degree content distribution pipeline. It is essential as a basis for benchmarking different processing techniques, allowing the effective design of new algorithms and applications. It is also necessary to the deployment of proper psychovisual studies to characterise the human perception of these new images in interactive and immersive applications.

الوسائط المتعددة

Deep Learning for Vision-based Prediction: A Survey

422 - Amir Rasouli 2020

Vision-based prediction algorithms have a wide range of applications including autonomous driving, surveillance, human-robot interaction, weather prediction. The objective of this paper is to provide an overview of the field in the past five years wi th a particular focus on deep learning approaches. For this purpose, we categorize these algorithms into video prediction, action prediction, trajectory prediction, body motion prediction, and other prediction applications. For each category, we highlight the common architectures, training methods and types of data used. In addition, we discuss the common evaluation metrics and datasets used for vision-based prediction tasks. A database of all the information presented in this survey including, cross-referenced according to papers, datasets and metrics, can be found online at https://github.com/aras62/vision-based-prediction.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Learning-based Prediction and Uplink Retransmission for Wireless Virtual Reality (VR) Network

404 - Xiaonan Liu , Xinyu Li , Yansha Deng 2020

Wireless Virtual Reality (VR) users are able to enjoy immersive experience from anywhere at anytime. However, providing full spherical VR video with high quality under limited VR interaction latency is challenging. If the viewpoint of the VR user can be predicted in advance, only the required viewpoint is needed to be rendered and delivered, which can reduce the VR interaction latency. Therefore, in this paper, we use offline and online learning algorithms to predict viewpoint of the VR user using real VR dataset. For the offline learning algorithm, the trained learning model is directly used to predict the viewpoint of VR users in continuous time slots. While for the online learning algorithm, based on the VR users actual viewpoint delivered through uplink transmission, we compare it with the predicted viewpoint and update the parameters of the online learning algorithm to further improve the prediction accuracy. To guarantee the reliability of the uplink transmission, we integrate the Proactive retransmission scheme into our proposed online learning algorithm. Simulation results show that our proposed online learning algorithm for uplink wireless VR network with the proactive retransmission scheme only exhibits about 5% prediction error.

معالجة الإشارات التعلم الآلي

A Memory Network Approach for Story-based Temporal Summarization of 360{deg} Videos

87 - Sangho Lee , Jinyoung Sung , Youngjae Yu 2018

We address the problem of story-based temporal summarization of long 360{deg} videos. We propose a novel memory network model named Past-Future Memory Network (PFMN), in which we first compute the scores of 81 normal field of view (NFOV) region propo sals cropped from the input 360{deg} video, and then recover a latent, collective summary using the network with two external memories that store the embeddings of previously selected subshots and future candidate subshots. Our major contributions are two-fold. First, our work is the first to address story-based temporal summarization of 360{deg} videos. Second, our model is the first attempt to leverage memory networks for video summarization tasks. For evaluation, we perform three sets of experiments. First, we investigate the view selection capability of our model on the Pano2Vid dataset. Second, we evaluate the temporal summarization with a newly collected 360{deg} video dataset. Finally, we experiment our models performance in another domain, with image-based storytelling VIST dataset. We verify that our model achieves state-of-the-art performance on all the tasks.

الرؤية الحاسوبية وتمييز الأنماط

Interpreting Deep Learning based Cerebral Palsy Prediction with Channel Attention

127 - Manli Zhu , Qianhui Men , Edmond S. L. Ho 2021

Early prediction of cerebral palsy is essential as it leads to early treatment and monitoring. Deep learning has shown promising results in biomedical engineering thanks to its capacity of modelling complicated data with its non-linear architecture. However, due to their complex structure, deep learning models are generally not interpretable by humans, making it difficult for clinicians to rely on the findings. In this paper, we propose a channel attention module for deep learning models to predict cerebral palsy from infants body movements, which highlights the key features (i.e. body joints) the model identifies as important, thereby indicating why certain diagnostic results are found. To highlight the capacity of the deep network in modelling input features, we utilize raw joint positions instead of hand-crafted features. We validate our system with a real-world infant movement dataset. Our proposed channel attention module enables the visualization of the vital joints to this disease that the network considers. Our system achieves 91.67% accuracy, suppressing other state-of-the-art deep learning methods.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو