Reconstructing 3D models from large, dense point clouds is critical to enabling Virtual Reality (VR) as a platform for entertainment, education, and heritage preservation. Existing 3D reconstruction systems inevitably make trade-offs among three conflicting goals: the efficiency of reconstruction (e.g., time and memory requirements), the visual quality of the constructed scene, and the rendering speed on the VR device. This paper proposes a reconstruction system that meets all three goals simultaneously. The key idea is to avoid the resource-demanding process of reconstructing a high-polygon mesh altogether. Instead, we propose to transfer details directly from the original point cloud to a low-polygon mesh, which significantly reduces reconstruction time and cost, preserves scene details, and enables real-time rendering on mobile VR devices. While our technique is general, we demonstrate it by reconstructing cultural heritage sites. For the first time, we digitally reconstruct Elmina Castle, a UNESCO World Heritage Site in Ghana, from billions of laser-scanned points. The reconstruction process runs on low-end desktop systems without requiring high processing power, making it accessible to the broad community. The reconstructed scenes render on the Oculus Go at 60 FPS, providing a real-time VR experience with high visual quality. Our project is part of the Digital Elmina effort (http://digitalelmina.org/) between the University of Rochester and the University of Ghana.
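The abstract does not say how details are carried from the point cloud to the low-polygon mesh. As a loose illustration only, the Python sketch below assumes the simplest possible form of transfer: each mesh vertex takes the color of its nearest scanned point. The function names and toy data are hypothetical, and the actual system may bake details into textures or normal maps rather than per-vertex colors.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
RGB = Tuple[int, int, int]


def squared_distance(a: Vec3, b: Vec3) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def transfer_vertex_colors(mesh_vertices: Sequence[Vec3],
                           cloud_points: Sequence[Vec3],
                           cloud_colors: Sequence[RGB]) -> List[RGB]:
    """Assign each low-polygon mesh vertex the color of its nearest scanned point.

    Brute-force O(V * P) search for clarity only; a cloud of billions of points
    would need a spatial index such as a k-d tree or an octree.
    """
    if len(cloud_points) != len(cloud_colors):
        raise ValueError("each cloud point needs exactly one color")
    if not cloud_points:
        raise ValueError("the point cloud is empty")
    colors: List[RGB] = []
    for v in mesh_vertices:
        # Index of the scanned point closest to this vertex.
        nearest = min(range(len(cloud_points)),
                      key=lambda i: squared_distance(v, cloud_points[i]))
        colors.append(cloud_colors[nearest])
    return colors


# Toy data (hypothetical): three scanned points and three mesh vertices near them.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
rgb = [(200, 180, 150), (90, 90, 90), (255, 255, 255)]
mesh = [(0.1, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 0.8, 0.1)]
print(transfer_vertex_colors(mesh, cloud, rgb))
# [(200, 180, 150), (90, 90, 90), (255, 255, 255)]
```

The point of the sketch is the direction of the data flow: the dense scan is only queried, never meshed, so the cost of building a high-polygon intermediate mesh is avoided.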
The recent rise of interest in Virtual Reality (VR) came with the availability of commodity commercial VR products, such as the Head Mounted Displays (HMDs) created by Oculus and other vendors. To accelerate user adoption of VR headsets, content providers should focus on producing high-quality immersive content for these devices. Similarly, multimedia streaming service providers should enable the means to stream 360-degree VR content on their platforms. In this study, we cover different aspects of VR content representation, streaming, and quality assessment that will help establish the basic knowledge of how to build a VR streaming system.
Panoramic video is widely used to build virtual reality (VR) experiences and is expected to be one of the next-generation killer apps. Transmitting panoramic VR videos is challenging for two reasons: 1) panoramic VR videos are typically much larger than normal videos, yet they must be transmitted over limited bandwidth in mobile networks; and 2) high-resolution, fluent views must be provided to guarantee a superior user experience and to avoid side effects such as dizziness and nausea. To address these two problems, we propose a novel interactive streaming technology, the Focus-based Interactive Streaming Framework (FISF). FISF consists of three parts: 1) we use the classic clustering algorithm DBSCAN to analyze real user data for Video Focus Detection (VFD); 2) we propose a Focus-based Interactive Streaming Technology (FIST), in both a static and a dynamic version; and 3) we propose two optimization methods: focus merging and a prefetch strategy. Experimental results show that FISF significantly outperforms the state of the art. The paper was submitted to SIGCOMM 2017, VR/AR Network, on 31 Mar 2017 at 10:44:04am EDT.
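To make the Video Focus Detection step more concrete, here is a minimal DBSCAN sketch in Python over users' viewing directions. It assumes each sample is a hypothetical (yaw, pitch) pair in degrees for one user at one frame, uses a planar distance with yaw wrap-around as a simplification of true angular distance, and picks `eps` and `min_pts` arbitrarily; none of these choices come from the paper. Each resulting cluster plays the role of a "focus", and points labeled -1 are treated as noise.

```python
import math
from typing import List, Optional, Sequence, Tuple

Direction = Tuple[float, float]  # hypothetical feature: (yaw, pitch) in degrees
NOISE = -1


def view_distance(a: Direction, b: Direction) -> float:
    """Planar yaw/pitch distance with yaw wrap-around at 360 degrees (a simplification)."""
    dyaw = abs(a[0] - b[0]) % 360.0
    dyaw = min(dyaw, 360.0 - dyaw)
    return math.hypot(dyaw, a[1] - b[1])


def dbscan(points: Sequence[Direction], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None means unvisited

    def region(i: int) -> List[int]:
        # eps-neighborhood of point i, including i itself.
        return [j for j in range(len(points))
                if view_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster (focus)
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return [NOISE if label is None else label for label in labels]


samples = [(10, 0), (12, 2), (9, -1), (11, 1),  # viewers looking near yaw 10
           (180, 10), (182, 12), (179, 9),      # a second focus behind them
           (300, -40)]                          # a lone outlier
print(dbscan(samples, eps=5.0, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

DBSCAN suits this step because the number of foci per frame is not known in advance and stray head movements should be discarded as noise rather than forced into a cluster.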
The current pandemic has demonstrated the need for remote and virtual learning applications, such as virtual reality (VR) and tablet-based solutions. For developers, creating complex learning scenarios is highly time-consuming and can take over a year. There is therefore a need for a simple method that enables lecturers to create their own content for laboratory tutorials. Research is currently under way into generic models that enable the semi-automatic creation of virtual learning applications. A case study describing the creation of a virtual learning application for an electrical laboratory tutorial is presented.
Virtual reality (VR) over wireless is emerging as an important use case of 5G networks. An immersive VR experience requires delivering huge amounts of data at ultra-low latency, and thus demands an ultra-high transmission rate. This challenge can be largely addressed by a recent network architecture known as mobile edge computing (MEC), which enables caching and computing capabilities at the edge of wireless networks. This paper presents a novel MEC-based mobile VR delivery framework that can cache parts of the fields of view (FOVs) in advance and run certain post-processing procedures on the mobile VR device. To optimize resource allocation at the mobile VR device, we formulate a joint caching and computing decision problem that minimizes the average required transmission rate while meeting a given latency constraint. When FOVs are homogeneous, we obtain a closed-form expression for the optimal joint policy, which reveals interesting communication-caching-computing trade-offs. When FOVs are heterogeneous, we obtain a local optimum of the problem by transforming it into a linearly constrained indefinite quadratic problem and then applying the concave-convex procedure. Numerical results demonstrate the promise of the proposed mobile VR delivery framework in saving communication bandwidth while meeting the low-latency requirement.
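The concave-convex procedure (CCCP) step can be illustrated on a toy problem. This Python sketch assumes a symmetric, indefinite matrix `Q` and the objective x^T Q x + c^T x, and replaces the paper's latency and cache constraints with the simple box 0 <= x <= 1 (a relaxed caching decision per item). It splits the objective into a convex part using Q + rho*I and a concave part -rho*||x||^2, with rho taken from a Gershgorin bound, linearizes the concave part at the current iterate, and solves each convex subproblem by projected gradient descent. All names, constraints, and values are hypothetical stand-ins, not the paper's formulation.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def objective(Q: Matrix, c: Sequence[float], x: Sequence[float]) -> float:
    """f(x) = x^T Q x + c^T x."""
    n = len(x)
    quad = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    return quad + sum(ci * xi for ci, xi in zip(c, x))


def gershgorin_shift(Q: Matrix) -> float:
    """A rho >= 0 making Q + rho*I diagonally dominant, hence PSD for symmetric Q."""
    n = len(Q)
    rho = 0.0
    for i in range(n):
        off_diag = sum(abs(Q[i][j]) for j in range(n) if j != i)
        rho = max(rho, off_diag - Q[i][i])
    return rho


def cccp_box_qp(Q: Matrix, c: Sequence[float], x0: Sequence[float],
                outer_iters: int = 50, inner_iters: int = 200) -> List[float]:
    """Stationary point of x^T Q x + c^T x over 0 <= x <= 1 via CCCP (Q symmetric)."""
    n = len(x0)
    rho = gershgorin_shift(Q)
    # DC split: f(x) = [x^T P x + c^T x] + [-rho * ||x||^2], with P = Q + rho*I convex.
    P = [[Q[i][j] + (rho if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Step 1/L; L = 2 * (max absolute row sum of P) bounds the gradient's Lipschitz constant.
    L = 2.0 * max(sum(abs(v) for v in row) for row in P) or 1.0
    x = [min(1.0, max(0.0, v)) for v in x0]
    for _ in range(outer_iters):
        xk = list(x)
        # Concave part linearized at xk; the convex surrogate's gradient is 2Px + c - 2*rho*xk.
        for _ in range(inner_iters):
            grad = [2.0 * sum(P[i][j] * x[j] for j in range(n)) + c[i] - 2.0 * rho * xk[i]
                    for i in range(n)]
            # Projected gradient step onto the box [0, 1]^n.
            x = [min(1.0, max(0.0, x[i] - grad[i] / L)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(x, xk)) < 1e-9:
            break  # fixed point reached: x is a stationary point of f on the box
    return x


Q = [[1.0, 0.0], [0.0, -1.0]]  # indefinite: eigenvalues 1 and -1
c = [-1.0, 0.0]
x = cccp_box_qp(Q, c, [0.5, 0.5])
print(x, objective(Q, c, x))  # [0.5, 1.0] -1.25
```

Because each surrogate upper-bounds the true objective and touches it at the current iterate, the objective never increases from one outer iteration to the next. Starting from [0.5, 0.0] instead, the second coordinate stays at 0 (objective -0.25): a stationary point but not the global minimum, which is why CCCP only guarantees a local optimum.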
Social presence, the feeling of being there with a real person, will fuel the next generation of communication systems driven by digital humans in virtual reality (VR). The best 3D video-realistic VR avatars, which minimize the uncanny effect, rely on person-specific (PS) models. However, these PS models are time-consuming to build and are typically trained with limited data variability, which results in poor generalization and robustness. Major sources of variability that affect the accuracy of facial expression transfer algorithms include the use of different VR headsets (e.g., camera configuration, slop of the headset), changes in facial appearance over time (e.g., beard, make-up), and environmental factors (e.g., lighting, backgrounds). This is a major drawback for the scalability of these models in VR. This paper makes progress in overcoming these limitations by proposing an end-to-end multi-identity architecture (MIA) trained with specialized augmentation strategies. MIA drives the shape component of the avatar from three cameras in the VR headset (two eyes, one mouth) for untrained subjects, using minimal personalized information (i.e., a neutral 3D mesh shape). Similarly, if the PS texture decoder is available, MIA can drive the full avatar (shape and texture) robustly, outperforming PS models in challenging scenarios. Our key contribution to improving robustness and generalization is that our method implicitly decouples, in an unsupervised manner, the facial expression from nuisance factors (e.g., headset, environment, facial appearance). We demonstrate the superior performance and robustness of the proposed method versus state-of-the-art PS approaches in a variety of experiments.