Mobile virtual reality (VR) head-mounted displays (HMDs) have become popular among consumers in recent years. In this work, we demonstrate real-time egocentric hand gesture detection and localization on mobile HMDs. Our main contributions are: 1) a novel mixed-reality data collection tool that automatically annotates bounding boxes and gesture labels; 2) the largest-to-date egocentric hand gesture and bounding box dataset, with more than 400,000 annotated frames; 3) a neural network that runs in real time on modern mobile CPUs and achieves more than 76% precision on gesture recognition across 8 classes.
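To illustrate the kind of on-device post-processing such a detector implies, the following is a minimal sketch that turns single-shot-detector outputs into a gesture label and bounding box. The 8-class label set, the 0.5 score threshold, and the output layout (boxes, scores, class ids) are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

# Hypothetical 8-class gesture set; the paper's class names are not given in the abstract.
GESTURE_CLASSES = ["fist", "open_palm", "point", "thumbs_up",
                   "ok", "pinch", "swipe_left", "swipe_right"]

def pick_gesture(boxes, scores, class_ids, score_thresh=0.5):
    """boxes: (N, 4) normalized [ymin, xmin, ymax, xmax]; scores: (N,); class_ids: (N,) ints.

    Returns the most confident gesture label and its box, or None if nothing is confident.
    """
    keep = np.where(scores >= score_thresh)[0]
    if keep.size == 0:
        return None  # no confident hand detection in this frame
    best = keep[np.argmax(scores[keep])]
    return GESTURE_CLASSES[int(class_ids[best])], boxes[best]
```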
Head gestures are a natural means of face-to-face communication between people, but their recognition in the context of virtual reality, and their use as an interface for interacting with virtual avatars and virtual environments, have rarely been investigated. In the current study, we present an approach for real-time head gesture recognition on head-mounted displays using Cascaded Hidden Markov Models. We conducted two experiments to evaluate our proposed approach. In experiment 1, we trained the Cascaded Hidden Markov Models and assessed the offline classification performance using collected head motion data. In experiment 2, we characterized the real-time performance of the approach by estimating the latency to recognize a head gesture from recorded real-time classification data. Our results show that the proposed approach is effective in recognizing head gestures. The method can be integrated into a virtual reality system as a head gesture interface for interacting with virtual worlds.
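A simplified, non-cascaded sketch of HMM-based head gesture classification over windows of head-motion data is shown below: one HMM per gesture, with the window assigned to the model with the highest log-likelihood. The gesture set, the angular-velocity features, and the use of hmmlearn are assumptions for illustration and do not reproduce the paper's cascaded model.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

GESTURES = ["nod", "shake", "tilt"]  # hypothetical gesture vocabulary

def train_models(train_data, n_states=4):
    """train_data: dict gesture -> list of (T_i, 3) arrays of head angular-velocity frames."""
    models = {}
    for g in GESTURES:
        sequences = train_data[g]
        X = np.vstack(sequences)              # concatenate sequences for hmmlearn
        lengths = [len(s) for s in sequences] # per-sequence lengths
        m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[g] = m
    return models

def classify(models, window):
    """window: (T, 3) array from a sliding window of head motion; returns the best gesture."""
    scores = {g: m.score(window) for g, m in models.items()}
    return max(scores, key=scores.get)
```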
Purpose: Image guidance is crucial for the success of many interventions. Images are displayed on designated monitors that cannot be positioned optimally due to sterility and spatial constraints. This indirect visualization causes potential occlusion, hinders hand-eye coordination, and increases procedure duration and surgeon workload. Methods: We propose a virtual monitor system that displays medical images in a mixed reality visualization using optical see-through head-mounted displays. The system streams high-resolution medical images from any modality to the head-mounted display in real time, where they are blended with the surgical site. It allows for mixed reality visualization of images in head-, world-, or body-anchored mode and can thus be adapted to specific procedural needs. Results: For typical image sizes, the proposed system exhibits an average end-to-end delay and refresh rate of 214 ± 30 ms and 41.4 ± 32.0 Hz, respectively. Conclusions: The proposed virtual monitor system is capable of real-time mixed reality visualization of medical images. In future work, we seek to conduct first pre-clinical studies to quantitatively assess the impact of the system on standard image-guided procedures.
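Delay and refresh-rate figures of this kind could, for example, be gathered by timestamping each frame at the imaging source and again at display time on the HMD. The sketch below assumes the two clocks are synchronized and is not the paper's actual measurement protocol.

```python
import time
import statistics

class StreamStats:
    """Collects per-frame end-to-end delay and display intervals on the HMD side."""

    def __init__(self):
        self.delays = []         # source-to-display delay per frame (s)
        self.display_times = []  # monotonic time at which each frame was shown (s)

    def on_frame_displayed(self, source_timestamp):
        # source_timestamp: time the frame left the imaging source, on a clock
        # synchronized with the HMD (an assumption of this sketch).
        now = time.monotonic()
        self.delays.append(now - source_timestamp)
        self.display_times.append(now)

    def summary(self):
        intervals = [b - a for a, b in zip(self.display_times, self.display_times[1:])]
        delay_std = statistics.stdev(self.delays) if len(self.delays) > 1 else 0.0
        return {
            "delay_ms_mean": 1000 * statistics.mean(self.delays),
            "delay_ms_std": 1000 * delay_std,
            "refresh_hz_mean": 1.0 / statistics.mean(intervals) if intervals else 0.0,
        }
```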
Augmented and virtual reality are being deployed in different fields of application. Such applications might involve accessing or processing critical and sensitive information, which requires strict and continuous access control. Given that head-mounted displays (HMDs) developed for such applications commonly contain internal cameras for gaze tracking, we evaluate the suitability of such a setup for verifying users through iris recognition. In this work, we first evaluate a set of iris recognition algorithms suitable for HMD devices by investigating three well-established handcrafted feature extraction approaches and, to complement them, four deep learning models. Taking into consideration the minimalistic hardware requirements of stand-alone HMDs, we employ and adapt a recently developed miniature segmentation model (EyeMMS) for segmenting the iris. Further, to account for non-ideal and non-collaborative capture of the iris, we define a new iris quality metric, termed Iris Mask Ratio (IMR), to quantify iris recognition performance. Motivated by the performance of iris recognition, we also propose continuous authentication of users in a non-collaborative capture setting on HMDs. Through experiments on the publicly available OpenEDS dataset, we show that an EER of 5% can be achieved using deep learning methods in a general setting, along with high accuracy for continuous user authentication.
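The abstract does not give the exact definition of IMR; a plausible reading is the fraction of the predicted iris region that the segmentation marks as valid (unoccluded) iris. The sketch below computes such a ratio under that assumption and may differ from the metric defined in the paper.

```python
import numpy as np

def iris_mask_ratio(iris_region, valid_mask):
    """Assumed IMR-style quality score.

    iris_region: boolean HxW array, pixels inside the predicted iris boundary.
    valid_mask:  boolean HxW array, pixels the segmentation marks as visible iris
                 (not occluded by eyelids, eyelashes, or reflections).
    Returns a value in [0, 1]; higher means more of the iris is usable.
    """
    region_px = int(iris_region.sum())
    if region_px == 0:
        return 0.0
    visible_px = int(np.logical_and(iris_region, valid_mask).sum())
    return visible_px / region_px
```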
Efficient motion intent communication is necessary for safe and collaborative work environments with collocated humans and robots. Humans efficiently communicate their motion intent to other humans through gestures, gaze, and social cues. However, robots often have difficulty efficiently communicating their motion intent to humans via these methods. Many existing methods for robot motion intent communication rely on 2D displays, which require the human to continually pause their work and check a visualization. We propose a mixed reality head-mounted display visualization of the proposed robot motion over the wearer's real-world view of the robot and its environment. To evaluate the effectiveness of this system against a 2D display visualization and against no visualization, we asked 32 participants to label different robot arm motions as either colliding or non-colliding with blocks on a table. We found a 16% increase in accuracy with a 62% decrease in the time it took to complete the task compared to the next best system. This demonstrates that a mixed-reality HMD allows a human to determine where the robot will move more quickly and accurately than the compared baselines.
We present Steadiface, a new real-time face-centric video stabilization method that simultaneously removes hand shake and keeps the subject's head stable. We use a CNN to estimate face landmarks and use them to optimize a stabilized head center. We then formulate an optimization problem to find a virtual camera pose that locates the face at the stabilized head center while retaining smooth rotation and translation transitions across frames. We test the proposed method on field-test videos and show that it stabilizes both the head motion and the background. It is robust to large head poses, occlusion, facial appearance variations, and different kinds of camera motion. We show our method advances the state of the art in selfie video stabilization by comparing against alternative methods. The whole process runs very efficiently on a modern mobile phone (8.1 ms/frame).
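The per-frame trade-off described above, anchoring the face to the stabilized head center while keeping the virtual-camera motion smooth, can be illustrated with a small quadratic objective. The 2D-translation-only model and the weights below are assumptions for illustration, not Steadiface's actual formulation, which optimizes a full virtual camera pose.

```python
import numpy as np

def solve_offset(face_center, target_center, prev_offset, w_anchor=1.0, w_smooth=4.0):
    """Minimize  w_anchor * ||(face_center + t) - target_center||^2
              + w_smooth * ||t - prev_offset||^2   over the 2D translation t.

    face_center, target_center, prev_offset: length-2 numpy arrays in pixels.
    The quadratic has the closed-form minimizer returned below.
    """
    return (w_anchor * (target_center - face_center) + w_smooth * prev_offset) / (w_anchor + w_smooth)

# Usage sketch: larger w_smooth favors smooth virtual-camera motion over exact anchoring.
t = solve_offset(np.array([510.0, 300.0]), np.array([500.0, 300.0]), np.array([-4.0, 0.0]))
```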