The design of touchless user interfaces is gaining popularity in various contexts. With such interfaces, users can interact with electronic devices even when their hands are dirty or non-conductive. Users with partial physical disabilities can also interact with electronic devices through such systems. Research in this direction has received a major boost from the emergence of low-cost sensors such as the Leap Motion, Kinect, and RealSense devices. In this paper, we propose a Leap Motion controller-based methodology to facilitate rendering of 2D and 3D shapes on display devices. The proposed method tracks finger movements while users perform natural gestures within the field of view of the sensor. In the next phase, the trajectories are analyzed to extract extended Npen++ features in 3D. These features represent finger movements during the gestures and are fed to a unidirectional left-to-right Hidden Markov Model (HMM) for training. A one-to-one mapping between gestures and shapes is proposed. Finally, shapes corresponding to these gestures are rendered on the display using the MuPad interface. We have created a dataset of 5400 samples recorded by 10 volunteers. Our dataset contains 18 geometric and 18 non-geometric shapes such as circle, rectangle, flower, cone, and sphere. The proposed methodology achieves an accuracy of 92.87% when evaluated using 5-fold cross-validation. Our experiments reveal that the extended 3D features perform better than existing 3D features in the context of shape representation and classification. The method can be used for developing useful HCI applications for smart display devices.
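To illustrate what a unidirectional left-to-right HMM topology means, the following minimal sketch builds a transition matrix in which each state can only remain where it is or advance to the next state. The function name `left_to_right_transitions`, the number of states, and the `stay_prob` value are assumptions made for illustration; the abstract does not specify the model's size or initialization.

```python
import numpy as np


def left_to_right_transitions(n_states: int, stay_prob: float = 0.5) -> np.ndarray:
    """Build a left-to-right transition matrix (hypothetical helper).

    Each state either stays put (probability stay_prob) or advances to the
    next state (probability 1 - stay_prob). The last state is absorbing.
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    if not 0.0 <= stay_prob <= 1.0:
        raise ValueError("stay_prob must lie in [0, 1]")

    transitions = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        transitions[i, i] = stay_prob            # self-loop
        transitions[i, i + 1] = 1.0 - stay_prob  # forward step only
    transitions[-1, -1] = 1.0                    # final state absorbs
    return transitions


if __name__ == "__main__":
    # Assumed example size: 4 hidden states.
    print(left_to_right_transitions(4))
```

Because every entry below the diagonal is zero, the model can never return to an earlier state, which matches the one-directional progress of a finger trajectory through a gesture.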
Hand gestures are a new and promising interface for locomotion in virtual environments. While several previous studies have proposed different hand gestures for virtual locomotion, little is known about how they differ in performance and user preference in virtual locomotion tasks. In the present paper, we present three hand gesture interfaces and their algorithms for locomotion: the Finger Distance gesture, the Finger Number gesture, and the Finger Tapping gesture. These gestures were inspired by previous studies of gesture-based locomotion interfaces and are typical gestures that people are familiar with in their daily lives. Implementing these hand gesture interfaces in the present study enabled us to systematically compare them. In addition, to compare the usability of these gestures with that of gamepad-based locomotion interfaces, we designed and implemented a gamepad interface based on the Xbox One controller. We compared these four interfaces through two virtual locomotion tasks, which assessed performance and user preference in speed control and waypoint navigation. Results showed that user preference for and performance of the Finger Distance gesture were comparable to those of the gamepad interface. The Finger Number gesture performed similarly to the Finger Distance gesture and was similarly preferred by users. Our study demonstrates that the Finger Distance gesture and the Finger Number gesture are very promising interfaces for virtual locomotion. We also discuss the improvements the Finger Tapping gesture needs before it can be used for virtual walking.
The spatially-varying field of the human visual system has recently received a resurgence of interest with the development of virtual reality (VR) and neural networks. The computational demands of the high-resolution rendering desired for VR can be offset by savings in the periphery, while neural networks trained with foveated input have shown perceptual gains in i.i.d. and o.o.d. generalization. In this paper, we present a technique that exploits the CUDA GPU architecture to efficiently generate Gaussian-based foveated images at high definition (1920x1080 px) in real time (165 Hz). It uses a number of pooling regions several orders of magnitude larger than previous Gaussian-based foveation algorithms, producing a smoothly foveated image that requires no further blending or stitching and that can be fit to any contrast sensitivity function. The approach described can be adapted from Gaussian blurring to any eccentricity-dependent image processing, and our algorithm can meet the demand for experimentation to evaluate the role of spatially-varying processing across biological and artificial agents, so that foveation can be added easily on top of existing systems rather than forcing their redesign (emulated foveated renderer). Altogether, this paper demonstrates how a GPU, with a CUDA block-wise architecture, can be employed for radially-variant rendering, with opportunities for more complex post-processing to ensure a metameric foveation scheme. Code is provided.
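As a rough illustration of eccentricity-dependent Gaussian blurring, the sketch below computes a per-pixel blur width that grows with distance from a fixation point. It is not the CUDA implementation described in the abstract: the function name `sigma_map`, the linear growth model, and the `px_per_degree` and `sigma_per_degree` values are all assumptions chosen for the example.

```python
import numpy as np


def sigma_map(height: int, width: int, fixation: tuple,
              px_per_degree: float = 40.0,
              sigma_per_degree: float = 0.5) -> np.ndarray:
    """Per-pixel Gaussian standard deviation, in pixels (hypothetical helper).

    Assumes blur grows linearly with eccentricity, i.e. the angular distance
    from the fixation point. fixation is given as (row, column).
    """
    rows, cols = np.mgrid[0:height, 0:width]
    fix_row, fix_col = fixation
    # Eccentricity of every pixel in degrees of visual angle.
    eccentricity_deg = np.hypot(rows - fix_row, cols - fix_col) / px_per_degree
    return sigma_per_degree * eccentricity_deg


if __name__ == "__main__":
    # High-definition frame with fixation at the image centre.
    sigmas = sigma_map(1080, 1920, fixation=(540, 960))
    print(sigmas.shape, float(sigmas.max()))
```

A map like this gives each pixel its own blur width, so the blur varies continuously across the image instead of being assembled from a few discrete rings. The linear model could be swapped for a function derived from a contrast sensitivity function.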
Recent research has proposed teleoperation of robotic and aerial vehicles using head motion tracked by a head-mounted display (HMD). First-person views from the vehicles are usually captured by onboard cameras and presented to users through the display panels of HMDs. This provides users with a direct, immersive, and intuitive interface for viewing and control. However, a typically overlooked factor in such designs is the latency introduced by the vehicle dynamics. As head motion is coupled with visual updates in such applications, visual and control latency always exists between the issuing of control commands by head movements and the visual feedback received on completion of the attitude adjustment. This causes a discrepancy between the intended motion, the vestibular cue, and the visual cue, and may result in simulator sickness. No research has been conducted on how various levels of visual and control latency introduced by the dynamics of robots or aerial vehicles affect users' performance and the degree of simulator sickness elicited. Thus, it is uncertain how much performance is degraded by latency and whether such designs are comfortable for users. To address these issues, we studied a prototype scenario of a head-motion-controlled quadcopter using an HMD. We present a virtual reality (VR) paradigm to systematically assess the effects of visual and control latency in simulated drone control scenarios.
Recognizing people by faces and other biometrics has been extensively studied in computer vision. However, these techniques do not work for identifying the wearer of an egocentric (first-person) camera because that person rarely (if ever) appears in their own first-person view. While one's own face is not frequently visible, one's hands are: in fact, hands are among the most common objects in one's own field of view. It is thus natural to ask whether the appearance and motion patterns of people's hands are distinctive enough to recognize them. In this paper, we systematically study the possibility of Egocentric Hand Identification (EHI) with unconstrained egocentric hand gestures. We explore several different visual cues, including color, shape, skin texture, and depth maps, to identify users' hands. Extensive ablation experiments are conducted to analyze which properties of hands are most distinctive. Finally, we show that EHI can improve generalization of other tasks, such as gesture recognition, by training adversarially to encourage these models to ignore differences between users.
Accurate hand pose estimation at the joint level has several uses in human-robot interaction, user interfaces, and virtual reality applications. Yet it is currently not a solved problem. Recent deep learning techniques could bring a great improvement in this area, but they need a huge amount of annotated data. The hand pose datasets released so far present issues that make them impossible to use with deep learning methods, such as a small number of samples, high-level abstraction annotations, or samples consisting of depth maps. In this work, we introduce a multiview hand pose dataset in which we provide color images of hands and different kinds of annotations for each, i.e., the bounding box and the 2D and 3D locations of the joints in the hand. In addition, we introduce a simple yet accurate deep learning architecture for real-time robust 2D hand pose estimation.