
A novel music-based game with motion capture to support cognitive and motor function in the elderly

Published by Dorien Herremans
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





This paper presents a novel game prototype that uses music and motion detection as preventive medicine for the elderly. Given aging populations around the globe, and the limited resources and staff available to care for these populations, eHealth solutions are becoming increasingly important, if not crucial, additions to modern healthcare and preventive medicine. Furthermore, because compliance rates for performing physical exercises are often quite low among the elderly, systems able to motivate and engage this population are a necessity. Our prototype uses music not only to engage listeners, but also to leverage the efficacy of music in improving mental and physical wellness. The game is based on a memory task to stimulate cognitive function, and requires users to perform physical gestures that mimic the playing of different musical instruments. To this end, the Microsoft Kinect sensor is used together with a newly developed gesture detection module to process users' gestures. The resulting prototype system supports both cognitive functioning and physical strengthening in the elderly.
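As an illustration of the kind of processing such a gesture detection module might perform, the following Python sketch matches a sequence of Kinect-style skeleton joint positions against pre-recorded instrument-gesture templates. The joint layout, normalization, distance measure, and threshold are assumptions made for illustration only; this is not the paper's actual module.

```python
# Hypothetical sketch of template-based gesture matching on Kinect-style
# skeleton data. Joint indexing, normalization, the distance measure and the
# threshold are illustrative assumptions, not the paper's actual module.
import numpy as np

def normalize(frames):
    """Center every frame on joint 0 (e.g. the spine) so gestures are position-invariant."""
    return frames - frames[:, :1, :]          # frames: (T, num_joints, 3)

def gesture_distance(user, template):
    """Mean per-joint Euclidean distance after resampling the user clip to the template length."""
    idx = np.linspace(0, len(user) - 1, num=len(template)).astype(int)
    diff = normalize(user[idx]) - normalize(template)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

def detect_gesture(user, templates, threshold=0.15):
    """Return the best-matching instrument gesture name, or None if nothing is close enough."""
    scores = {name: gesture_distance(user, tpl) for name, tpl in templates.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] < threshold else None
```

In practice, a gesture recognizer would typically use a more robust alignment such as dynamic time warping; the fixed resampling above keeps the sketch short.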




Read also

Yinglin Duan (2020)
Music-to-dance translation is a brand-new and powerful feature in recent role-playing games. Players can now let their characters dance along with specified music clips and even generate fan-made dance videos. Previous works on this topic treat music-to-dance as a supervised motion generation problem based on time-series data. However, these methods suffer from limited training data pairs and the degradation of movements. This paper provides a new perspective on the task: we re-formulate the translation problem as a piece-wise dance phrase retrieval problem based on choreography theory. With such a design, players are allowed to further edit the dance movements on top of our generation, while other regression-based methods ignore such user interactivity. Considering that dance motion capture is an expensive and time-consuming procedure requiring the assistance of professional dancers, we train our method under a semi-supervised learning framework with a large collected unlabeled dataset (20x the size of the labeled data). A co-ascent mechanism is introduced to improve the robustness of our network. Using this unlabeled dataset, we also introduce self-supervised pre-training so that the translator can understand the melody, rhythm, and other components of music phrases. We show that the pre-training significantly improves translation accuracy compared to training from scratch. Experimental results suggest that our method not only generalizes well over various styles of music but also achieves expert-level choreography for game players.
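A minimal sketch of the retrieval formulation described in this abstract: each music phrase embedding is matched to the most similar dance phrase in a choreography library by cosine similarity. The embedding dimensions and library layout are assumptions for illustration, not the paper's actual pipeline.

```python
# Illustrative sketch of music-to-dance as phrase retrieval: each music phrase
# embedding is matched to the closest dance phrase embedding in a library.
# Embedding sizes and the library layout are assumptions.
import numpy as np

def cosine_similarity(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def retrieve_dance_phrases(music_embeddings, dance_library):
    """music_embeddings: (N, d); dance_library: (M, d).
    Returns the index of the retrieved dance phrase for each music phrase."""
    sims = cosine_similarity(music_embeddings, dance_library)   # (N, M)
    return sims.argmax(axis=1)
```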
In recent years, deep learning has received intense attention owing to its great success in image recognition, and a trend of adopting deep learning in various information processing fields has formed, including music information retrieval (MIR). In this paper, we conduct a comprehensive study on music audio classification with improved convolutional neural networks (CNNs). To the best of our knowledge, this is the first work to apply Densely Connected Convolutional Networks (DenseNet) to music audio tagging, and it is demonstrated to perform better than Residual neural networks (ResNet). Additionally, two specific data augmentation approaches, time overlapping and pitch shifting, are proposed to address the deficiency of labelled data in MIR. Moreover, an ensemble learning approach based on stacking with an SVM is employed. We believe that the proposed combination of the strong representation of DenseNet and data augmentation can be adapted to other audio processing tasks.
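The two augmentations named in this abstract could be sketched as follows. The window and hop lengths, the pitch-shift range, and the input file are illustrative assumptions, not the paper's exact settings; the sketch relies on the librosa library.

```python
# Hedged sketch of time-overlapping crops and pitch shifting for audio
# augmentation. Parameters are illustrative, not the paper's settings.
import librosa

def time_overlap_crops(y, sr, win_sec=3.0, hop_sec=1.5):
    """Cut overlapping windows from a clip so each labelled track yields several examples."""
    win, hop = int(win_sec * sr), int(hop_sec * sr)
    return [y[start:start + win] for start in range(0, len(y) - win + 1, hop)]

def pitch_shift_variants(y, sr, steps=(-2, -1, 1, 2)):
    """Create pitch-shifted copies of a clip (shift given in semitones)."""
    return [librosa.effects.pitch_shift(y, sr=sr, n_steps=s) for s in steps]

# Usage with a hypothetical input file:
# y, sr = librosa.load("track.wav", sr=22050)
# augmented_clips = time_overlap_crops(y, sr) + pitch_shift_variants(y, sr)
```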
Both images and music can convey rich semantics and are widely used to induce specific emotions. Matching images and music with similar emotions might help make emotion perceptions more vivid and stronger. Existing emotion-based image and music matching methods either employ limited categorical emotion states, which cannot well reflect the complexity and subtlety of emotions, or train the matching model using an impractical multi-stage pipeline. In this paper, we study end-to-end matching between image and music based on emotions in the continuous valence-arousal (VA) space. First, we construct a large-scale dataset, termed Image-Music-Emotion-Matching-Net (IMEMNet), with over 140K image-music pairs. Second, we propose cross-modal deep continuous metric learning (CDCML) to learn a shared latent embedding space which preserves the cross-modal similarity relationship in the continuous matching space. Finally, we refine the embedding space by further preserving the single-modal emotion relationship in the VA spaces of both images and music. The metric learning in the embedding space and task regression in the label space are jointly optimized for both cross-modal matching and single-modal VA prediction. Extensive experiments conducted on IMEMNet demonstrate the superiority of CDCML for emotion-based image and music matching compared to the state-of-the-art approaches.
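A rough PyTorch sketch of the shared-embedding idea described above: image and music features are projected into a common space, and the embedding similarity of a pair is regressed toward an emotion similarity derived from valence-arousal labels. The layer sizes and the similarity definition are assumptions for illustration; this is not the exact CDCML formulation.

```python
# Rough sketch of cross-modal metric learning in a shared space; sizes and the
# VA-based similarity target are assumptions, not the paper's CDCML loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEmbedding(nn.Module):
    def __init__(self, img_dim=2048, mus_dim=512, emb_dim=128):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)
        self.mus_proj = nn.Linear(mus_dim, emb_dim)

    def forward(self, img_feat, mus_feat):
        z_img = F.normalize(self.img_proj(img_feat), dim=-1)
        z_mus = F.normalize(self.mus_proj(mus_feat), dim=-1)
        return z_img, z_mus

def matching_loss(z_img, z_mus, va_img, va_mus):
    """Push embedding similarity toward emotion similarity in VA space (VA assumed in [0, 1])."""
    emb_sim = (z_img * z_mus).sum(dim=-1)                      # cosine similarity of paired samples
    va_sim = 1.0 - torch.norm(va_img - va_mus, dim=-1) / (2 ** 0.5)
    return F.mse_loss(emb_sim, va_sim)
```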
Yilun Zhao, Jia Guo (2020)
Music annotation has always been one of the critical topics in the field of Music Information Retrieval (MIR). Traditional models use supervised learning for music annotation tasks. However, as supervised machine learning approaches increase in complexity, the growing need for annotated training data can often not be met with available data. In this paper, a new self-supervised music acoustic representation learning approach named MusiCoder is proposed. Inspired by the success of BERT, MusiCoder builds upon the architecture of self-attention bidirectional transformers. Two pre-training objectives, Contiguous Frames Masking (CFM) and Contiguous Channels Masking (CCM), are designed to adapt BERT-like masked reconstruction pre-training to the continuous acoustic frame domain. The performance of MusiCoder is evaluated on two downstream music annotation tasks. The results show that MusiCoder outperforms state-of-the-art models in both music genre classification and auto-tagging. The effectiveness of MusiCoder indicates the great potential of a new self-supervised learning approach to understanding music: first apply masked reconstruction tasks to pre-train a transformer-based model on massive unlabeled music acoustic data, and then fine-tune the model on specific downstream tasks with labeled data.
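The Contiguous Frames Masking objective mentioned here can be illustrated with a small numpy sketch: a contiguous block of spectrogram frames is zeroed out, and the model is trained to reconstruct it. The mask span and masking value below are assumptions, not the paper's hyperparameters.

```python
# Illustrative sketch of Contiguous Frames Masking (CFM): zero out a contiguous
# block of frames; the reconstruction target is the original content of that block.
import numpy as np

def contiguous_frames_mask(spec, max_span=10, rng=None):
    """spec: (num_frames, num_bins) log-mel spectrogram. Returns (masked copy, boolean frame mask)."""
    rng = rng or np.random.default_rng()
    num_frames = spec.shape[0]
    span = int(rng.integers(1, max_span + 1))
    start = int(rng.integers(0, num_frames - span + 1))
    mask = np.zeros(num_frames, dtype=bool)
    mask[start:start + span] = True
    masked = spec.copy()
    masked[mask] = 0.0          # the reconstruction target is the original spec[mask]
    return masked, mask
```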
The concept of flow is used extensively in HCI, video games, and many other fields, but its prevalent definition is conceptually vague, and alternative interpretations have contributed to ambiguity in the literature. To address this, we use cognitive science theory to expose inconsistencies in flow's prevalent definition, and introduce fuse, a concept related to flow but consistent with cognitive science, defined as the fusion of activity-related sensory stimuli and awareness. Based on this definition, we develop a preliminary model that hypothesizes fuse's underlying cognitive processes. To illustrate the model's practical value, we derive a set of design heuristics that we exemplify in the context of video games. Together, the fuse definition, model, and design heuristics form our theoretical framework, and are a product of rethinking flow from a cognitive perspective with the purpose of improving conceptual clarity and theoretical robustness in the literature.
