Low-dimensional Embodied Semantics for Music and Language

Published by: Francisco Afonso Raposo

Publication date: 2019

Research field: Biology, Informatics Engineering

Language: English

Authors: Francisco Afonso Raposo, David Martins de Matos, Ricardo Ribeiro

Abstract

Embodied cognition states that semantics is encoded in the brain as firing patterns of neural circuits, which are learned according to the statistical structure of human multimodal experience. However, each human brain is idiosyncratically biased, according to its subjective experience history, making this biological semantic machinery noisy with respect to the overall semantics inherent to media artifacts, such as music and language excerpts. We propose to represent shared semantics using low-dimensional vector embeddings by jointly modeling several brains from human subjects. We show these unsupervised efficient representations outperform the original high-dimensional fMRI voxel spaces in proxy music genre and language topic classification tasks. We further show that joint modeling of several subjects increases the semantic richness of the learned latent vector spaces.

Francisco Afonso Raposo, David Martins de Matos, Ricardo Ribeiro (2019)

Music semantics is embodied, in the sense that meaning is biologically mediated by and grounded in the human body and brain. This embodied cognition perspective also explains why music structures modulate kinetic and somatosensory perception. We leverage this aspect of cognition by considering dance as a proxy for music perception in a statistical computational model that learns semiotic correlations between music audio and dance video. We evaluate the ability of this model to effectively capture underlying semantics in a cross-modal retrieval task. Quantitative results, validated with statistical significance testing, strengthen the body of evidence for embodied cognition in music and show the model can recommend music audio for dance video queries and vice versa.

Computer Vision and Pattern Recognition, Machine Learning, Sound

Physics of the mind: Concepts, emotions, language, cognition, consciousness, beauty, music, and symbolic culture

Leonid Perlovsky (2010)

Mathematical approaches to modeling the mind since the 1950s are reviewed. Difficulties faced by these approaches are related to the fundamental incompleteness of logic discovered by K. Gödel. A recent mathematical advancement, dynamic logic (DL), overcame these past difficulties. DL is described conceptually and related to neuroscience, psychology, cognitive science, and philosophy. DL models higher cognitive functions: concepts, emotions, instincts, understanding, imagination, intuition, consciousness. DL is related to the knowledge instinct that drives our understanding of the world and serves as a foundation for higher cognitive functions. Aesthetic emotions and perception of beauty are related to everyday functioning of the mind. The article reviews mechanisms of human symbolic ability, language and cognition, joint evolution of the mind, consciousness, and cultures. It touches on a manifold of aesthetic emotions in music, their cognitive function, origin, and evolution. The article concentrates on elucidating the first principles and reviews aspects of the theory proven in laboratory research.

Neurons and Cognition

Score-informed Networks for Music Performance Assessment

Jiawen Huang, Yun-Ning Hung, Ashis Pati (2020)

The assessment of music performances in most cases takes into account the underlying musical score being performed. While there have been several automatic approaches for objective music performance assessment (MPA) based on extracted features from both the performance audio and the score, deep neural network-based methods incorporating score information into MPA models have not yet been investigated. In this paper, we introduce three different models capable of score-informed performance assessment. These are (i) a convolutional neural network that utilizes a simple time-series input comprising aligned pitch contours and score, (ii) a joint embedding model which learns a joint latent space for pitch contours and scores, and (iii) a distance matrix-based convolutional neural network which utilizes patterns in the distance matrix between pitch contours and musical score to predict assessment ratings. Our results provide insights into the suitability of different architectures and input representations and demonstrate the benefits of score-informed models as compared to score-independent models.

Audio and Speech Processing, Information Retrieval, Machine Learning

Multi-scale Embedded CNN for Music Tagging (MsE-CNN)

Nima Hamidi, Mohsen Vahidzadeh, Stephen Baek (2019)

Convolutional neural networks (CNNs) have recently gained notable traction in a variety of machine learning tasks, including music classification and style tagging. In this work, we propose adding intermediate connections to the CNN architecture to facilitate the transfer of multi-scale/level knowledge between different layers. Our novel model for music tagging shows significant improvement over approaches previously proposed in the literature, due to its ability to carry low-level timbral features to the last layer.

Sound, Information Retrieval, Machine Learning

Metric Learning vs Classification for Disentangled Music Representation Learning

Jongpil Lee, Nicholas J. Bryan, Justin Salamon (2020)

Deep representation learning offers a powerful paradigm for mapping input data onto an organized embedding space and is useful for many music information retrieval tasks. Two central methods for representation learning include deep metric learning and classification, both having the same goal of learning a representation that can generalize well across tasks. Along with generalization, the emerging concept of disentangled representations is also of great interest, where multiple semantic concepts (e.g., genre, mood, instrumentation) are learned jointly but remain separable in the learned representation space. In this paper we present a single representation learning framework that elucidates the relationship between metric learning, classification, and disentanglement in a holistic manner. For this, we (1) outline past work on the relationship between metric learning and classification, (2) extend this relationship to multi-label data by exploring three different learning approaches and their disentangl

Sound, Information Retrieval, Machine Learning