From Note-Level to Chord-Level Neural Network Models for Voice Separation in Symbolic Music

80 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Razvan Bunescu

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Patrick Gray - Razvan Bunescu

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Music is often experienced as a progression of concurrent streams of notes, or voices. The degree to which this happens depends on the position along a voice-leading continuum, ranging from monophonic, to homophonic, to polyphonic, which complicates the design of automatic voice separation models. We address this continuum by defining voice separation as the task of decomposing music into streams that exhibit both a high degree of external perceptual separation from the other streams and a high degree of internal perceptual consistency. The proposed voice separation task allows for a voice to diverge to multiple voices and also for multiple voices to converge to the same voice. Equipped with this flexible task definition, we manually annotated a corpus of popular music and used it to train neural networks that assign notes to voices either separately for each note in a chord (note-level), or jointly to all notes in a chord (chord-level). The trained neural models greedily assign notes to voices in a left to right traversal of the input chord sequence, using a diverse set of perceptually informed input features. When evaluated on the extraction of consecutive within voice note pairs, both models surpass a strong baseline based on an iterative application of an envelope extraction function, with the chord-level model consistently edging out the note-level model. The two models are also shown to outperform previous approaches on separating the voices in Bach music.

قيم البحث

179 - Kristen Masada , Razvan Bunescu 2018

We present a new approach to harmonic analysis that is trained to segment music into a sequence of chord spans tagged with chord labels. Formulated as a semi-Markov Conditional Random Field (semi-CRF), this joint segmentation and labeling approach en ables the use of a rich set of segment-level features, such as segment purity and chord coverage, that capture the extent to which the events in an entire segment of music are compatible with a candidate chord label. The new chord recognition model is evaluated extensively on three corpora of classical music and a newly created corpus of rock music. Experimental results show that the semi-CRF model performs substantially better than previous approaches when trained on a sufficient number of labeled examples and remains competitive when the amount of training data is limited.

أنظمة الصوت في الحاسوب التعلم الآلي معالجة الصوت والكلام

Performer Identification From Symbolic Representation of Music Using Statistical Models

163 - Syed Rifat Mahmud Rafee , Gyorgy Fazekas , Geraint A.~Wiggins 2021

Music Performers have their own idiosyncratic way of interpreting a musical piece. A group of skilled performers playing the same piece of music would likely to inject their unique artistic styles in their performances. The variations of the tempo, t iming, dynamics, articulation etc. from the actual notated music are what make the performers unique in their performances. This study presents a dataset consisting of four movements of Schuberts ``Sonata in B-flat major, D.960 performed by nine virtuoso pianists individually. We proposed and extracted a set of expressive features that are able to capture the characteristics of an individual performers style. We then present a performer identification method based on the similarity of feature distribution, given a set of piano performances. The identification is done considering each feature individually as well as a fusion of the features. Results show that the proposed method achieved a precision of 0.903 using fusion features. Moreover, the onset time deviation feature shows promising result when considered individually.

أنظمة الصوت في الحاسوب استرجاع المعلومات التعلم الآلي

Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding

422 - Zheng Sun , Jiaqi Liu , Zewang Zhang 2016

Creating aesthetically pleasing pieces of art, including music, has been a long-term goal for artificial intelligence research. Despite recent successes of long-short term memory (LSTM) recurrent neural networks (RNNs) in sequential learning, LSTM ne ural networks have not, by themselves, been able to generate natural-sounding music conforming to music theory. To transcend this inadequacy, we put forward a novel method for music composition that combines the LSTM with Grammars motivated by music theory. The main tenets of music theory are encoded as grammar argumented (GA) filters on the training data, such that the machine can be trained to generate music inheriting the naturalness of human-composed pieces from the original dataset while adhering to the rules of music theory. Unlike previous approaches, pitches and durations are encoded as one semantic entity, which we refer to as note-level encoding. This allows easy implementation of music theory grammars, as well as closer emulation of the thinking pattern of a musician. Although the GA rules are applied to the training data and never directly to the LSTM music generation, our machine still composes music that possess high incidences of diatonic scale notes, small pitch intervals and chords, in deference to music theory.

التعلم الآلي الذكاء الاصطناعي أنظمة الصوت في الحاسوب

Neural Network Alternatives to Convolutive Audio Models for Source Separation

78 - Shrikant Venkataramani , Y. Cem Subakan , Paris Smaragdis 2017

Convolutive Non-Negative Matrix Factorization model factorizes a given audio spectrogram using frequency templates with a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to co nvolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a Recurrent Neural Network in the encoder. Experimental results on speech mixtures from TIMIT dataset indicate that the convolutive architecture provides a significant improvement in separation performance in terms of BSSeval metrics.

أنظمة الصوت في الحاسوب معالجة الصوت والكلام

Multi-Level and Multi-Scale Feature Aggregation Using Sample-level Deep Convolutional Neural Networks for Music Classification

163 - Jongpil Lee , Juhan Nam 2017

Music tag words that describe music audio by text have different levels of abstraction. Taking this issue into account, we propose a music classification approach that aggregates multi-level and multi-scale features using pre-trained feature extracto rs. In particular, the feature extractors are trained in sample-level deep convolutional neural networks using raw waveforms. We show that this approach achieves state-of-the-art results on several music classification datasets.

أنظمة الصوت في الحاسوب التعلم الآلي الوسائط المتعددة