مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Towards an efficient deep learning model for musical onset detection

91 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Rong Gong

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Rong Gong - Xavier Serra

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we propose an efficient and reproducible deep learning model for musical onset detection (MOD). We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack of hyper-parameter tuning details, (ii) the non-availability of code for training models on other datasets, and (iii) ignoring the network capability when comparing different architectures. Taking the above issues into account, we experiment with seven deep learning architectures. The most efficient one achieves equivalent performance to our implementation of the state-of-the-art architecture. However, it has only 28.3% of the total number of trainable parameters compared to the state-of-the-art. Our experiments are conducted using two different datasets: one mainly consists of instrumental music excerpts, and another developed by ourselves includes only solo singing voice excerpts. Further, inter-dataset transfer learning experiments are conducted. The results show that the model pre-trained on one dataset fails to detect onsets on another dataset, which denotes the importance of providing the implementation code to enable re-training the model for a different dataset. Datasets, code and a Jupyter notebook running on Google Colab are publicly available to make this research understandable and easy to reproduce.

قيم البحث

178 - Jaesung Huh , Egil Martinsson , Adrian Kim 2020

Musical onset detection can be formulated as a time-to-event (TTE) or time-since-event (TSE) prediction task by defining music as a sequence of onset events. Here we propose a novel method to model the probability of onsets by introducing a sequentia l density prediction model. The proposed model estimates TTE & TSE distributions from mel-spectrograms using convolutional neural networks (CNNs) as a density predictor. We evaluate our model on the Bock dataset show-ing comparable results to previous deep-learning models.

أنظمة الصوت في الحاسوب التعلم الآلي معالجة الصوت والكلام

Learning to Recognize Musical Genre from Audio

83 - Michael Defferrard , Sharada P. Mohanty , Sean F. Carroll 2018

We here summarize our experience running a challenge with open data for musical genre recognition. Those notes motivate the task and the challenge design, show some statistics about the submissions, and present the results.

أنظمة الصوت في الحاسوب استرجاع المعلومات التعلم الآلي

Musical instrument sound classification with deep convolutional neural network using feature fusion approach

138 - Taejin Park , Taejin Lee 2015

A new musical instrument classification method using convolutional neural networks (CNNs) is presented in this paper. Unlike the traditional methods, we investigated a scheme for classifying musical instruments using the learned features from CNNs. T o create the learned features from CNNs, we not only used a conventional spectrogram image, but also proposed multiresolution recurrence plots (MRPs) that contain the phase information of a raw input signal. Consequently, we fed the characteristic timbre of the particular instrument into a neural network, which cannot be extracted using a phase-blinded representations such as a spectrogram. By combining our proposed MRPs and spectrogram images with a multi-column network, the performance of our proposed classifier system improves over a system that uses only a spectrogram. Furthermore, the proposed classifier also outperforms the baseline result from traditional handcrafted features and classifiers.

أنظمة الصوت في الحاسوب استرجاع المعلومات

Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions

75 - Rong Gong , Xavier Serra 2018

In this paper, we tackle the singing voice phoneme segmentation problem in the singing training scenario by using language-independent information -- onset and prior coarse duration. We propose a two-step method. In the first step, we jointly calcula te the syllable and phoneme onset detection functions (ODFs) using a convolutional neural network (CNN). In the second step, the syllable and phoneme boundaries and labels are inferred hierarchically by using a duration-informed hidden Markov model (HMM). To achieve the inference, we incorporate the a priori duration model as the transition probabilities and the ODFs as the emission probabilities into the HMM. The proposed method is designed in a language-independent way such that no phoneme class labels are used. For the model training and algorithm evaluation, we collect a new jingju (also known as Beijing or Peking opera) solo singing voice dataset and manually annotate the boundaries and labels at phrase, syllable and phoneme levels. The dataset is publicly available. The proposed method is compared with a baseline method based on hidden semi-Markov model (HSMM) forced alignment. The evaluation results show that the proposed method outperforms the baseline by a large margin regarding both segmentation and onset detection tasks.

أنظمة الصوت في الحاسوب استرجاع المعلومات معالجة الصوت والكلام

Modeling Musical Context with Word2vec

101 - Dorien Herremans , Ching-Hua Chuan 2017

We present a semantic vector space model for capturing complex polyphonic musical context. A word2vec model based on a skip-gram representation with negative sampling was used to model slices of music from a dataset of Beethovens piano sonatas. A vis ualization of the reduced vector space using t-distributed stochastic neighbor embedding shows that the resulting embedded vector space captures tonal relationships, even without any explicit information about the musical contents of the slices. Secondly, an excerpt of the Moonlight Sonata from Beethoven was altered by replacing slices based on context similarity. The resulting music shows that the selected slice based on similar word2vec context also has a relatively short tonal distance from the original slice.

أنظمة الصوت في الحاسوب استرجاع المعلومات الوسائط المتعددة

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة الدولية الخاصة للعلوم والتكنولوجيا

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Towards an efficient deep learning model for musical onset detection

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً