بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Creating an A Cappella Singing Audio Dataset for Automatic Jingju Singing Evaluation Research

114 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Rong Gong

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Rong Gong - Rafael Caro Repetto - Xavier Serra

أنظمة الصوت في الحاسوب المكتبات الرقمية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The data-driven computational research on automatic jingju (also known as Beijing or Peking opera) singing evaluation lacks a suitable and comprehensive a cappella singing audio dataset. In this work, we present an a cappella singing audio dataset which consists of 120 arias, accounting for 1265 melodic lines. This dataset is also an extension our existing CompMusic jingju corpus. Both professional and amateur singers were invited to the dataset recording sessions, and the most common jingju musical elements have been covered. This dataset is also accompanied by metadata per aria and melodic line annotated for automatic singing evaluation research purpose. All the gathered data is openly available online.

قيم البحث

اقرأ أيضاً

Learning Singing From Speech

92 - Liqiang Zhang , Chengzhu Yu , Heng Lu 2019

We propose an algorithm that is capable of synthesizing high quality target speakers singing voice given only their normal speech samples. The proposed algorithm first integrate speech and singing synthesis into a unified framework, and learns univer sal speaker embeddings that are shareable between speech and singing synthesis tasks. Specifically, the speaker embeddings learned from normal speech via the speech synthesis objective are shared with those learned from singing samples via the singing synthesis objective in the unified training framework. This makes the learned speaker embedding a transferable representation for both speaking and singing. We evaluate the proposed algorithm on singing voice conversion task where the content of original singing is covered with the timbre of another speakers voice learned purely from their normal speech samples. Our experiments indicate that the proposed algorithm generates high-quality singing voices that sound highly similar to target speakers voice given only his or her normal speech samples. We believe that proposed algorithm will open up new opportunities for singing synthesis and conversion for broader users and applications.

أنظمة الصوت في الحاسوب الحساب واللغة معالجة الصوت والكلام

NHSS: A Speech and Singing Parallel Database

201 - Bidisha Sharma , Xiaoxue Gao , Karthika Vijayan 2020

We present a database of parallel recordings of speech and singing, collected and released by the Human Language Technology (HLT) laboratory at the National University of Singapore (NUS), that is called NUS-HLT Speak-Sing (NHSS) database. We release this database to the public to support research activities, that include, but not limited to comparative studies of acoustic attributes of speech and singing signals, cooperative synthesis of speech and singing voices, and speech-to-singing conversion. This database consists of recordings of sung vocals of English pop songs, the spoken counterpart of lyrics of the songs read by the singers in their natural reading manner, and manually prepared utterance-level and word-level annotations. The audio recordings in the NHSS database correspond to 100 songs sung and spoken by 10 singers, resulting in a total of 7 hours of audio data. There are 5 male and 5 female singers, singing and reading the lyrics of 10 songs each. In this paper, we discuss the design methodology of the database, analyse the similarities and dissimilarities in characteristics of speech and singing voices, and provide some strategies to address relationships between these characteristics for converting one to another. We develop benchmark systems, which can be used as reference for speech-to-singing alignment, spectral mapping, and conversion using the NHSS database.

أنظمة الصوت في الحاسوب تفاعل الإنسان والحاسوب معالجة الصوت والكلام

Frequency-Temporal Attention Network for Singing Melody Extraction

68 - Shuai Yu , Xiaoheng Sun , Yi Yu 2021

Musical audio is generally composed of three physical properties: frequency, time and magnitude. Interestingly, human auditory periphery also provides neural codes for each of these dimensions to perceive music. Inspired by these intrinsic characteri stics, a frequency-temporal attention network is proposed to mimic human auditory for singing melody extraction. In particular, the proposed model contains frequency-temporal attention modules and a selective fusion module corresponding to these three physical properties. The frequency attention module is used to select the same activation frequency bands as did in cochlear and the temporal attention module is responsible for analyzing temporal patterns. Finally, the selective fusion module is suggested to recalibrate magnitudes and fuse the raw information for prediction. In addition, we propose to use another branch to simultaneously predict the presence of singing voice melody. The experimental results show that the proposed model outperforms existing state-of-the-art methods.

أنظمة الصوت في الحاسوب معالجة الصوت والكلام

Identification of potential Music Information Retrieval technologies for computer-aided jingju singing training

73 - Rong Gong , Xavier Serra 2017

Music Information Retrieval (MIR) technologies have been proven useful in assisting western classical singing training. Jingju (also known as Beijing or Peking opera) singing is different from western singing in terms of most of the perceptual dimens ions, and the trainees are taught by using mouth/heart method. In this paper, we first present the training method used in the professional jingju training classroom scenario and show the potential benefits of introducing the MIR technologies into the training process. The main part of this paper dedicates to identify the potential MIR technologies for jingju singing training. To this intent, we answer the question: how the jingju singing tutors and trainees value the importance of each jingju musical dimension-intonation, rhythm, loudness, tone quality and pronunciation? This is done by (i) classifying the classroom singing practices, tutors verbal feedbacks into these 5 dimensions, (ii) surveying the trainees. Then, with the help of the music signal analysis, a finer inspection on the classroom practice recording examples reveals the detailed elements in the training process. Finally, based on the above analysis, several potential MIR technologies are identified and would be useful for the jingju singing training.

استرجاع المعلومات أنظمة الصوت في الحاسوب معالجة الصوت والكلام

Jointly Detecting and Separating Singing Voice: A Multi-Task Approach

261 - Daniel Stoller , Sebastian Ewert , Simon Dixon 2018

A main challenge in applying deep learning to music processing is the availability of training data. One potential solution is Multi-task Learning, in which the model also learns to solve related auxiliary tasks on additional datasets to exploit thei r correlation. While intuitive in principle, it can be challenging to identify related tasks and construct the model to optimally share information between tasks. In this paper, we explore vocal activity detection as an additional task to stabilise and improve the performance of vocal separation. Further, we identify problematic biases specific to each dataset that could limit the generalisation capability of separation and detection models, to which our proposed approach is robust. Experiments show improved performance in separation as well as vocal detection compared to single-task baselines. However, we find that the commonly used Signal-to-Distortion Ratio (SDR) metrics did not capture the improvement on non-vocal sections, indicating the need for improved evaluation methodologies.

أنظمة الصوت في الحاسوب التعلم الآلي معالجة الصوت والكلام

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الشام الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Creating an A Cappella Singing Audio Dataset for Automatic Jingju Singing Evaluation Research

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً