
Transfer Learning of Artist Group Factors to Musical Genre Classification

Published by Jaehun Kim
Publication date: 2018
Research field: Informatics engineering
Language: English





The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy. Artist labels are less subjective and less noisy, while certain artists may relate more strongly to certain genres. At the same time, at prediction time, it is not guaranteed that artist labels are available for a given audio segment. Therefore, in this work, we propose to apply the transfer learning framework, learning artist-related information which will be used at inference time for genre classification. We consider different types of artist-related information, expressed through artist group factors, which will allow for more efficient learning and stronger robustness to potential label noise. Furthermore, we investigate how to achieve the highest validation accuracy on the given FMA dataset, by experimenting with various kinds of transfer methods, including single-task transfer, multi-task transfer and finally multi-task learning.
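A minimal sketch of the transfer setup described above, written in PyTorch. The input shape, layer widths, number of artist-group-factor (AGF) clusters, and number of genre classes are placeholder assumptions for illustration, not the authors' exact architecture.

```python
# Illustrative sketch: a shared backbone is first trained on artist-group-factor
# (AGF) targets, then reused for genre classification. All sizes are assumptions.
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Small CNN over mel-spectrogram patches (1 x 128 x 128 assumed)."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

# Source tasks: predict artist group factors (clusterings of artists derived from
# different kinds of artist-related information). Multi-task transfer means
# several AGF heads share one backbone.
backbone = Backbone()
agf_heads = nn.ModuleDict({
    "tag_agf": nn.Linear(256, 40),     # 40 artist clusters assumed
    "audio_agf": nn.Linear(256, 40),
})
# ... train backbone + agf_heads on artist-labelled audio ...

# Target task: genre classification. No artist labels are needed at inference;
# only the transferred representation is reused.
genre_head = nn.Linear(256, 16)        # 16 genre classes assumed
for p in backbone.parameters():
    p.requires_grad = False            # pure transfer: freeze the learned features
x = torch.randn(8, 1, 128, 128)        # dummy batch of spectrogram patches
logits = genre_head(backbone(x))
```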




Read also

Here we summarize our experience running a challenge with open data for musical genre recognition. These notes motivate the task and the challenge design, show some statistics about the submissions, and present the results.
In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For reconstruction, latent variables of timbre and pitch are sampled from corresponding mixture components, and are concatenated as the input to a decoder. We show the model's efficacy by latent space visualization, and a quantitative analysis indicates the discriminability of these spaces, even with a limited number of instrument labels for training. The model allows for controllable synthesis of selected instrument sounds by sampling from the latent spaces. To evaluate this, we trained instrument and pitch classifiers using original labeled data. These classifiers achieve high accuracy when tested on our synthesized sounds, which verifies the model's performance in controllable, realistic timbre and pitch synthesis. Our model also enables timbre transfer between multiple instruments, with a single autoencoder architecture, which is evaluated by measuring the shift in posterior of instrument classification. Our in-depth evaluation confirms the model's ability to successfully disentangle timbre and pitch.
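A compact sketch of the two-encoder layout described above: separate timbre and pitch latents are sampled and concatenated before decoding. The dimensions are assumptions, and the Gaussian-mixture priors and KL terms are omitted for brevity.

```python
# Sketch of the two-encoder disentanglement layout (assumed shapes; the
# Gaussian-mixture priors and the VAE loss terms are left out).
import torch
import torch.nn as nn

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 512), nn.ReLU(), nn.Linear(512, d_out))

class TwoSpaceVAE(nn.Module):
    def __init__(self, x_dim=1024, z_timbre=16, z_pitch=16):
        super().__init__()
        self.enc_timbre = mlp(x_dim, 2 * z_timbre)   # outputs mean and log-variance
        self.enc_pitch = mlp(x_dim, 2 * z_pitch)
        self.dec = mlp(z_timbre + z_pitch, x_dim)

    @staticmethod
    def reparameterize(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x):
        z_t = self.reparameterize(self.enc_timbre(x))  # timbre latent (instrument identity)
        z_p = self.reparameterize(self.enc_pitch(x))   # pitch latent
        return self.dec(torch.cat([z_t, z_p], dim=-1)), z_t, z_p

recon, z_t, z_p = TwoSpaceVAE()(torch.randn(4, 1024))  # dummy spectrogram frames
```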
Zixing Zhang, Jing Han, Kun Qian (2019)
One of the frontier issues that severely hampers the development of automatic snore sound classification (ASSC) is the lack of sufficient supervised training data. To cope with this problem, we propose a novel data augmentation approach based on semi-supervised conditional Generative Adversarial Networks (scGANs), which aims to automatically learn a mapping from a random noise space to the original data distribution. The proposed approach can synthesize realistic high-dimensional data while requiring no additional annotation. To handle the mode collapse problem of GANs, we further introduce an ensemble strategy to enhance the diversity of the generated data. Systematic experiments conducted on a widely used Munich-Passau snore sound corpus demonstrate that the scGANs-based systems can remarkably outperform other classic data augmentation systems, and are also competitive with other recently reported systems for ASSC.
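A rough sketch of the conditional-generator idea described above: noise plus a class condition is mapped to a synthetic feature vector, and an ensemble of generators is used to diversify samples. All sizes are placeholders, not the paper's scGAN configuration.

```python
# Conditional generator for GAN-style augmentation (discriminator and training
# loop omitted). Dimensions and class count are assumptions.
import torch
import torch.nn as nn

N_CLASSES, NOISE_DIM, FEAT_DIM = 4, 64, 512            # assumed sizes

class CondGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, NOISE_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * NOISE_DIM, 256), nn.ReLU(),
            nn.Linear(256, FEAT_DIM),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=-1))

# Ensemble strategy: drawing from several independently trained generators is
# one simple way to counter mode collapse and broaden the synthetic data.
ensemble = [CondGenerator() for _ in range(3)]
y = torch.randint(0, N_CLASSES, (16,))
z = torch.randn(16, NOISE_DIM)
augmented = [g(z, y) for g in ensemble]                # synthetic training features
```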
Continual learning consists of incrementally training a model on a sequence of datasets and testing on the union of all datasets. In this paper, we examine continual learning for the problem of sound classification, in which we wish to refine already trained models to learn new sound classes. In practice, one does not want to maintain all past training data and retrain from scratch, but naively updating a model with new data(sets) results in degradation of already learned tasks, which is referred to as catastrophic forgetting. We develop a generative replay procedure for generating training audio spectrogram data, in place of keeping older training datasets. We show that by incrementally refining a classifier with generative replay, a generator that is 4% of the size of all previous training data matches the performance of refining the classifier while keeping 20% of all previous training data. We thus conclude that we can extend a trained sound classifier to learn new classes without having to keep previously used datasets.
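A minimal sketch of a generative-replay refinement step as described above: pseudo-examples of old classes are drawn from a generator, labelled by the frozen old classifier, and mixed with the new-class data so that earlier datasets need not be kept. All modules and sizes here are placeholders.

```python
# Generative replay step: mix real new-class data with generated old-class data.
import torch
import torch.nn as nn

FEAT_DIM, OLD_CLASSES, NEW_CLASSES = 512, 8, 2

old_classifier = nn.Linear(FEAT_DIM, OLD_CLASSES)           # frozen model from the previous task
generator = nn.Linear(64, FEAT_DIM)                         # stands in for a trained replay generator
classifier = nn.Linear(FEAT_DIM, OLD_CLASSES + NEW_CLASSES) # model being refined
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def refine_step(new_x, new_y):
    """One update mixing real new-class data with replayed old-class data."""
    with torch.no_grad():
        replay_x = generator(torch.randn(new_x.size(0), 64))   # generated old-class inputs
        replay_y = old_classifier(replay_x).argmax(dim=1)      # pseudo-labels from the old model
    x, y = torch.cat([new_x, replay_x]), torch.cat([new_y, replay_y])
    opt.zero_grad()
    loss_fn(classifier(x), y).backward()
    opt.step()

refine_step(torch.randn(32, FEAT_DIM),
            torch.randint(OLD_CLASSES, OLD_CLASSES + NEW_CLASSES, (32,)))
```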
The musical realm is a promising area in which to expect to find nontrivial topological structures. This paper describes several kinds of metrics on musical data, and explores the implications of these metrics in two ways: via techniques of classical topology, where the metric space of all possible musical data can be described explicitly, and via modern data-driven ideas of persistent homology, which calculates the Betti-number bar codes of individual musical works. Both analyses are able to recover three well-known topological structures in music: the circle of notes (octave-reduced scalar structures), the circle of fifths, and the rhythmic repetition of timelines. Applications to a variety of musical works (for example, folk music in the form of standard MIDI files) are presented, and the bar codes show many interesting features. Examples show that individual pieces may span the complete space (in which case the classical and the data-driven analyses agree), or they may span only part of the space.
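A small illustration of the bar-code computation described above, using the `ripser` package (a tooling choice for this sketch, not necessarily the authors'). The pitch classes of a melody are embedded on the circle of fifths, and persistent homology then reports connected components (H0) and loops (H1).

```python
# Persistence bar codes of a melody embedded on the circle of fifths.
import numpy as np
from ripser import ripser

melody = [0, 7, 2, 9, 4, 11, 6, 1, 8, 3, 10, 5]             # pitch classes (MIDI note mod 12)
angles = np.array([(7 * p % 12) * 2 * np.pi / 12 for p in melody])
points = np.c_[np.cos(angles), np.sin(angles)]              # circle-of-fifths embedding

dgms = ripser(points, maxdim=1)["dgms"]                     # H0 and H1 persistence diagrams
print("H0 bars:", dgms[0])
print("H1 bars:", dgms[1])                                  # one long bar recovers the circle
```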
