Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Transfer Learning of Artist Group Factors to Musical Genre Classification

83 0 0.0 ( 0 )

Download Cite

Added by Jaehun Kim

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Jaehun Kim - Minz Won - Xavier Serra

Machine Learning Sound Audio and Speech Processing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy. Artist labels are less subjective and less noisy, while certain artists may relate more strongly to certain genres. At the same time, at prediction time, it is not guaranteed that artist labels are available for a given audio segment. Therefore, in this work, we propose to apply the transfer learning framework, learning artist-related information which will be used at inference time for genre classification. We consider different types of artist-related information, expressed through artist group factors, which will allow for more efficient learning and stronger robustness to potential label noise. Furthermore, we investigate how to achieve the highest validation accuracy on the given FMA dataset, by experimenting with various kinds of transfer methods, including single-task transfer, multi-task transfer and finally multi-task learning.

rate research

Learning to Recognize Musical Genre from Audio

83 - Michael Defferrard , Sharada P. Mohanty , Sean F. Carroll 2018

We here summarize our experience running a challenge with open data for musical genre recognition. Those notes motivate the task and the challenge design, show some statistics about the submissions, and present the results.

Sound Information Retrieval Machine Learning

Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders

85 - Yin-Jyun Luo , Kat Agres , Dorien Herremans 2019

In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn distinct latent spaces for timbre and pitch, which form Gaussian mixture components representing instrument identity and pitch, respectively. For reconstruction, latent variables of timbre and pitch are sampled from corresponding mixture components, and are concatenated as the input to a decoder. We show the model efficacy by latent space visualization, and a quantitative analysis indicates the discriminability of these spaces, even with a limited number of instrument labels for training. The model allows for controllable synthesis of selected instrument sounds by sampling from the latent spaces. To evaluate this, we trained instrument and pitch classifiers using original labeled data. These classifiers achieve high accuracy when tested on our synthesized sounds, which verifies the model performance of controllable realistic timbre and pitch synthesis. Our model also enables timbre transfer between multiple instruments, with a single autoencoder architecture, which is evaluated by measuring the shift in posterior of instrument classification. Our in depth evaluation confirms the model ability to successfully disentangle timbre and pitch.

Machine Learning Sound Audio and Speech Processing

Snore-GANs: Improving Automatic Snore Sound Classification with Synthesized Data

113 - Zixing Zhang , Jing Han , Kun Qian 2019

One of the frontier issues that severely hamper the development of automatic snore sound classification (ASSC) associates to the lack of sufficient supervised training data. To cope with this problem, we propose a novel data augmentation approach based on semi-supervised conditional Generative Adversarial Networks (scGANs), which aims to automatically learn a mapping strategy from a random noise space to original data distribution. The proposed approach has the capability of well synthesizing realistic high-dimensional data, while requiring no additional annotation process. To handle the mode collapse problem of GANs, we further introduce an ensemble strategy to enhance the diversity of the generated data. The systematic experiments conducted on a widely used Munich-Passau snore sound corpus demonstrate that the scGANs-based systems can remarkably outperform other classic data augmentation systems, and are also competitive to other recently reported systems for ASSC.

Machine Learning Sound Audio and Speech Processing

Continual Learning of New Sound Classes using Generative Replay

310 - Zhepei Wang , Cem Subakan , Efthymios Tzinis 2019

Continual learning consists in incrementally training a model on a sequence of datasets and testing on the union of all datasets. In this paper, we examine continual learning for the problem of sound classification, in which we wish to refine already trained models to learn new sound classes. In practice one does not want to maintain all past training data and retrain from scratch, but naively updating a model with new data(sets) results in a degradation of already learned tasks, which is referred to as catastrophic forgetting. We develop a generative replay procedure for generating training audio spectrogram data, in place of keeping older training datasets. We show that by incrementally refining a classifier with generative replay a generator that is 4% of the size of all previous training data matches the performance of refining the classifier keeping 20% of all previous training data. We thus conclude that we can extend a trained sound classifier to learn new classes without having to keep previously used datasets.

Machine Learning Sound Audio and Speech Processing

Topology of Musical Data

779 - Ryan Budney , William Sethares 2013

The musical realm is a promising area in which to expect to find nontrivial topological structures. This paper describes several kinds of metrics on musical data, and explores the implications of these metrics in two ways: via techniques of classical topology where the metric space of all-possible musical data can be described explicitly, and via modern data-driven ideas of persistent homology which calculates the Betti-number bar-codes of individual musical works. Both analyses are able to recover three well known topological structures in music: the circle of notes (octave-reduced scalar structures), the circle of fifths, and the rhythmic repetition of timelines. Applications to a variety of musical works (for example, folk music in the form of standard MIDI files) are presented, and the bar codes show many interesting features. Examples show that individual pieces may span the complete space (in which case the classical and the data-driven analyses agree), or they may span only part of the space.

Algebraic Topology Sound Statistics Theory

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Transfer Learning of Artist Group Factors to Musical Genre Classification

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions