Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Learning to Recognize Musical Genre from Audio

84 0 0.0 ( 0 )

Download Cite

Added by Micha\\\"el Defferrard

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Michael Defferrard - Sharada P. Mohanty - Sean F. Carroll

Sound Information Retrieval Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We here summarize our experience running a challenge with open data for musical genre recognition. Those notes motivate the task and the challenge design, show some statistics about the submissions, and present the results.

rate research

Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment

138 - Ruchit Agrawal , Simon Dixon 2020

Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features, which cannot be adapted to different acoustic conditions. We propose a method to overcome this limitation using learned frame similarity for audio-to-score alignment. We focus on offline audio-to-score alignment of piano music. Experiments on music data from different acoustic conditions demonstrate that our method achieves higher alignment accuracy than a standard DTW-based method that uses handcrafted features, and generates robust alignments whilst being adaptable to different domains at the same time.

Sound Information Retrieval Machine Learning

Towards an efficient deep learning model for musical onset detection

90 - Rong Gong , Xavier Serra 2018

In this paper, we propose an efficient and reproducible deep learning model for musical onset detection (MOD). We first review the state-of-the-art deep learning models for MOD, and identify their shortcomings and challenges: (i) the lack of hyper-parameter tuning details, (ii) the non-availability of code for training models on other datasets, and (iii) ignoring the network capability when comparing different architectures. Taking the above issues into account, we experiment with seven deep learning architectures. The most efficient one achieves equivalent performance to our implementation of the state-of-the-art architecture. However, it has only 28.3% of the total number of trainable parameters compared to the state-of-the-art. Our experiments are conducted using two different datasets: one mainly consists of instrumental music excerpts, and another developed by ourselves includes only solo singing voice excerpts. Further, inter-dataset transfer learning experiments are conducted. The results show that the model pre-trained on one dataset fails to detect onsets on another dataset, which denotes the importance of providing the implementation code to enable re-training the model for a different dataset. Datasets, code and a Jupyter notebook running on Google Colab are publicly available to make this research understandable and easy to reproduce.

Sound Information Retrieval Audio and Speech Processing

Transfer Learning of Artist Group Factors to Musical Genre Classification

82 - Jaehun Kim , Minz Won , Xavier Serra 2018

The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy. Artist labels are less subjective and less noisy, while certain artists may relate more strongly to certain genres. At the same time, at prediction time, it is not guaranteed that artist labels are available for a given audio segment. Therefore, in this work, we propose to apply the transfer learning framework, learning artist-related information which will be used at inference time for genre classification. We consider different types of artist-related information, expressed through artist group factors, which will allow for more efficient learning and stronger robustness to potential label noise. Furthermore, we investigate how to achieve the highest validation accuracy on the given FMA dataset, by experimenting with various kinds of transfer methods, including single-task transfer, multi-task transfer and finally multi-task learning.

Machine Learning Sound Audio and Speech Processing

Modeling Musical Context with Word2vec

101 - Dorien Herremans , Ching-Hua Chuan 2017

We present a semantic vector space model for capturing complex polyphonic musical context. A word2vec model based on a skip-gram representation with negative sampling was used to model slices of music from a dataset of Beethovens piano sonatas. A visualization of the reduced vector space using t-distributed stochastic neighbor embedding shows that the resulting embedded vector space captures tonal relationships, even without any explicit information about the musical contents of the slices. Secondly, an excerpt of the Moonlight Sonata from Beethoven was altered by replacing slices based on context similarity. The resulting music shows that the selected slice based on similar word2vec context also has a relatively short tonal distance from the original slice.

Sound Information Retrieval Multimedia

Learning Audio-Visual Dereverberation

360 - Changan Chen , Wei Sun , David Harwath 2021

Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition. Prior work attempts to remove reverberation based on the audio modality only. Our idea is to learn to dereverberate speech from audio-visual observations. The visual environment surrounding a human speaker reveals important cues about the room geometry, materials, and speaker location, all of which influence the precise reverberation effects in the audio stream. We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene. In support of this new task, we develop a large-scale dataset that uses realistic acoustic renderings of speech in real-world 3D scans of homes offering a variety of room acoustics. Demonstrating our approach on both simulated and real imagery for speech enhancement, speech recognition, and speaker identification, we show it achieves state-of-the-art performance and substantially improves over traditional audio-only methods. Project page: http://vision.cs.utexas.edu/projects/learning-audio-visual-dereverberation.

Sound Computer Vision and Pattern Recognition Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Learning to Recognize Musical Genre from Audio

Ask ChatGPT about the research

No Arabic abstract

We here summarize our experience running a challenge with open data for musical genre recognition. Those notes motivate the task and the challenge design, show some statistics about the submissions, and present the results.

Read More

suggested questions