Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment


Published by: Ruchit Agrawal

Publication date: 2020

Research field: Informatics Engineering

Research language: English

Authors: Ruchit Agrawal, Simon Dixon



Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features, which cannot be adapted to different acoustic conditions. We propose a method to overcome this limitation using learned frame similarity for audio-to-score alignment. We focus on offline audio-to-score alignment of piano music. Experiments on music data from different acoustic conditions demonstrate that our method achieves higher alignment accuracy than a standard DTW-based method that uses handcrafted features, and generates robust alignments whilst being adaptable to different domains at the same time.
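The DTW step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network or its features, so the hypothetical `cosine_distance_matrix` below merely stands in for a learned frame-similarity model, and both function names are assumptions made for illustration. Given any frame-to-frame cost matrix, `dtw_path` recovers the lowest-cost monotonic mapping between performance frames and score frames.

```python
import numpy as np

def cosine_distance_matrix(X, Y):
    """Pairwise cosine distance between rows of X and rows of Y.

    Assumption: a stand-in for a learned frame similarity; a trained
    Siamese model would supply this matrix instead.
    """
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-9)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-9)
    return 1.0 - Xn @ Yn.T

def dtw_path(cost):
    """Classic DTW with steps (1,1), (1,0), (0,1) over a cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1], float(acc[n, m])

# Toy data: three score frames, five performance frames (first and last held longer).
score = np.eye(3)
performance = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]], dtype=float)
path, total = dtw_path(cosine_distance_matrix(performance, score))
```

Each pair in `path` is (performance frame index, score frame index). Swapping in a different cost matrix changes the alignment without touching the DTW routine, which is the property that makes a learned similarity a drop-in replacement for handcrafted features.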


314 - Pranay Manocha, Rohan Badlani, Anurag Kumar 2017

In this paper, we focus on the problem of content-based retrieval for audio, which aims to retrieve all semantically similar audio recordings for a given audio clip query. This problem is similar to query by example of audio, which aims to retrieve media samples from a database that are similar to the user-provided example. We propose a novel approach which encodes the audio into a vector representation using Siamese Neural Networks. The goal is to obtain an encoding similar for files belonging to the same audio class, thus allowing retrieval of semantically similar audio. Using simple similarity measures such as Euclidean distance and cosine similarity, we show that these representations can be very effectively used for retrieving recordings similar in audio content.
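As a rough illustration of the retrieval step described above, the sketch below ranks database items against a query using the two similarity measures the abstract names. The embeddings are made-up toy vectors and the function names are hypothetical; in the paper's setting the vectors would come from the trained Siamese encoder, which is not reproduced here.

```python
import numpy as np

def rank_by_cosine(query, database):
    """Return database indices sorted from most to least cosine-similar."""
    q = query / np.linalg.norm(query)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores), scores

def rank_by_euclidean(query, database):
    """Return database indices sorted from nearest to farthest."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists), dists

# Toy embeddings (assumed values, not real encoder outputs).
database = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
query = np.array([1.0, 0.05])

cos_order, cos_scores = rank_by_cosine(query, database)
euc_order, euc_dists = rank_by_euclidean(query, database)
```

On this toy data both measures rank item 0 first and item 2 last. The two rankings can differ when embedding norms vary widely, since cosine similarity ignores vector length while Euclidean distance does not.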

Computer audio systems, information retrieval, audio and speech processing

Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks

190 - Ruchit Agrawal, Daniel Wolff, Simon Dixon 2021

The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment, an important subtask of music information retrieval. We present a novel method to detect such differences between the score and performance for a given piece of music using progressively dilated convolutional neural networks. Our method incorporates varying dilation rates at different layers to capture both short-term and long-term context, and can be employed successfully in the presence of limited annotated data. We conduct experiments on audio recordings of real performances that differ structurally from the score, and our results demonstrate that our models outperform standard methods for structure-aware audio-to-score alignment.
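To make the point about short-term and long-term context concrete, the sketch below computes the receptive field of a stack of stride-1 convolutions with different dilation rates. The kernel size and dilation schedule are assumed values chosen for illustration; the abstract does not state the ones used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input frames) of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Hypothetical four-layer stacks with kernel size 3.
progressive = receptive_field(3, [1, 2, 4, 8])  # dilation doubles per layer
undilated = receptive_field(3, [1, 1, 1, 1])    # ordinary convolutions
```

With these assumed settings the progressively dilated stack covers 31 input frames while the undilated stack covers 9, using the same number of layers and weights. Early layers with small dilation still see fine local detail, and later layers extend the context.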

Computer audio systems, machine learning, audio and speech processing

Learning to Recognize Musical Genre from Audio

83 - Michael Defferrard, Sharada P. Mohanty, Sean F. Carroll 2018

We here summarize our experience running a challenge with open data for musical genre recognition. These notes motivate the task and the challenge design, show some statistics about the submissions, and present the results.

Computer audio systems, information retrieval, machine learning

A Hybrid Approach to Audio-to-Score Alignment

119 - Ruchit Agrawal, Simon Dixon 2020

Audio-to-score alignment aims at generating an accurate mapping between a performance audio and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features. We explore the usage of neural networks as a preprocessing step for DTW-based automatic alignment methods. Experiments on music data from different acoustic conditions demonstrate that this method generates robust alignments whilst being adaptable at the same time.

Audio and speech processing, machine learning, computer audio systems

End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model

224 - Daniel Stoller, Simon Durand, Sebastian Ewert 2019

Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval and intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to combine numerous sub-modules including vocal separation and detection in an effort to break down the problem. Furthermore, training required fine-grained annotations to be available in some form. Here, we present a novel system based on a modified Wave-U-Net architecture, which predicts character probabilities directly from raw audio using learnt multi-scale representations of the various signal components. There are no sub-modules whose interdependencies need to be optimized. Our training procedure is designed to work with weak, line-level annotations available in the real world. With a mean alignment error of 0.35s on a standard dataset our system outperforms the state-of-the-art by an order of magnitude.

Computer audio systems, machine learning, audio and speech processing