Extended pipeline for content-based feature engineering in music genre recognition

104 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Paolo Bientinesi

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Tina Raissi

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a feature engineering pipeline for the construction of musical signal characteristics, to be used for the design of a supervised model for musical genre identification. The key idea is to extend the traditional two-step process of extraction and classification with additive stand-alone phases which are no longer organized in a waterfall scheme. The whole system is realized by traversing backtrack arrows and cycles between various stages. In order to give a compact and effective representation of the features, the standard early temporal integration is combined with other selection and extraction phases: on the one hand, the selection of the most meaningful characteristics based on information gain, and on the other hand, the inclusion of the nonlinear correlation between this subset of features, determined by an autoencoder. The results of the experiments conducted on GTZAN dataset reveal a noticeable contribution of this methodology towards the models performance in classification task.

قيم البحث

179 - Kristen Masada , Razvan Bunescu 2018

We present a new approach to harmonic analysis that is trained to segment music into a sequence of chord spans tagged with chord labels. Formulated as a semi-Markov Conditional Random Field (semi-CRF), this joint segmentation and labeling approach en ables the use of a rich set of segment-level features, such as segment purity and chord coverage, that capture the extent to which the events in an entire segment of music are compatible with a candidate chord label. The new chord recognition model is evaluated extensively on three corpora of classical music and a newly created corpus of rock music. Experimental results show that the semi-CRF model performs substantially better than previous approaches when trained on a sufficient number of labeled examples and remains competitive when the amount of training data is limited.

أنظمة الصوت في الحاسوب التعلم الآلي معالجة الصوت والكلام

End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model

224 - Daniel Stoller , Simon Durand , Sebastian Ewert 2019

Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval and intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to combine numerous sub-modules including vocal separation and detection in an effort to break down the problem. Furthermore, training required fine-grained annotations to be available in some form. Here, we present a novel system based on a modified Wave-U-Net architecture, which predicts character probabilities directly from raw audio using learnt multi-scale representations of the various signal components. There are no sub-modules whose interdependencies need to be optimized. Our training procedure is designed to work with weak, line-level annotations available in the real world. With a mean alignment error of 0.35s on a standard dataset our system outperforms the state-of-the-art by an order of magnitude.

أنظمة الصوت في الحاسوب التعلم الآلي معالجة الصوت والكلام

Feature-based Representation for Violin Bridge Admittances

109 - R. Malvermi , S. Gonzalez , M. Quintavalla 2021

Frequency Response Functions (FRFs) are one of the cornerstones of musical acoustic experimental research. They describe the way in which musical instruments vibrate in a wide range of frequencies and are used to predict and understand the acoustic d ifferences between them. In the specific case of stringed musical instruments such as violins, FRFs evaluated at the bridge are known to capture the overall body vibration. These indicators, also called bridge admittances, are widely used in the literature for comparative analyses. However, due to their complex structure they are rather difficult to quantitatively compare and study. In this manuscript we present a way to quantify differences between FRFs, in particular violin bridge admittances, that separates the effects in frequency, amplitude and quality factor of the first resonance peaks characterizing the responses. This approach allows us to define a distance between FRFs and clusterise measurements according to this distance. We use two case studies, one based on Finite Element Analysis and another exploiting measurements on real violins, to prove the effectiveness of such representation. In particular, for simulated bridge admittances the proposed distance is able to highlight the different impact of consecutive simulation `steps on specific vibrational properties and, for real violins, gives a first insight on similar styles of making, as well as opposite ones.

أنظمة الصوت في الحاسوب التعلم الآلي معالجة الصوت والكلام

Improving DNN-based Music Source Separation using Phase Features

91 - Joachim Muth , Stefan Uhlich , Nathanael Perraudin 2018

Music source separation with deep neural networks typically relies only on amplitude features. In this paper we show that additional phase features can improve the separation performance. Using the theoretical relationship between STFT phase and ampl itude, we conjecture that derivatives of the phase are a good feature representation opposed to the raw phase. We verify this conjecture experimentally and propose a new DNN architecture which combines amplitude and phase. This joint approach achieves a better signal-to distortion ratio on the DSD100 dataset for all instruments compared to a network that uses only amplitude features. Especially, the bass instrument benefits from the phase information.

أنظمة الصوت في الحاسوب التعلم الآلي معالجة الصوت والكلام

Now Playing: Continuous low-power music recognition

148 - Blaise Aguera y Arcas , Beat Gfeller , Ruiqi Guo 2017

Existing music recognition applications require a connection to a server that performs the actual recognition. In this paper we present a low-power music recognizer that runs entirely on a mobile device and automatically recognizes music without user interaction. To reduce battery consumption, a small music detector runs continuously on the mobile devices DSP chip and wakes up the main application processor only when it is confident that music is present. Once woken, the recognizer on the application processor is provided with a few seconds of audio which is fingerprinted and compared to the stored fingerprints in the on-device fingerprint database of tens of thousands of songs. Our presented system, Now Playing, has a daily battery usage of less than 1% on average, respects user privacy by running entirely on-device and can passively recognize a wide range of music.

أنظمة الصوت في الحاسوب الذكاء الاصطناعي معالجة الصوت والكلام