ﻻ يوجد ملخص باللغة العربية
This paper thoroughly analyses the effect of different input representations on polyphonic multi-instrument music transcription. We use our own GPU based spectrogram extraction tool, nnAudio, to investigate the influence of using a linear-frequency spectrogram, log-frequency spectrogram, Mel spectrogram, and constant-Q transform (CQT). Our results show that a $8.33$% increase in transcription accuracy and a $9.39$% reduction in error can be obtained by choosing the appropriate input representation (log-frequency spectrogram with STFT window length 4,096 and 2,048 frequency bins in the spectrogram) without changing the neural network design (single layer fully connected). Our experiments also show that Mel spectrogram is a compact representation for which we can reduce the number of frequency bins to only 512 while still keeping a relatively high music transcription accuracy.
In recent years, music source separation has been one of the most intensively studied research areas in music information retrieval. Improvements in deep learning lead to a big progress in music source separation performance. However, most of the pre
We present in this paper PerformacnceNet, a neural network model we proposed recently to achieve score-to-audio music generation. The model learns to convert a music piece from the symbolic domain to the audio domain, assigning performance-level attr
Recent studies have shown that neural vocoders based on generative adversarial network (GAN) can generate audios with high quality. While GAN based neural vocoders have shown to be computationally much more efficient than those based on autoregressiv
Personalized recommendation on new track releases has always been a challenging problem in the music industry. To combat this problem, we first explore user listening history and demographics to construct a user embedding representing the users music
In this paper, we adapt triplet neural networks (TNNs) to a regression task, music emotion prediction. Since TNNs were initially introduced for classification, and not for regression, we propose a mechanism that allows them to provide meaningful low