The musical realm is a promising area in which to look for nontrivial topological structures. This paper describes several kinds of metrics on musical data and explores their implications in two ways: via techniques of classical topology, where the metric space of all possible musical data can be described explicitly, and via modern data-driven ideas of persistent homology, which computes the Betti-number barcodes of individual musical works. Both analyses recover three well-known topological structures in music: the circle of notes (octave-reduced scalar structures), the circle of fifths, and the rhythmic repetition of timelines. Applications to a variety of musical works (for example, folk music in the form of standard MIDI files) are presented, and the barcodes show many interesting features. Examples show that individual pieces may span the complete space (in which case the classical and the data-driven analyses agree) or only part of it.
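To make the first two structures concrete, here is a minimal Python sketch. The two pitch-class metrics and all function names are illustrative assumptions, not the metrics defined in the paper, and only the dimension-0 barcode (connected components) is computed with a union-find pass over sorted distances; detecting the circles themselves would require the dimension-1 barcode, for example from a persistent homology library such as Ripser or GUDHI.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def pitch_class_distance(a: int, b: int) -> int:
    """Shortest distance in semitones between two pitch classes on the 12-note circle."""
    d = (a - b) % 12
    return min(d, 12 - d)


def fifths_distance(a: int, b: int) -> int:
    """Shortest number of steps between two pitch classes on the circle of fifths."""
    # 7 semitones is a perfect fifth and 7 * 7 = 49 = 1 (mod 12), so multiplying a
    # pitch class by 7 gives its position on the circle of fifths (C=0, G=1, D=2, ...).
    return pitch_class_distance((7 * a) % 12, (7 * b) % 12)


def h0_barcode(
    points: Sequence[int], dist: Callable[[int, int], float]
) -> List[Tuple[float, float]]:
    """Dimension-0 persistence barcode of the Vietoris-Rips filtration.

    Every point is born at scale 0; a bar dies at the scale where its component
    merges into another one (Kruskal's algorithm with union-find).
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars: List[Tuple[float, float]] = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, float(d)))
    if points:
        bars.append((0.0, math.inf))  # the last surviving component never dies
    return bars


c_major = [0, 2, 4, 5, 7, 9, 11]
print([d for _, d in h0_barcode(c_major, pitch_class_distance)])
# [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, inf]
print([d for _, d in h0_barcode(c_major, fifths_distance)])
# [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, inf]
```

Under the semitone metric the C major scale only becomes connected at scale 2, while on the circle of fifths its seven notes (F C G D A E B) form a single contiguous arc already at scale 1.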
Natural data pose a hard challenge for data analysis. Persistent topology is one set of tools being developed by several teams to meet this challenge. After a brief introduction to the theory, applications are shown to the analysis and classification of cells, lesions, music pieces, gait, oil and gas reservoirs, cyclones, galaxies, bones, brain connections, languages, and handwritten and gestured letters.
The automated recognition of music genres from audio is a challenging problem, as genre labels are subjective and noisy. Artist labels are less subjective and less noisy, and certain artists may relate more strongly to certain genres. However, artist labels are not guaranteed to be available for a given audio segment at prediction time. Therefore, in this work, we propose to apply the transfer learning framework, learning artist-related information that is then used at inference time for genre classification. We consider different types of artist-related information, expressed through artist group factors, which allow for more efficient learning and stronger robustness to potential label noise. Furthermore, we investigate how to achieve the highest validation accuracy on the FMA dataset by experimenting with various transfer methods, including single-task transfer, multi-task transfer, and multi-task learning.
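The abstract does not describe the network architecture or define the artist group factors, so the following Python sketch only illustrates the general shape of the multi-task learning variant under stated assumptions: a hypothetical one-hidden-layer shared encoder, a genre head plus an auxiliary head that predicts a hypothetical artist-group index, and a hypothetical weight `alpha` that balances the two cross-entropy terms.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def multitask_loss(
    x: Sequence[float],
    genre: int,
    artist_group: int,
    w_shared: Matrix,
    w_genre: Matrix,
    w_artist: Matrix,
    alpha: float = 0.5,  # hypothetical weight of the auxiliary artist-group task
) -> float:
    """Genre cross-entropy plus alpha times the artist-group cross-entropy.

    Both heads read the same shared hidden representation, so the auxiliary
    artist-group task also shapes the features used for genre prediction.
    """
    h = relu(matvec(w_shared, x))
    genre_lp = log_softmax(matvec(w_genre, h))
    artist_lp = log_softmax(matvec(w_artist, h))
    return -genre_lp[genre] - alpha * artist_lp[artist_group]


def predict_genre(x: Sequence[float], w_shared: Matrix, w_genre: Matrix) -> int:
    """At inference time only the shared encoder and the genre head are used."""
    scores = matvec(w_genre, relu(matvec(w_shared, x)))
    return max(range(len(scores)), key=scores.__getitem__)
```

Because `predict_genre` never touches the artist head, artist-group labels are needed only during training, which fits the setting in which artist labels may be missing at prediction time.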
Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the neural-source-filter (NSF) model have shown good performance in speech synthesis despite their different methods of waveform generation. The similarity between speech and music audio synthesis techniques suggests interesting avenues for exploring how best to apply speech synthesizers in the music domain. This work compares three neural synthesizers for musical instrument sound generation under three scenarios: training from scratch on music data, zero-shot learning from the speech domain, and fine-tuning-based adaptation from the speech to the music domain. The results of a large-scale perceptual test showed that all three synthesizers performed better when they were pre-trained on speech data and fine-tuned on music data, which indicates that knowledge from speech data is useful for music audio generation. Among the synthesizers, WaveGlow showed the best potential in zero-shot learning, while NSF performed best in the other scenarios and could generate samples that were perceptually close to natural audio.
Let G be a compact Lie group. By work of Chataur and Menichi, the homology of the space of free loops in the classifying space of G is known to be the value on the circle in a homological conformal field theory. This means in particular that it admits operations parameterized by homology classes of classifying spaces of diffeomorphism groups of surfaces. Here we present a radical extension of this result, giving a new construction in which diffeomorphisms are replaced with homotopy equivalences, and surfaces with boundary are replaced with arbitrary spaces homotopy equivalent to finite graphs. The result is a novel kind of field theory which is related to both the diffeomorphism groups of surfaces and the automorphism groups of free groups with boundaries. Our work shows that the algebraic structures in string topology of classifying spaces can be brought into line with, and in fact far exceed, those available in string topology of manifolds. For simplicity, we restrict to the characteristic 2 case. The generalization to arbitrary characteristic will be addressed in a subsequent paper.
Timbre representations of musical instruments, which are essential for diverse applications such as musical audio synthesis and separation, might be learned as bottleneck features of an instrument recognition model. Given the similarities between speaker recognition and musical instrument recognition, in this paper we investigate how to adapt successful speaker recognition algorithms to musical instrument recognition in order to learn meaningful instrumental timbre representations. To address the mismatch between musical audio and models devised for speech, we introduce a group of trainable filters that generate suitable acoustic features from raw input waveforms, making it easier to optimize the model in an input-agnostic, end-to-end manner. In experiments on the NSynth and RWC databases, covering both closed-set musical instrument identification and open-set verification, the modified speaker recognition model produced discriminative embeddings for instrument and instrument-family identities. We further conducted extensive experiments to characterize the information encoded in the learned timbre embeddings.
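The abstract does not say how the trainable filters are parameterized. One common choice in speaker recognition is a SincNet-style front end, in which each filter is a windowed band-pass kernel whose two cutoff frequencies are its only learnable parameters. The Python sketch below assumes that design (all names are hypothetical) and shows only the forward pass with fixed cutoffs; in training, `low_hz` and `high_hz` would be updated by gradient descent.

```python
import math
from typing import List, Sequence, Tuple


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def bandpass_kernel(
    low_hz: float, high_hz: float, sample_rate: int, length: int = 129
) -> List[float]:
    """Hamming-windowed band-pass FIR kernel defined by two cutoff frequencies.

    The ideal band-pass response is the difference of two ideal low-pass
    responses; in a trainable front end, low_hz and high_hz are the parameters.
    """
    f1, f2 = low_hz / sample_rate, high_hz / sample_rate  # cycles per sample
    half = length // 2
    kernel = []
    for k in range(length):
        n = k - half
        ideal = 2 * f2 * sinc(2 * f2 * n) - 2 * f1 * sinc(2 * f1 * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
        kernel.append(ideal * window)
    return kernel


def filterbank_features(
    wave: Sequence[float],
    bands: Sequence[Tuple[float, float]],
    sample_rate: int,
    length: int = 129,
) -> List[float]:
    """Log mean energy of the waveform after each band-pass filter."""
    feats = []
    for low, high in bands:
        h = bandpass_kernel(low, high, sample_rate, length)
        # 'Valid' filtering; the kernel is symmetric, so correlation equals convolution.
        out = [
            sum(h[k] * wave[t + k] for k in range(length))
            for t in range(len(wave) - length + 1)
        ]
        energy = sum(y * y for y in out) / len(out)
        feats.append(math.log(energy + 1e-12))
    return feats
```

For example, a 440 Hz tone sampled at 16 kHz yields a much larger log energy in a 200 to 1000 Hz band than in a 3000 to 6000 Hz band, so the learned cutoffs decide which parts of the spectrum the downstream embedding network sees.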