In this paper, we propose a lightweight music-generation model based on a variational autoencoder (VAE) with structured attention. Generating music differs from generating text because melodies with chords give listeners a distinct polyphonic feel. In a piece of music, a chord consisting of multiple notes arises either from the mixture of multiple instruments or from the combination of multiple keys on a single instrument; we focus our study on the latter. Our model captures not only the temporal relations along time but also the structural relations between keys. Experimental results show that our model outperforms the baseline MusicVAE in capturing the notes of a chord. Moreover, our method accords with music theory: it maintains the configuration of the circle of fifths, distinguishes major from minor keys via interval vectors, and manifests meaningful structure between musical phrases.
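To illustrate the idea of attending over the structural (key) axis rather than only the temporal axis, here is a minimal numpy sketch, not the authors' implementation: at each time step of a binary piano roll, the pitch rows attend to one another, so the notes of a chord can interact. The feature dimension `d` and the random projection weights are placeholder assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pitch_attention(roll, d=8, seed=0):
    """Self-attention across the pitch axis of a piano roll.

    roll: (T, P) binary piano roll. Attention is computed among the P
    pitch rows at each time step, so simultaneous notes (a chord) can
    attend to one another. Projection weights are random placeholders.
    Returns a (T, P, d) array of per-pitch context vectors.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical 1 -> d projections for queries, keys, values.
    Wq, Wk, Wv = (rng.standard_normal((1, d)) * 0.1 for _ in range(3))
    x = roll[..., None]                              # (T, P, 1): each pitch is a token
    q, k, v = x @ Wq, x @ Wk, x @ Wv                 # (T, P, d)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (T, P, P) pitch-to-pitch scores
    return softmax(scores, axis=-1) @ v              # (T, P, d)

roll = np.zeros((4, 12))
roll[0, [0, 4, 7]] = 1.0   # a C-major triad (C, E, G) at the first step
ctx = pitch_attention(roll)
print(ctx.shape)  # (4, 12, 8)
```

A full model would learn these projections jointly with the VAE and combine this key-axis attention with a temporal model over the T axis; the sketch only shows the structural-attention step in isolation.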
Objective evaluation (OE) is essential to artificial music, but it is often very hard to determine the quality of OEs. Hitherto, subjective evaluation (SE) remains reliable and prevailing but suffers from inevitable disadvantages that OEs may overcome. Ther
Music creation is typically composed of two parts: composing the musical score, and then performing the score with instruments to make sounds. While recent work has made much progress in automatic music generation in the symbolic domain, few attempts
Dance and music typically go hand in hand. The complexities in dance, music, and their synchronisation make them fascinating to study from a computational creativity perspective. While several works have looked at generating dance for a given music,
Automatic melody generation for pop music has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melody has turned out to be highly challenging due to a number of factors. Representation of mul
In this paper, we address the text-to-audio grounding issue, namely, grounding the segments of the sound event described by a natural language query in the untrimmed audio. This is a newly proposed but challenging audio-language task, since it requir