Towards Playlist Generation Algorithms Using RNNs Trained on Within-Track Transitions

78 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Keunwoo Choi Mr

تاريخ النشر 2016

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Keunwoo Choi - - George Fazekas

الذكاء الاصطناعي الوسائط المتعددة أنظمة الصوت في الحاسوب

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We introduce a novel playlist generation algorithm that focuses on the quality of transitions using a recurrent neural network (RNN). The proposed model assumes that optimal transitions between tracks can be modelled and predicted by internal transitions within music tracks. We introduce modelling sequences of high-level music descriptors using RNNs and discuss an experiment involving different similarity functions, where the sequences are provided by a musical structural analysis algorithm. Qualitative observations show that the proposed approach can effectively model transitions of music tracks in playlists.

قيم البحث

69 - Jongpil Lee , Jiyoung Park , Juhan Nam 2019

Supervised music representation learning has been performed mainly using semantic labels such as music genres. However, annotating music with semantic labels requires time and cost. In this work, we investigate the use of factual metadata such as art ist, album, and track information, which are naturally annotated to songs, for supervised music representation learning. The results show that each of the metadata has individual concept characteristics, and using them jointly improves overall performance.

استرجاع المعلومات الوسائط المتعددة أنظمة الصوت في الحاسوب

Towards Music Captioning: Generating Music Playlist Descriptions

73 - Keunwoo Choi , George Fazekas , Brian McFee 2016

Descriptions are often provided along with recommendations to help users discovery. Recommending automatically generated music playlists (e.g. personalised playlists) introduces the problem of generating descriptions. In this paper, we propose a meth od for generating music playlist descriptions, which is called as music captioning. In the proposed method, audio content analysis and natural language processing are adopted to utilise the information of each track.

الوسائط المتعددة الذكاء الاصطناعي الحساب واللغة

Improved TDNNs using Deep Kernels and Frequency Dependent Grid-RNNs

104 - Florian Kreyssig , Chao Zhang , Philip Woodland 2018

Time delay neural networks (TDNNs) are an effective acoustic model for large vocabulary speech recognition. The strength of the model can be attributed to its ability to effectively model long temporal contexts. However, current TDNN models are relat ively shallow, which limits the modelling capability. This paper proposes a method of increasing the network depth by deepening the kernel used in the TDNN temporal convolutions. The best performing kernel consists of three fully connected layers with a residual (ResNet) connection from the output of the first to the output of the third. The addition of spectro-temporal processing as the input to the TDNN in the form of a convolutional neural network (CNN) and a newly designed Grid-RNN was investigated. The Grid-RNN strongly outperforms a CNN if different sets of parameters for different frequency bands are used and can be further enhanced by using a bi-directional Grid-RNN. Experiments using the multi-genre broadcast (MGB3) English data (275h) show that deep kernel TDNNs reduces the word error rate (WER) by 6% relative and when combined with the frequency dependent Grid-RNN gives a relative WER reduction of 9%.

الحساب واللغة الذكاء الاصطناعي أنظمة الصوت في الحاسوب

Dual-track Music Generation using Deep Learning

210 - Sudi Lyu , Anxiang Zhang , Rong Song 2020

Music generation is always interesting in a sense that there is no formalized recipe. In this work, we propose a novel dual-track architecture for generating classical piano music, which is able to model the inter-dependency of left-hand and right-ha nd piano music. Particularly, we experimented with a lot of different models of neural network as well as different representations of music, and the results show that our proposed model outperforms all other tested methods. Besides, we deployed some special policies for model training and generation, which contributed to the model performance remarkably. Finally, under two evaluation methods, we compared our models with the MuseGAN project and true music.

أنظمة الصوت في الحاسوب التعلم الآلي التعلم الالي

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

210 - Hang Zhou , Xudong Xu , Dahua Lin 2020

Stereophonic audio is an indispensable ingredient to enhance human auditory experience. Recent research has explored the usage of visual information as guidance to generate binaural or ambisonic audio from mono ones with stereo supervision. However, this fully supervised paradigm suffers from an inherent drawback: the recording of stereophonic audio usually requires delicate devices that are expensive for wide accessibility. To overcome this challenge, we propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio. Our key observation is that the task of visually indicated audio separation also maps independent audios to their corresponding visual positions, which shares a similar objective with stereophonic audio generation. We integrate both stereo generation and source separation into a unified framework, Sep-Stereo, by considering source separation as a particular type of audio spatialization. Specifically, a novel associative pyramid network architecture is carefully designed for audio-visual feature fusion. Extensive experiments demonstrate that our framework can improve the stereophonic audio generation results while performing accurate sound separation with a shared backbone.

الرؤية الحاسوبية وتمييز الأنماط الوسائط المتعددة أنظمة الصوت في الحاسوب