Rethinking movie genre classification with fine-grained semantic clustering


Abstract

Movie genre classification is an active research area in machine learning. However, because only a limited set of genre labels is available, movies that share a single genre label can differ widely in their semantics. We expand these coarse genre labels by identifying fine-grained semantic information within the multi-modal content of movies. By leveraging pre-trained expert networks, we learn the influence of different combinations of modalities for multi-label genre classification. Using a contrastive loss, we then fine-tune this coarse genre classification network to identify high-level intertextual similarities between movies across all genre labels. This leads to a more fine-grained and detailed clustering, based on semantic similarities, that still retains some genre information. Our approach is demonstrated on MMX-Trailer-20, a newly introduced multi-modal dataset of 8,800 movie trailers (37,866,450 frames) that includes pre-computed audio, location, motion, and image embeddings.
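
The two training stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`fuse_modalities`, `multilabel_bce`, `contrastive_loss`), the softmax-weighted fusion, and the InfoNCE-style form of the contrastive loss are all assumptions chosen for illustration, since the abstract does not specify the architecture or the exact loss.

```python
import numpy as np

def fuse_modalities(embeddings, weights):
    """Combine per-modality embeddings (each [batch, dim]) with softmax weights.

    Hypothetical fusion scheme: the abstract only says the influence of
    different modality combinations is learned.
    """
    w = np.exp(weights - weights.max())
    w = w / w.sum()  # softmax over modalities
    return sum(wi * e for wi, e in zip(w, embeddings))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy, treating each genre as an independent label."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12  # avoids log(0)
    per_label = targets * np.log(probs + eps) + (1 - targets) * np.log(1 - probs + eps)
    return float(-np.mean(per_label))

def contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in): row i of z_a should match row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    sims = z_a @ z_b.T / temperature
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy data: 4 trailers, 4 modalities (audio, location, motion, image), 8-dim embeddings.
rng = np.random.default_rng(0)
modalities = [rng.normal(size=(4, 8)) for _ in range(4)]
fused = fuse_modalities(modalities, weights=np.zeros(4))

# Stage 1 (coarse): multi-label genre loss over 3 hypothetical genres.
genre_head = rng.normal(size=(8, 3))
genre_targets = np.array([[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]], dtype=float)
coarse_loss = multilabel_bce(fused @ genre_head, genre_targets)

# Stage 2 (fine-grained): contrastive loss between two noisy views of each trailer.
view_b = fused + 0.01 * rng.normal(size=fused.shape)
fine_loss = contrastive_loss(fused, view_b)
print(coarse_loss, fine_loss)
```

In this sketch the contrastive stage pulls together two views of the same trailer regardless of genre label, which is one way similarities that cut across genres could emerge in the embedding space.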
