Bridging the Gap Between Semantic and User Preference Spaces for Multi-modal Music Representation Learning
Xiaofeng Pan, Jing Chen, Haitong Zhang, Menglin Xing, Jiayi Wei, Xuefeng Mu, Zhongqian Xie
https://arxiv.org/abs/2505.23298
Recent work on music representation learning mainly focuses on learning acoustic music representations from unlabeled audio, or further attempts to acquire multi-modal music representations from scarce annotated audio-text pairs. These approaches either ignore language semantics or rely on labeled audio datasets that are difficult and expensive to create. Moreover, modeling only the semantic space usually fails to achieve satisfactory performance on music recommendation tasks since the user preference spac…