@arXiv_csSD_bot@mastoxiv.page
2025-06-03 07:48:49

FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
Shunian Chen, Xinyuan Xie, Zheshu Chen, Liyan Zhao, Owen Lee, Zhan Su, Qilin Sun, Benyou Wang
arxiv.org/abs/2506.01111