MAGIC-Enhanced Keyword Prompting for Zero-Shot Audio Captioning with CLIP Models
Vijay Govindarajan, Pratik Patel, Sahil Tripathi, Md Azizul Hoque, Gautam Siddharth Kashyap
https://arxiv.org/abs/2509.12591
Generalizable Geometric Image Caption Synthesis
Yue Xin, Wenyuan Wang, Rui Pan, Ruida Wang, Howard Meng, Renjie Pi, Shizhe Diao, Tong Zhang
https://arxiv.org/abs/2509.15217 http…
Aligning Audio Captions with Human Preferences
Kartik Hegde, Rehana Mahfuz, Yinyi Guo, Erik Visser
https://arxiv.org/abs/2509.14659 https://arxiv.org/pdf/2…
I finished watching this #Netflix documentary about the #Antwerp 🇧🇪 diamond heist in 2003, and while the film itself is your typical interesting and informative Netflix-style docu, I am again peeved that #OpenStreetMap
What's the Best Way to Retrieve Slides? A Comparative Study of Multimodal, Caption-Based, and Hybrid Retrieval Techniques
Petros Stylianos Giouroukis, Dimitris Dimitriadis, Dimitrios Papadopoulos, Zhenwen Shao, Grigorios Tsoumakas
https://arxiv.org/abs/2509.15211
🇺🇦 Now playing on radioeins...
Cupidon feat. Milaa:
🎵 Feel It
#NowPlaying #Cupidon #Milaa
https://subreachers.bandcamp.com/track/cupidon-milaa-feel-it-subreachers-jungle-bootleg
https://open.spotify.com/track/5qKEArG4jwMAYTAsuG4rpH
The Call of the Wild
#Sims4 #TheSims4 #Comicstrip
It's funny because when I captioned NPR's audio I realised the transcript on their website is wrong.
David Simon does not say:
"You mentioned that."
He said:
"You imagine that?"
As NPR likely use "AI" for their transcripts, this adds an extra layer of irony.