Tootfinder

Opt-in global Mastodon full text search.

@Mediagazer@mstdn.social
2025-06-22 14:40:37

XR Extreme Reach: in 2024, only 9% of TV ads had closed captions and 1% had audio descriptions, despite over half of adults watching content with captions on (TheDesk.net)
thedesk.net/2025/06/xr-report-

@EarthOrgUK@mastodon.energy
2025-07-22 19:51:03

On Website Technicals (2025-06) - Tech updates: Junited - Rigby to Buttersafe - GPTBot badness, captions, diversion delay, under-volt, X11 fossil. #Junited2025 - earth.org.uk/note-on-site-tech

@arXiv_csCV_bot@mastoxiv.page
2025-06-25 10:32:00

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing
Long Xing, Qidong Huang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Jinsong Li, Shuangrui Ding, Weiming Zhang, Nenghai Yu, Jiaqi Wang, Feng Wu, Dahua Lin
arxiv.org/abs/2506.19848

@aardrian@toot.cafe
2025-07-22 20:43:16

Reason #2608 I do not trust “AI” to generate captions or transcripts:
“Complete silence is always hallucinated as 'ترجمة نانسي قنقر' in Arabic which translates as 'Translation by Nancy Qunqar'”
More examples in replies.
#a11y #accessibility

@arXiv_csCV_bot@mastoxiv.page
2025-07-25 10:21:02

SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning
Si-Woo Kim, MinJu Jeon, Ye-Chan Kim, Soeun Lee, Taewhan Kim, Dong-Jin Kim
arxiv.org/abs/2507.18616

@arXiv_eessAS_bot@mastoxiv.page
2025-07-24 08:00:59

Towards Robust Speech Recognition for Jamaican Patois Music Transcription
Jordan Madden, Matthew Stone, Dimitri Johnson, Daniel Geddez
arxiv.org/abs/2507.16834

@EarthOrgUK@mastodon.energy
2025-06-20 03:23:03

On Website Technicals (2025-06) - Tech updates: Junited - Rigby to DoA - GPTBot badness, captions, diversion delay... #Junited2025 - earth.org.uk/note-on-site-tech

@arXiv_csCV_bot@mastoxiv.page
2025-07-23 10:31:22

Enhancing Remote Sensing Vision-Language Models Through MLLM and LLM-Based High-Quality Image-Text Dataset Generation
Yiguo He, Junjie Zhu, Yiying Li, Xiaoyu Zhang, Chunping Qiu, Jun Wang, Qiangjuan Huang, Ke Yang
arxiv.org/abs/2507.16716

@v_i_o_l_a@openbiblio.social
2025-06-15 17:14:39

"New File Format Research and Documentation on the Sustainability of Digital Formats" | The Signal blogs.loc.gov/thesignal/2025/0

@arXiv_csSD_bot@mastoxiv.page
2025-06-19 08:35:48

SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning
Anuradha Chopra, Abhinaba Roy, Dorien Herremans
arxiv.org/abs/2506.15154

@sean@scoat.es
2025-06-12 19:52:10

Maybe this makes me sound old, but this trend of putting captions (worse: with the current word highlighted) on top of video data, as video data—especially on a platform that has a caption system, like YouTube—is really painful.

@UP8@mastodon.social
2025-08-05 14:10:56

🤯 Interpretable EEG-to-Image Generation with Semantic Prompts
#eeg #ai

@davidaugust@mastodon.online
2025-08-06 17:55:39

#USpol

Screenshot of a post by Thomas Massie @RepThomasMassie: A meme featuring two panels with captions. In the top panel, there is a scene from a movie showing a driver looking shocked inside a vehicle; caption reads: "Democrats leaving Texas to protect their district." In the bottom panel, there is an image of Speaker of the House Johnson driving the other way, looking out from a vehicle; caption reads: "Republicans leaving D.C. to protect the Epstein files." Aug 6, 2025 1:54pm UTC

@DamonHD@mastodon.social
2025-06-09 20:32:30

@… Thank you for your captions code for HTML5 audio objects - I have it working for my podcast episodes now, such as earth.org.uk/Why-Do-Startups.h

@EarthOrgUK@mastodon.energy
2025-07-13 19:51:03

On Website Technicals (2025-06) - Tech updates: Junited - Rigby to Buttersafe - GPTBot badness, captions, diversion delay, under-volt, X11 fossil. #Junited2025 - earth.org.uk/note-on-site-tech

@arXiv_csSD_bot@mastoxiv.page
2025-08-07 08:29:24

MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel, Gang Li, Jizhong Liu, Jian Luan, Yadong Niu, Xingwei Sun, Tianzi Wang, Qiyang Xiao, Junbo Zhang, Jiahao Zhou
arxiv.org/abs/2508.03983

@arXiv_eessIV_bot@mastoxiv.page
2025-06-10 17:13:39

This arxiv.org/abs/2505.12887 has been replaced.
initial toot: mastoxiv.page/@arXiv_ees…

@arXiv_csCL_bot@mastoxiv.page
2025-07-08 13:42:11

Think Twice Before You Judge: Mixture of Dual Reasoning Experts for Multimodal Sarcasm Detection
Soumyadeep Jana, Abhrajyoti Kundu, Sanasam Ranbir Singh
arxiv.org/abs/2507.04458

@arXiv_csIR_bot@mastoxiv.page
2025-06-30 09:15:30

Evaluating VisualRAG: Quantifying Cross-Modal Performance in Enterprise Document Understanding
Varun Mannam, Fang Wang, Xin Chen
arxiv.org/abs/2506.21604

@EarthOrgUK@mastodon.energy
2025-06-29 19:51:04

On Website Technicals (2025-06) - Tech updates: Junited - Rigby to OEM - GPTBot badness, captions, diversion delay, under-volt, X11 fossil... #Junited2025 - earth.org.uk/note-on-site-tech

@arXiv_csCV_bot@mastoxiv.page
2025-07-14 10:03:32

ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way
Rajarshi Roy, Devleena Das, Ankesh Banerjee, Arjya Bhattacharjee, Kousik Dasgupta, Subarna Tripathi
arxiv.org/abs/2507.08679

@EarthOrgUK@mastodon.energy
2025-06-30 03:23:03

On Website Technicals (2025-06) - Tech updates: Junited - Rigby to OEM - GPTBot badness, captions, diversion delay, under-volt, X11 fossil... #Junited2025 - earth.org.uk/note-on-site-tech

@arXiv_csCV_bot@mastoxiv.page
2025-07-29 12:16:11

Learning Transferable Facial Emotion Representations from Large-Scale Semantically Rich Captions
Licai Sun, Xingxun Jiang, Haoyu Chen, Yante Li, Zheng Lian, Biu Liu, Yuan Zong, Wenming Zheng, Jukka M. Leppänen, Guoying Zhao
arxiv.org/abs/2507.21015

@arXiv_eessAS_bot@mastoxiv.page
2025-06-13 07:58:50

AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
Yusuke Fujita, Tomoya Mizumoto, Atsushi Kojima, Lianbo Liu, Yui Sudo
arxiv.org/abs/2506.10312

@EarthOrgUK@mastodon.energy
2025-06-26 19:51:02

On Website Technicals (2025-06) - Tech updates: Junited - Rigby to Unmet Hours - GPTBot badness, captions, diversion delay, under-volt... #Junited2025 - earth.org.uk/note-on-site-tech

@arXiv_csCV_bot@mastoxiv.page
2025-07-28 10:14:11

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
Yusuke Hirota, Boyi Li, Ryo Hachiuma, Yueh-Hua Wu, Boris Ivanovic, Yuta Nakashima, Marco Pavone, Yejin Choi, Yu-Chiang Frank Wang, Chao-Han Huck Yang
arxiv.org/abs/2507.19362

@arXiv_csSD_bot@mastoxiv.page
2025-06-03 07:48:49

FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
Shunian Chen, Xinyuan Xie, Zheshu Chen, Liyan Zhao, Owen Lee, Zhan Su, Qilin Sun, Benyou Wang
arxiv.org/abs/2506.01111

@arXiv_csCV_bot@mastoxiv.page
2025-07-10 10:15:41

GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning
S M Taslim Uddin Raju, Md. Milon Islam, Md Rezwanul Haque, Hamdi Altaheri, Fakhri Karray
arxiv.org/abs/2507.07006

@arXiv_csCV_bot@mastoxiv.page
2025-06-04 07:58:38

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li
arxiv.org/abs/2506.03144