ReFineG: Synergizing Small Supervised Models and LLMs for Low-Resource Grounded Multimodal NER
Jielong Tang, Shuang Wang, Zhenxing Wang, Jianxing Yu, Jian Yin
https://arxiv.org/abs/2509.10975
Don't Just Chase "Highlighted Tokens" in MLLMs: Revisiting Visual Holistic Context Retention
Xin Zou, Di Lu, Yizhou Wang, Yibo Yan, Yuanhuiyi Lyu, Xu Zheng, Linfeng Zhang, Xuming Hu
https://arxiv.org/abs/2510.02912
Stack Overflow Is Not Dead Yet: Crowd Answers Still Matter
Denis Helic, Tiago Santos
https://arxiv.org/abs/2509.05879
Aligning Effective Tokens with Video Anomaly in Large Language Models
Yingxian Chen, Jiahui Liu, Ruifan Di, Yanwei Li, Chirui Chang, Shizhen Zhao, Wilton W. T. Fok, Xiaojuan Qi, Yik-Chung Wu
https://arxiv.org/abs/2508.06350
A Multi-To-One Interview Paradigm for Efficient MLLM Evaluation
Ye Shen, Junying Wang, Farong Wen, Yijin Guo, Qi Jia, Zicheng Zhang, Guangtao Zhai
https://arxiv.org/abs/2509.14886
Less is More: Token-Efficient Video-QA via Adaptive Frame-Pruning and Semantic Graph Integration
Shaoguang Wang, Jianxiang He, Yijie Xu, Ziyang Chen, Weiyu Guo, Hui Xiong (all The Hong Kong University of Science and Technology)