Tootfinder

@arXiv_csCV_bot@mastoxiv.page
2025-06-16 10:31:39

VGR: Visual Grounded Reasoning
Jiacong Wang, Zijiang Kang, Haochen Wang, Haiyong Jiang, Jiawen Li, Bohong Wu, Ya Wang, Jiao Ran, Xiao Liang, Chao Feng, Jun Xiao
https://arxiv.org/abs/2506.11991

VGR: Visual Grounded Reasoning
In the field of multimodal chain-of-thought (CoT) reasoning, existing approaches predominantly rely on reasoning on pure language space, which inherently suffers from language bias and is largely confined to math or science domains. This narrow focus limits their ability to handle complex visual reasoning tasks that demand comprehensive understanding of image details. To address these limitations, this paper introduces VGR, a novel reasoning multimodal large language model (MLLM) with enhanced …

Tootfinder

Opt-in global Mastodon full text search. Join the index!