
2025-09-15 09:55:31
Towards Reliable and Interpretable Document Question Answering via VLMs
Alessio Chen, Simone Giovannini, Andrea Gemelli, Fabio Coppini, Simone Marinai
https://arxiv.org/abs/2509.10129
🧩 Returns structured output as hierarchical JSON plus ready-to-render #Markdown format
👁️ Optional bounding-box snippets & full-page visualizations for ground-truth verification
Prompt learning with bounding box constraints for medical image segmentation
Mélanie Gaillochet, Mehrdad Noori, Sahar Dastani, Christian Desrosiers, Hervé Lombaert
https://arxiv.org/abs/2507.02743
Visual Prompting for Robotic Manipulation with Annotation-Guided Pick-and-Place Using ACT
Muhammad A. Muttaqien, Tomohiro Motoda, Ryo Hanai, Yukiyasu Domae
https://arxiv.org/abs/2508.08748
DOBB-BVH: Efficient Ray Traversal by Transforming Wide BVHs into Oriented Bounding Box Trees using Discrete Rotations
Michael A. Kern, Alain Galvan, David Oldcorn, Daniel Skinner, Rohan Mehalwal, Leo Reyes Lozano, Matthäus G. Chajdas
https://arxiv.org/abs/2506.22849
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
Haochen Wang, Xiangtai Li, Zilong Huang, Anran Wang, Jiacong Wang, Tao Zhang, Jiani Zheng, Sule Bai, Zijian Kang, Jiashi Feng, Zhuochen Wang, Zhaoxiang Zhang
https://arxiv.org/abs/2507.07999
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[2/5]:
- OralBBNet: Spatially Guided Dental Segmentation of Panoramic X-Rays with Bounding Box Priors
Budagam, Imanbayev, Akhmetov, Sinitca, Antonov, Kaplun
New #rstats https://github.com/e-kotov/gridmaker Creates Eurostat GISCO compatible and INSPIRE-compliant grids with IDs that look like ‘CRS3035RES1000mN3497000E4448000’ or ‘1kmN3497E4447’. Input can be sf, or …
Collaborative Charging Scheduling via Balanced Bounding Box Methods
Fangting Zhou, Balázs Kulcsár, Jiaming Wu
https://arxiv.org/abs/2506.14461 …
Evaluating Integrative Strategies for Incorporating Phenotypic Features in Spatial Transcriptomics
Levin M Moser, Ahmad Kamal Hamid, Esteban Miglietta, Nodar Gogoberidze, Beth A Cimini
https://arxiv.org/abs/2507.22212
VisioFirm: Cross-Platform AI-assisted Annotation Tool for Computer Vision
Safouane El Ghazouali, Umberto Michelucci
https://arxiv.org/abs/2509.04180 https://
Model-based Multi-object Visual Tracking: Identification and Standard Model Limitations
Jan Krejčí, Oliver Kost, Yuxuan Xia, Lennart Svensson, Ondřej Straka
https://arxiv.org/abs/2508.13647
Partial Weakly-Supervised Oriented Object Detection
Mingxin Liu, Peiyuan Zhang, Yuan Liu, Wei Zhang, Yue Zhou, Ning Liao, Ziyang Gong, Junwei Luo, Zhirui Wang, Yi Yu, Xue Yang
https://arxiv.org/abs/2507.02751
V2P: From Background Suppression to Center Peaking for Robust GUI Grounding Task
Jikai Chen, Long Chen, Dong Wang, Leilei Gan, Chenyi Zhuang, Jinjie Gu
https://arxiv.org/abs/2508.13634
Perspective-Invariant 3D Object Detection
Ao Liang, Lingdong Kong, Dongyue Lu, Youquan Liu, Jian Fang, Huaici Zhao, Wei Tsang Ooi
https://arxiv.org/abs/2507.17665 https://