Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/4]:
- Evaluating the Robustness of Open-Source Vision-Language Models to Domain Shift in Object Captioning
Federico Tavella, Amber Drinkwater, Angelo Cangelosi
Automating Code Generation for Semiconductor Equipment Control from Developer Utterances with LLMs
Youngkyoung Kim, Sanghyeok Park, Misoo Kim, Gangho Yoon, Eunseok Lee, Simon S. Woo
https://arxiv.org/abs/2509.13055
Reverse Physician-AI Relationship: Full-process Clinical Diagnosis Driven by a Large Language Model
Shicheng Xu, Xin Huang, Zihao Wei, Liang Pang, Huawei Shen, Xueqi Cheng
https://arxiv.org/abs/2508.10492
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/4]:
- IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model
Jiang, Gao, Wang, Sun, Wang, Heng, Sun, Tang, Zhu, Chai, Wang, Gu, Jiang, Sun
Large Language Models Show Signs of Alignment with Human Neurocognition During Abstract Reasoning
Christopher Pinier, Sonia Acu\~na Vargas, Mariia Steeghs-Turchina, Dora Matzke, Claire E. Stevenson, Michael D. Nunez
https://arxiv.org/abs/2508.10057
Finally, what Xia & Lindell call a "separation problem" is, in our view, a feature of our approach and not a bug.
If, e.g., all languages in a family are polysynthetic (or none are), that’s not a statistical artefact – it’s the signal. The outcome is well associated with genealogy, showing that family membership captures someth genuinely informative about the process. When the model finds that family explains a large share of the variance, that's not a failure–it's evidence that phylogenetic structure dominates the pattern.
So while Xia & Lindell insist that "autocorrelation due to relationships and distance cannot be captured in family or regional-level analyses", we see that as an empirical question – and we treated it as one.
The real test is whether a mixed model that explicitly represents phylogeny and geography performs worse than their alternative, where the entire shared history of languages and environments is effectively collapsed into a single dimension (an eigenvector).
In other words: we model relationships – Xia & Lindell summarise them into one number per language.
Poseidon: A OneGraph Engine
Brad Bebee, \"Umit V. \c{C}ataly\"urek, Olaf Hartig, Ankesh Khandelwal, Simone Rondelli, Michael Schmidt, Lefteris Sidirourgos, Bryan Thompson
https://arxiv.org/abs/2510.11166
Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations
Ron F. Del Rosario, Klaudia Krawiecka, Christian Schroeder de Witt
https://arxiv.org/abs/2509.08646
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/5]:
- Video-based Sign Language Recognition without Temporal Segmentation
Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Yi Han, Cheng Chi, Enshen Zhou, Shanyu Rong, Jingkun An, Pengwei Wang, Zhongyuan Wang, Lu Sheng, Shanghang Zhang
https://arxiv.org/abs/2510.07181
Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents
Sankalp Tattwadarshi Swain, Anshika Krishnatray, Dhruv Kumar, Jagat Sesh Challa
https://arxiv.org/abs/2509.07389
Outsmarting Linear Neural Networks via an Incoherent Light-Driven Optical Extreme Learner with Data Reverberation
Bofeng Liu, Xu Mei, Sadman Shafi, Tunan Xia, Iam-Choon Khoo, Zhiwen Liu, Xingjie Ni
https://arxiv.org/abs/2508.08428
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/5]:
- SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial...
Ziyang Gong, Wenhao Li, Oliver Ma, Songyuan Li, Jiayi Ji, Xue Yang, Gen Luo, Junchi Yan, Rongrong Ji
Understanding Economic Tradeoffs Between Human and AI Agents in Bargaining Games
Crystal Qian, Kehang Zhu, John Horton, Benjamin S. Manning, Vivian Tsai, James Wexler, Nithum Thain
https://arxiv.org/abs/2509.09071
Hidden in the Noise: Unveiling Backdoors in Audio LLMs Alignment through Latent Acoustic Pattern Triggers
Liang Lin, Miao Yu, Kaiwen Luo, Yibo Zhang, Lilan Peng, Dexian Wang, Xuehai Tang, Yuanhe Zhang, Xikang Yang, Zhenhong Zhou, Kun Wang, Yang Liu
https://arxiv.org/abs/2508.02175
Convergence and Divergence of Language Models under Different Random Seeds
Finlay Fehlauer (ETH Zurich), Kyle Mahowald (University of Texas at Austin), Tiago Pimentel (ETH Zurich)
https://arxiv.org/abs/2509.26643
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/5]:
- TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Han, Chi, Zhou, Rong, An, Wang, Wang, Sheng, Zhang
Diagnostic-Guided Dynamic Profile Optimization for LLM-based User Simulators in Sequential Recommendation
Hongyang Liu, Zhu Sun, Tianjun Wei, Yan Wang, Jiajie Zhu, Xinghua Qu
https://arxiv.org/abs/2508.12645
SPADE: A Large Language Model Framework for Soil Moisture Pattern Recognition and Anomaly Detection in Precision Agriculture
Yeonju Lee, Rui Qi Chen, Joseph Oboamah, Po Nien Su, Wei-zhen Liang, Yeyin Shi, Lu Gan, Yongsheng Chen, Xin Qiao, Jing Li
https://arxiv.org/abs/2509.18123
Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning
Dongyang Guo, Yasmeen Abdrabou, Enkeleda Thaqi, Enkelejda Kasneci
https://arxiv.org/abs/2507.18252
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/7]:
- AutoDrive-QA: A Multiple-Choice Benchmark for Vision-Language Evaluation in Urban Autonomous Driving
Boshra Khalili, Andrew W. Smyth
Beyond Surface-Level Detection: Towards Cognitive-Driven Defense Against Jailbreak Attacks via Meta-Operations Reasoning
Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang
https://arxiv.org/abs/2508.03054
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/1]:
- Sample-efficient Integration of New Modalities into Large Language Models
Osman Batur \.Ince, Andr\'e F. T. Martins, Oisin Mac Aodha, Edoardo M. Ponti
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/5]:
- CountingFruit: Language-Guided 3D Fruit Counting with Semantic Gaussian Splatting
Fengze Li, Yangle Liu, Jieming Ma, Hai-Ning Liang, Yaochun Shen, Huangxiang Li, Zhijing Wu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/5]:
- RS-OOD: A Vision-Language Augmented Framework for Out-of-Distribution Detection in Remote Sensing
Chenhao Wang, Yingrui Ji, Yu Meng, Yunjian Zhang, Yao Zhu
STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning
Chenghao Wu, Ruiyang Ren, Junjie Zhang, Ruirui Wang, Zhongrui Ma, Qi Ye, Wayne Xin Zhao
https://arxiv.org/abs/2508.18812
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/5]:
- HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
Myungkyu Koo, Daewon Choi, Taeyoung Kim, Kyungmin Lee, Changyeon Kim, Younggyo Seo, Jinwoo Shin
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[6/10]:
- CountingFruit: Language-Guided 3D Fruit Counting with Semantic Gaussian Splatting
Fengze Li, Yangle Liu, Jieming Ma, Hai-Ning Liang, Yaochun Shen, Huangxiang Li, Zhijing Wu
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/3]:
- MultiStream-LLM: Bridging Modalities for Robust Sign Language Translation
Marshall Thomas, Edward Fish, Richard Bowden
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/9]:
- DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse ...
Wang, Zhang, Fang, Tian, Yang, Ma, Pan, Song, Yu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/7]:
- LFTR: Learning-Free Token Reduction for Multimodal Large Language Models
Zihui Zhao, Yingxin Li, Yang Li
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/7]:
- CoFFT: Chain of Foresight-Focus Thought for Visual Language Models
Zhang, Dong, Zhang, Jia, Dang, Fernando, Liu, Shou
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/9]:
- "Principal Components" Enable A New Language of Images
Xin Wen, Bingchen Zhao, Ismail Elezi, Jiankang Deng, Xiaojuan Qi
Language-Instructed Reasoning for Group Activity Detection via Multimodal Large Language Model
Jihua Peng, Qianxiong Xu, Yichen Liu, Chenxi Liu, Cheng Long, Rui Zhao, Ziyue Li
https://arxiv.org/abs/2509.16054
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[1/2]:
- SGAligner : Cross-Modal Language-Aided 3D Scene Graph Alignment
Binod Singh, Sayan Deb Sarkar, Iro Armeni
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/5]:
- VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for A...
Younggun Kim, Ahmed S. Abdelrahman, Mohamed Abdel-Aty
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/2]:
- CaTS-Bench: Can Language Models Describe Numeric Time Series?
Luca Zhou, Pratham Yashwante, Marshall Fisher, Alessio Sampieri, Zihao Zhou, Fabio Galasso, Rose Yu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/6]:
- Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition
Zefeng Qian, Xincheng Yao, Yifei Huang, Chongyang Zhang, Jiangyong Ying, Hong Sun
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/5]:
- Investigating Traffic Accident Detection Using Multimodal Large Language Models
Ilhan Skender, Kailin Tong, Selim Solmaz, Daniel Watzenig
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/8]:
- SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlea...
Chen, Deng, Zheng, Yan, Liu, Wu, Jiang, Liu, Hu
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[8/8]:
- The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via ...
Jiang, Jiang, Ma, Wen, Li, Zhan, Jia, Liu, Sun, Lang
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/5]:
- RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
Difei Gu, Yunhe Gao, Yang Zhou, Mu Zhou, Dimitris Metaxas
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[3/7]:
- Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Spee...
Jeong Hun Yeo, Minsu Kim, Chae Won Kim, Stavros Petridis, Yong Man Ro
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/7]:
- SLGaussian: Fast Language Gaussian Splatting in Sparse Views
Kangjie Chen, BingQuan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang
Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[5/7]:
- Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distrib...
Behraj Khan, Tahir Qasim Syed, Nouman M. Durrani, Bilal Naseem, Shabir Ahmad, Rizwan Qureshi
…