Multimodal Policy Internalization for Conversational Agents
Zhenhailong Wang, Jiateng Liu, Amin Fazel, Ritesh Sarkhel, Xing Fan, Xiang Li, Chenlei Guo, Heng Ji, Ruhi Sarikaya
https://arxiv.org/abs/2510.09474
SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing
Hongyi Jing, Jiafu Chen, Chen Rao, Ziqiang Dang, Jiajie Teng, Tianyi Chu, Juncheng Mo, Shuo Fang, Huaizhong Lin, Rui Lv, Chenguang Ma, Lei Zhao
https://arxiv.org/abs/2509.04908
STARE: Predicting Decision Making Based on Spatio-Temporal Eye Movements
Moshe Unger, Alexander Tuzhilin, Michel Wedel
https://arxiv.org/abs/2508.04148