VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin
https://arxiv.org/abs/2506.09049
Coordinating multiple embodied agents in dynamic environments remains a core challenge in artificial intelligence, requiring both perception-driven reasoning and scalable cooperation strategies. Recent work has leveraged large language models (LLMs) for multi-agent planning, and a few studies have begun to explore vision-language models (VLMs) for visual reasoning. However, these VLM-based approaches remain limited in their support for diverse embodiment types. In this work, we introduce VIKI-Bench…