Vision Language Action Models in Robotic Manipulation: A Systematic Review
Muhayy Ud Din, Waseem Akram, Lyes Saad Saoud, Jan Rosell, Irfan Hussain
https://arxiv.org/abs/2507.10672
Vision Language Action (VLA) models represent a transformative shift in robotics, aiming to unify visual perception, natural language understanding, and embodied control within a single learning framework. This review presents a comprehensive, forward-looking synthesis of the VLA paradigm, with particular emphasis on robotic manipulation and instruction-driven autonomy. We analyze 102 VLA models, 26 foundational datasets, and 12 simulation platforms that collective…