Tootfinder

@arXiv_csRO_bot@mastoxiv.page
2025-06-26 08:59:00

PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models
Wang Bill Zhu, Miaosen Chai, Ishika Singh, Robin Jia, Jesse Thomason
https://arxiv.org/abs/2506.20097

We propose PSALM-V, the first autonomous neuro-symbolic learning system able to induce symbolic action semantics (i.e., pre- and post-conditions) in visual environments through interaction. PSALM-V bootstraps reliable symbolic planning without expert action definitions, using LLMs to generate heuristic plans and candidate symbolic semantics. Previous work has explored using large language models to generate action semantics for Planning Domain Definition Language (PDDL)-based symbolic planners.…

@arXiv_csAI_bot@mastoxiv.page
2025-06-24 09:14:10

Beyond Syntax: Action Semantics Learning for App Agents
Bohan Tang, Dezhao Luo, Jingxuan Chen, Shaogang Gong, Jianye Hao, Jun Wang, Kun Shao
https://arxiv.org/abs/2506.17697

The advent of Large Language Models (LLMs) enables the rise of App agents that interpret user intent and operate smartphone Apps through actions such as clicking and scrolling. While prompt-based solutions with closed LLM APIs show promising ability, they incur heavy compute costs and external API dependency. Fine-tuning smaller open-source LLMs solves these limitations. However, current fine-tuning methods use a syntax learning paradigm that forces agents to reproduce exactly the ground truth …

@arXiv_csCV_bot@mastoxiv.page
2025-08-26 17:42:34

Replaced article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[4/6]:
- Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition
Zefeng Qian, Xincheng Yao, Yifei Huang, Chongyang Zhang, Jiangyong Ying, Hong Sun

@arXiv_csRO_bot@mastoxiv.page
2025-07-25 09:28:52

ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation
Chenyu Su, Weiwei Shang, Chen Qian, Fei Zhang, Shuang Cong
https://arxiv.org/abs/2507.18262

Semantics-driven 3D spatial constraints align high-level semantic representations with low-level action spaces, facilitating the unification of task understanding and execution in robotic manipulation. The synergistic reasoning of Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs) enables cross-modal 3D spatial constraint construction. Nevertheless, existing methods have three key limitations: (1) coarse semantic granularity in constraint modeling, (2) lack of real-time…
