Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_csCL_bot@mastoxiv.page
2026-03-31 11:13:08

Replaced article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[5/5]:
- AppellateGen: A Benchmark for Appellate Legal Judgment Generation
Yang, Wang, Fan, Hu, Wang, Liu, Zeng, Fu, Gong, Zhang, Li, Zheng, Xu
arxiv.org/abs/2601.01331 mastoxiv.page/@arXiv_csCY_bot/
- Vision-Language Agents for Interactive Forest Change Analysis
James Brock, Ce Zhang, Nantheera Anantrasirichai
arxiv.org/abs/2601.04497 mastoxiv.page/@arXiv_csCV_bot/
- FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures
Jifeng Song, Arun Das, Pan Wang, Hui Ji, Kun Zhao, Yufei Huang
arxiv.org/abs/2601.08026 mastoxiv.page/@arXiv_csCV_bot/
- Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts
Luo, Zhang, Hu, Zhang, Wang, Su, Sun, Liang, Zhang
arxiv.org/abs/2601.10079 mastoxiv.page/@arXiv_csLG_bot/
- Compounding Disadvantage: Auditing Intersectional Bias in LLM-Generated Explanations Across India...
Amogh Gupta, Niharika Patil, Sourojit Ghosh, SnehalKumar (Neil) S Gaikwad
arxiv.org/abs/2601.14506 mastoxiv.page/@arXiv_csCY_bot/
- Measuring Complexity at the Requirements Stage: Spectral Metrics as Development Effort Predictors
Vierlboeck, Pugliese, Nilchian, Grogan, Babu
arxiv.org/abs/2602.07182 mastoxiv.page/@arXiv_csSE_bot/
- CoPE-VideoLM: Leveraging Codec Primitives For Efficient Video Language Modeling
Sarkar, Pautrat, Miksik, Pollefeys, Armeni, Rad, Dusmanu
arxiv.org/abs/2602.13191 mastoxiv.page/@arXiv_csCV_bot/
- MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Pref...
Ashutosh Chaubey, Jiacheng Pang, Mohammad Soleymani
arxiv.org/abs/2603.03192 mastoxiv.page/@arXiv_csCV_bot/
- Image Generation Models: A Technical History
Rouzbeh Shirvani
arxiv.org/abs/2603.07455 mastoxiv.page/@arXiv_csCV_bot/
- Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers
Shubham Aggarwal, Lokendra Kumar
arxiv.org/abs/2603.08343 mastoxiv.page/@arXiv_csLG_bot/
- FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning
Chaojie Sun, Bin Cao, Tiantian Li, Chenyu Hou, Ruizhe Li, Jing Fan
arxiv.org/abs/2603.12702 mastoxiv.page/@arXiv_csIR_bot/
- CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad
Yongqiang Chen, Chenxi Liu, Zhenhao Chen, Tongliang Liu, Bo Han, Kun Zhang
arxiv.org/abs/2603.14575 mastoxiv.page/@arXiv_csLG_bot/
- Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidenc...
Yiliang Song, Hongjun An, Jiangan Chen, Xuanchen Yan, Huan Song, Jiawei Shao, Xuelong Li
arxiv.org/abs/2603.21636 mastoxiv.page/@arXiv_csAI_bot/
- Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits
Eric Czech, Zhiwei Xu, Yael Elmatad, Yixin Wang, William Held
arxiv.org/abs/2603.22339 mastoxiv.page/@arXiv_csLG_bot/
- X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
Di Cao, Dongjie Fu, Hai Yu, Siqi Zheng, Xu Tan, Tao Jin
arxiv.org/abs/2603.24596 mastoxiv.page/@arXiv_eessAS_bo
toXiv_bot_toot

@arXiv_csLG_bot@mastoxiv.page
2026-02-25 10:45:31

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
Yining Hong, Huang Huang, Manling Li, Li Fei-Fei, Jiajun Wu, Yejin Choi
arxiv.org/abs/2602.21198 arxiv.org/pdf/2602.21198 arxiv.org/html/2602.21198
arXiv:2602.21198v1 Announce Type: new
Abstract: Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials where mistakes repeat rather than accumulate into experience. Drawing upon human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action, where the agent uses test-time scaling to generate and score multiple candidate actions using internal reflections before execution; and reflection-on-action, which uses test-time training to update both its internal reflection model and its action policy based on external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight for proper long-horizon credit assignment. Experiments on our newly-designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with ablative studies validating the complementary roles of reflection-in-action and reflection-on-action. Qualitative analyses, including real-robot trials, highlight behavioral correction through reflection.
toXiv_bot_toot
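The abstract describes two loops: reflection-in-action (score several candidate actions before executing one) and reflection-on-action (update the reflection model from feedback after execution). The following is a minimal toy sketch of that loop, assuming a tabular setting where states and actions are strings and feedback is a success score in [0, 1]. `ReflectionModel`, `reflect_in_action`, `run_episode`, and the running-average update rule are hypothetical stand-ins, not the authors' method: the paper uses LLM-based reflections and test-time training, and retrospective reflection is omitted here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an internal reflection model: a lookup table of
# estimated success scores per (state, action) pair, not an LLM.
class ReflectionModel:
    def __init__(self, prior: float = 0.5, lr: float = 0.5) -> None:
        self.scores: Dict[Tuple[str, str], float] = {}
        self.prior = prior  # score assumed for actions never tried before
        self.lr = lr        # step size of the post-execution update

    def score(self, state: str, action: str) -> float:
        # Internal reflection before execution: estimated success of `action`.
        return self.scores.get((state, action), self.prior)

    def update(self, state: str, action: str, outcome: float) -> None:
        # Toy reflection-on-action: move the estimate toward the external
        # outcome observed after execution.
        old = self.score(state, action)
        self.scores[(state, action)] = old + self.lr * (outcome - old)


def reflect_in_action(state: str, candidates: List[str], model: ReflectionModel) -> str:
    # Toy reflection-in-action: score every candidate, keep the best one.
    # On ties, max() returns the first candidate in list order.
    return max(candidates, key=lambda a: model.score(state, a))


def run_episode(
    state: str,
    candidates: List[str],
    model: ReflectionModel,
    environment: Callable[[str, str], float],
) -> Tuple[str, float]:
    action = reflect_in_action(state, candidates, model)
    outcome = environment(state, action)  # external feedback after execution
    model.update(state, action, outcome)
    return action, outcome


if __name__ == "__main__":
    # Hypothetical environment: only "open_drawer" succeeds in "kitchen".
    env = lambda s, a: 1.0 if a == "open_drawer" else 0.0
    model = ReflectionModel()
    for _ in range(3):
        print(run_episode("kitchen", ["grab_cup", "open_drawer"], model, env))
    # Prints ('grab_cup', 0.0), then ('open_drawer', 1.0) twice.
```

In this toy run the failed first attempt lowers the score of `grab_cup`, so the agent does not repeat that mistake, which mirrors the abstract's point that mistakes should accumulate into experience rather than recur.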