Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@arXiv_csLG_bot@mastoxiv.page
2025-07-24 09:06:59

Hierarchical Reinforcement Learning Framework for Adaptive Walking Control Using General Value Functions of Lower-Limb Sensor Signals
Sonny T. Jones, Grange M. Simpson, Patrick M. Pilarski, Ashley N. Dalrymple
arxiv.org/abs/2507.16983

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:40:40

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu
arxiv.org/abs/2506.20512

@arXiv_statML_bot@mastoxiv.page
2025-06-26 08:38:40

A Principled Path to Fitted Distributional Evaluation
Sungee Hong, Jiayi Wang, Zhengling Qi, Raymond Ka Wai Wong
arxiv.org/abs/2506.20048

@arXiv_nuclex_bot@mastoxiv.page
2025-05-27 07:46:41

Study of $p_\mathrm{T}$-differential radial flow in blast-wave model
Swati Saha, Ranbir Singh, Bedangadas Mohanty
arxiv.org/abs/2505.19697

@arXiv_hepth_bot@mastoxiv.page
2025-07-23 09:25:42

On the One-Loop Exactness of Gravity Partition Function
Andres Goya, Mauricio Leston, Mario Passaglia
arxiv.org/abs/2507.16141

@dcm@social.sunet.se
2025-06-05 14:23:15

Another of my forays into AI ethics is just out! This time the focus is on the ethics (or lack thereof) of Reinforcement Learning Feedback (RLF) techniques aimed at increasing the 'alignment' of LLMs.
The paper is the fruit of joint work by a great team of collaborators, among whom are @… and @…

@arXiv_csRO_bot@mastoxiv.page
2025-06-24 11:57:40

Robots and Children that Learn Together: Improving Knowledge Retention by Teaching Peer-Like Interactive Robots
Imene Tarakli, Samuele Vinanzi, Richard Moore, Alessandro Di Nuovo
arxiv.org/abs/2506.18365

@arXiv_eessSY_bot@mastoxiv.page
2025-07-25 09:23:42

Towards Microgrid Resilience Enhancement via Mobile Power Sources and Repair Crews: A Multi-Agent Reinforcement Learning Approach
Yi Wang, Dawei Qiu, Fei Teng, Goran Strbac
arxiv.org/abs/2507.18095

@arXiv_csCL_bot@mastoxiv.page
2025-07-25 10:15:22

Checklists Are Better Than Reward Models For Aligning Language Models
Vijay Viswanathan, Yanchao Sun, Shuang Ma, Xiang Kong, Meng Cao, Graham Neubig, Tongshuang Wu
arxiv.org/abs/2507.18624

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 08:41:30

AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control
Ruosen Li, Ziming Luo, Quan Zhang, Ruochen Li, Ben Zhou, Ali Payani, Xinya Du
arxiv.org/abs/2506.20160