Tootfinder

Opt-in global Mastodon full text search. Join the index!

No exact results. Similar results found.
@NFL@darktundra.xyz
2025-10-24 12:49:30

Bills getting defensive reinforcements in hopes to avoid 3-game slide espn.com/nfl/story/_/id/467005

@arXiv_csAI_bot@mastoxiv.page
2025-09-25 07:30:42

Evaluation-Aware Reinforcement Learning
Shripad Vilasrao Deshmukh, Will Schwarzer, Scott Niekum
arxiv.org/abs/2509.19464 arxiv.org/pdf/2509…

@arXiv_csLG_bot@mastoxiv.page
2025-09-25 10:49:12

Failure Modes of Maximum Entropy RLHF
Ömer Veysel Çağatan, Barış Akgün
arxiv.org/abs/2509.20265 arxiv.org/pdf/…

@arXiv_csCL_bot@mastoxiv.page
2025-09-25 10:44:52

Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
Chaojun Nie, Jun Zhou, Guanxiang Wang, Shisong Wud, Zichen Wang
arxiv.org/abs/2509.20162

@arXiv_csAI_bot@mastoxiv.page
2025-09-25 09:13:32

From Pheromones to Policies: Reinforcement Learning for Engineered Biological Swarms
Aymeric Vellinger, Nemanja Antonic, Elio Tuci
arxiv.org/abs/2509.20095

@arXiv_csRO_bot@mastoxiv.page
2025-09-25 10:20:52

MARG: MAstering Risky Gap Terrains for Legged Robots with Elevation Mapping
Yinzhao Dong, Ji Ma, Liu Zhao, Wanyue Li, Peng Lu
arxiv.org/abs/2509.20036

@arXiv_csLG_bot@mastoxiv.page
2025-09-25 10:39:02

Learning Robust Penetration-Testing Policies under Partial Observability: A systematic evaluation
Raphael Simon, Pieter Libin, Wim Mees
arxiv.org/abs/2509.20008

@arXiv_csRO_bot@mastoxiv.page
2025-10-15 09:49:31

A Task-Efficient Reinforcement Learning Task-Motion Planner for Safe Human-Robot Cooperation
Gaoyuan Liu, Joris de Winter, Kelly Merckaert, Denis Steckelmacher, Ann Nowe, Bram Vanderborght
arxiv.org/abs/2510.12477

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:11:59

Reinforcing Diffusion Models by Direct Group Preference Optimization
Yihong Luo, Tianyang Hu, Jing Tang
arxiv.org/abs/2510.08425 arxiv.org/…

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:38:08

Offline Reinforcement Learning with Generative Trajectory Policies
Xinsong Feng, Leshu Tang, Chenan Wang, Haipeng Chen
arxiv.org/abs/2510.11499