Tootfinder

No exact results. Similar results found.

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:53:40

Reinforcement Learning with Rubric Anchors
Zenan Huang, Yihong Zhuang, Guoshan Lu, Zeyu Qin, Haokai Xu, Tianyu Zhao, Ru Peng, Jiaqi Hu, Zhanming Shen, Xiaomeng Hu, Xijun Gu, Peiyi Tu, Jiaxin Liu, Wenyu Chen, Yuzhuo Fu, Zhiting Fan, Yanmei Gu, Yuanyuan Wang, Zhengkai Yang, Jianguo Li, Junbo Zhao
https://arxiv.org/abs/2508.12790

Reinforcement Learning with Rubric Anchors
Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing Large Language Models (LLMs), exemplified by the success of OpenAI's o-series. In RLVR, rewards are derived from verifiable signals-such as passing unit tests in code generation or matching correct answers in mathematical reasoning. While effective, this requirement largely confines RLVR to domains with automatically checkable outcomes. To overcome this, we extend the RLVR paradigm to open-end…

@brian_gettler@mas.to
2025-07-18 14:23:14

When someone asks you to do something that's _part of your effin' job_, answer ASAP. Say 'yes' or say 'no' (if you can), but don't let the request sit for days or weeks. The time you spend ignoring the request might cause problems for the human(s) involved.

@arXiv_csCL_bot@mastoxiv.page
2025-08-20 08:18:59

ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs
Hongxin Ding, Baixiang Huang, Yue Fang, Weibin Liao, Xinke Jiang, Zheng Li, Junfeng Zhao, Yasha Wang
https://arxiv.org/abs/2508.13514

ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs
Interactive medical questioning is essential in real-world clinical consultations, where physicians must actively gather information from patients. While medical Large Language Models (LLMs) have shown impressive capabilities in static medical question answering, they predominantly operate under a reactive paradigm: generating answers directly without seeking additional information, which risks incorrect diagnoses in such interactive settings. To address this limitation, we propose ProMed, a re…

@arXiv_csCV_bot@mastoxiv.page
2025-08-19 12:05:10

Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination
Yizhou Liu, Jingwei Wei, Zizhi Chen, Minghao Han, Xukun Zhang, Keliang Liu, Lihua Zhang
https://arxiv.org/abs/2508.12957

Breaking Reward Collapse: Adaptive Reinforcement for Open-ended Medical Reasoning with Enhanced Semantic Discrimination
Reinforcement learning (RL) with rule-based rewards has demonstrated strong potential in enhancing the reasoning and generalization capabilities of vision-language models (VLMs) and large language models (LLMs), while reducing computational overhead. However, its application in medical imaging remains underexplored. Existing reinforcement fine-tuning (RFT) approaches in this domain primarily target closed-ended visual question answering (VQA), limiting their applicability to real-world clinical…

@thek3nger@mastodon.social
2025-08-19 16:03:18

Okay. This is my current top candidate for the most absurd, cool, and fascinating experiment/article of 2025.
Can you use candles as a clock signal for a CPU? Surprisingly, the answer is yes.
https://cpldcpu.com/2025/08/13/candle-flame-oscillations-as-a-clock/

Candle Flame Oscillations as a Clock
Todays candles have been optimized for millenia not to flicker. But it turns out when we bundle three of them together, we can undo all of these optimizations and the resulting triplet will start t…

@arXiv_csAI_bot@mastoxiv.page
2025-08-20 09:47:40

LM Agents May Fail to Act on Their Own Risk Knowledge
Yuzhi Tang, Tianxiao Li, Elizabeth Li, Chris J. Maddison, Honghua Dong, Yangjun Ruan
https://arxiv.org/abs/2508.13465 https…

LM Agents May Fail to Act on Their Own Risk Knowledge
Language model (LM) agents have demonstrated significant potential for automating real-world tasks, yet they pose a diverse array of potential, severe risks in safety-critical scenarios. In this work, we identify a significant gap between LM agents' risk awareness and safety execution abilities: while they often answer "Yes" to queries like "Is executing `sudo rm -rf /*' dangerous?", they will likely fail to identify such risks in instantiated trajectories or even directly perform these risky a…

@arXiv_csDS_bot@mastoxiv.page
2025-09-19 08:18:11

Minimum Sum Coloring with Bundles in Trees and Bipartite Graphs
Takehiro Ito, Naonori Kakimura, Naoyuki Kamiyama, Yusuke Kobayashi, Yoshio Okamoto
https://arxiv.org/abs/2509.15080

Minimum Sum Coloring with Bundles in Trees and Bipartite Graphs
The minimum sum coloring problem with bundles was introduced by Darbouy and Friggstad (SWAT 2024) as a common generalization of the minimum coloring problem and the minimum sum coloring problem. During their presentation, the following open problem was raised: whether the minimum sum coloring problem with bundles could be solved in polynomial time for trees. We answer their question in the negative by proving that the minimum sum coloring problem with bundles is NP-hard even for paths. We compl…

@arXiv_csRO_bot@mastoxiv.page
2025-08-18 08:01:30

Robust Online Calibration for UWB-Aided Visual-Inertial Navigation with Bias Correction
Yizhi Zhou, Jie Xu, Jiawei Xia, Zechen Hu, Weizi Li, Xuan Wang
https://arxiv.org/abs/2508.10999

Robust Online Calibration for UWB-Aided Visual-Inertial Navigation with Bias Correction
This paper presents a novel robust online calibration framework for Ultra-Wideband (UWB) anchors in UWB-aided Visual-Inertial Navigation Systems (VINS). Accurate anchor positioning, a process known as calibration, is crucial for integrating UWB ranging measurements into state estimation. While several prior works have demonstrated satisfactory results by using robot-aided systems to autonomously calibrate UWB systems, there are still some limitations: 1) these approaches assume accurate robot l…

@hanno@mastodon.social
2025-07-17 12:22:47

You may've seen @… 's post about mails he gets about CRA compliance from large companies asking him to answer all kinds of questions. I just saw a similar one to the maintainer of another important FOSS library. The kicker: The company uses a version from 2000. (Yes, no typo, 25 years old. I think it has some unfixed vulnerabilities.)

@arXiv_csAI_bot@mastoxiv.page
2025-08-19 10:19:50

Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback
Wenzhen Yuan, Shengji Tang, Weihao Lin, Jiacheng Ruan, Ganqu Cui, Bo Zhang, Tao Chen, Ting Liu, Yuzhuo Fu, Peng Ye, Lei Bai
https://arxiv.org/abs/2508.12338

Wisdom of the Crowd: Reinforcement Learning from Coevolutionary Collective Feedback
Reinforcement learning (RL) has significantly enhanced the reasoning capabilities of large language models (LLMs), but its reliance on expensive human-labeled data or complex reward models severely limits scalability. While existing self-feedback methods aim to address this problem, they are constrained by the capabilities of a single model, which can lead to overconfidence in incorrect answers, reward hacking, and even training collapse. To this end, we propose Reinforcement Learning from Coev…

Tootfinder

Opt-in global Mastodon full text search. Join the index!