
2025-06-05 07:17:43
Crowd-SFT: Crowdsourcing for LLM Alignment
Alex Sotiropoulos, Sulyab Thottungal Valapu, Linus Lei, Jared Coleman, Bhaskar Krishnamachari
https://arxiv.org/abs/2506.04063
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning
Fangyu Lei, Jinxiang Meng, Yiming Huang, Tinghong Chen, Yun Zhang, Shizhu He, Jun Zhao, Kang Liu
https://arxiv.org/abs/2506.01710
This https://arxiv.org/abs/2505.23387 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage
Kalyan Nakka, Nitesh Saxena
https://arxiv.org/abs/2506.02479
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou, Shaobo Wang, Xingyu Dong, Xiangqi Jin, Yifang Chen, Yue Min, Kexin Yang, Xingzhang Ren, Dayiheng Liu, Linfeng Zhang
https://arxiv.org/abs/2506.00577
This https://arxiv.org/abs/2505.19713 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csGR_…
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Luu Tuan Tuan, Yue Liu, Yuhao Qing, Dong Huang, Xinyi He, Qian Liu, Zejun Ma, See-kiong Ng
https://arxiv.org/abs/2505.23387