mimic-one: a Scalable Model Recipe for General Purpose Robot Dexterity
Elvis Nava, Victoriano Montesinos, Erik Bauer, Benedek Forrai, Jonas Pai, Stefan Weirich, Stephan-Daniel Gravert, Philipp Wand, Stephan Polinski, Benjamin F. Grewe, Robert K. Katzschmann
https://arxiv.org/abs/2506.11916…
Centralized Copy-Paste: Enhanced Data Augmentation Strategy for Wildland Fire Semantic Segmentation
Joon Tai Kim, Tianle Chen, Ziyu Dong, Nishanth Kunchala, Alexander Guller, Daniel Ospina Acero, Roger Williams, Mrinal Kumar
https://arxiv.org/abs/2507.06321
crossMoDA Challenge: Evolution of Cross-Modality Domain Adaptation Techniques for Vestibular Schwannoma and Cochlea Segmentation from 2021 to 2023
Navodini Wijethilake, Reuben Dorent, Marina Ivory, Aaron Kujawa, Stefan Cornelissen, Patrick Langenhuizen, Mohamed Okasha, Anna Oviedova, Hexin Dong, Bogyeong Kang, Guillaume Sallé, Luyi Han, Ziyuan Zhao, Han Liu, Tao Yang, Shahad Hardan, Hussain Alasmawi, Santosh Sanjeev, Yuzhou Zhuang, Satoshi Kondo, Maria Baldeon Calisto, Shaikh Muh…
Machine Learning for Evolutionary Graph Theory
Guoli Yang, Matteo Cavaliere, Mingtao Zhang, Giovanni Masala, Adam Miles, Mengzhu Wang
https://arxiv.org/abs/2507.08363
Correcting for Position Bias in Learning to Rank: A Control Function Approach
Md Aminul Islam, Kathryn Vasilaky, Elena Zheleva
https://arxiv.org/abs/2506.06989
Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Simon Matrenok, Skander Moalla, Caglar Gulcehre
https://arxiv.org/abs/2507.08068 https://arxiv.org/pdf/2507.08068 https://arxiv.org/html/2507.08068
arXiv:2507.08068v1 Announce Type: new
Abstract: Aligning large language models with pointwise absolute rewards has so far required online, on-policy algorithms such as PPO and GRPO. In contrast, simpler methods that can leverage offline or off-policy data, such as DPO and REBEL, are limited to learning from preference pairs or relative signals. To bridge this gap, we introduce \emph{Quantile Reward Policy Optimization} (QRPO), which learns from pointwise absolute rewards while preserving the simplicity and offline applicability of DPO-like methods. QRPO uses quantile rewards to enable regression to the closed-form solution of the KL-regularized RL objective. This reward yields an analytically tractable partition function, removing the need for relative signals to cancel this term. Moreover, QRPO scales with increased compute to estimate quantile rewards, opening a new dimension for pre-computation scaling. Empirically, QRPO consistently achieves top performance on chat and coding evaluations -- reward model scores, AlpacaEval 2, and LeetCode -- compared to DPO, REBEL, and SimPO across diverse datasets and 8B-scale models. Finally, we find that training with robust rewards instead of converting them to preferences induces less length bias.
toXiv_bot_toot
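To make the abstract's core idea more concrete, here is a minimal PyTorch sketch of what a quantile-reward regression of this kind could look like. It is based only on the abstract above: the function names (quantile_reward, qrpo_loss), the empirical-quantile estimator, the squared-error regression form, and the value of beta are illustrative assumptions, not the paper's actual algorithm or code.

import torch

def quantile_reward(rewards: torch.Tensor, ref_rewards: torch.Tensor) -> torch.Tensor:
    # Empirical quantile of each training reward among rewards of samples drawn
    # from the reference policy for the same prompt; by construction this
    # quantile is (approximately) Uniform[0, 1] under the reference policy.
    return (ref_rewards.unsqueeze(0) <= rewards.unsqueeze(1)).float().mean(dim=1)

def qrpo_loss(policy_logprob, ref_logprob, quantile, beta=0.1):
    # Closed-form optimum of the KL-regularized objective:
    #   log pi*(y|x) - log pi_ref(y|x) = q / beta - log Z.
    # With a Uniform[0, 1] quantile reward, Z = integral_0^1 exp(u / beta) du
    # = beta * (exp(1 / beta) - 1), a constant, so the regression target is
    # fully known and no relative signal is needed to cancel it.
    log_z = torch.log(torch.tensor(beta)) + torch.log(torch.expm1(torch.tensor(1.0 / beta)))
    target = quantile / beta - log_z
    return ((policy_logprob - ref_logprob) - target).pow(2).mean()

# Toy usage: one prompt, 4 training responses, 256 reference-policy samples.
ref_r = torch.randn(256)        # rewards of reference-policy completions
r = torch.randn(4)              # rewards of the completions being trained on
q = quantile_reward(r, ref_r)   # pointwise rewards mapped to [0, 1]
loss = qrpo_loss(torch.randn(4), torch.randn(4), q)

In this reading of the abstract, the quantile transform removes the prompt dependence of the partition function, so each (prompt, response, reward) triple yields a standalone regression target, which is what would allow DPO-style offline training without preference pairs.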
Fast Bilateral Teleoperation and Imitation Learning Using Sensorless Force Control via Accurate Dynamics Model
Koki Yamane, Yunhan Li, Masashi Konosu, Koki Inami, Junji Oaki, Sho Sakaino, Toshiaki Tsuji
https://arxiv.org/abs/2507.06174
Physics-Guided Dual Implicit Neural Representations for Source Separation
Yuan Ni, Zhantao Chen, Alexander N. Petsch, Edmund Xu, Cheng Peng, Alexander I. Kolesnikov, Sugata Chowdhury, Arun Bansil, Jana B. Thayer, Joshua J. Turner
https://arxiv.org/abs/2507.05249
Learning to Explore: An In-Context Learning Approach for Pure Exploration
Alessio Russo, Ryan Welch, Aldo Pacchiano
https://arxiv.org/abs/2506.01876 https:…