
2025-08-11 07:42:19
Klear-CodeTest: Scalable Test Case Generation for Code Reinforcement Learning
Jia Fu, Xinyu Yang, Hongzhi Zhang, Yahui Liu, Jingyuan Zhang, Qi Wang, Fuzheng Zhang, Guorui Zhou
https://arxiv.org/abs/2508.05710
Active Membership Inference Test (aMINT): Enhancing Model Auditability with Multi-Task Learning
Daniel DeAlcala, Aythami Morales, Julian Fierrez, Gonzalo Mancera, Ruben Tolosana, Javier Ortega-Garcia
https://arxiv.org/abs/2509.07879
Sequential Test for Practical Significance: Truncated Mixture Sequential Probability Ratio Test
Kyu Min Shim
https://arxiv.org/abs/2509.07892 https://arxiv…
ADPro: a Test-time Adaptive Diffusion Policy for Robot Manipulation via Manifold and Initial Noise Constraints
Zezeng Li, Rui Yang, Ruochen Chen, ZhongXuan Luo, Liming Chen
https://arxiv.org/abs/2508.06266
Sequentially Auditing Differential Privacy
Tomás González, Mateo Dulce-Rubio, Aaditya Ramdas, Mónica Ribero
https://arxiv.org/abs/2509.07055 https://
The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models
Konrad Löhr, Shuzhou Yuan, Michael Färber
https://arxiv.org/abs/2510.08236
Basis Vector Metric: A Method for Robust Open-Ended State Change Detection
David Oprea, Sam Powers
https://arxiv.org/abs/2509.07308 https://arxiv.org/pdf/2…
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Yi Lu, Jianing Wang, Linsen Guo, Wei He, Hongyin Tang, Tao Gui, Xuanjing Huang, Xuezhi Cao, Wei Wang, Xunliang Cai
https://arxiv.org/abs/2510.08189
PAC Learnability in the Presence of Performativity
Ivan Kirev, Lyuben Baltadzhiev, Nikola Konstantinov
https://arxiv.org/abs/2510.08335 https://arxiv.org/p…
Effects of magnetic fields on spinning test particles orbiting Kerr-Bertotti-Robinson black holes
Yu-Kun Zhang, Shao-Wen Wei
https://arxiv.org/abs/2510.07914 https://
Guess your neighbor's input: Quantum advantage in Feige's game
Simon Schmidt, Sigurd A. L. Storgaard, Michael Walter, Yuming Zhao
https://arxiv.org/abs/2510.08484 https:…
A Systematic Framework to Test the Resilience of Three-Fold Redundant Sparse Arrays Against Two Sensor Failures and Some Never-Before Findings
Ashish Patwari, Andrés Alayón Glazunov
https://arxiv.org/abs/2509.07442
A Baseline $T\log^2 T$ Upper Bound for KL-Regularized Prime–Zero Optimal Transport
Zhejun Yang (University of Sydney)
https://arxiv.org/abs/2509.07329 https://
Probing Dark Matter Interactions with Stellar Motion near Sagittarius A*
R. Andrew Gustafson, Ian M. Shoemaker, Volodymyr Takhistov
https://arxiv.org/abs/2510.07387 https://
Rethinking the Sioux Falls Network: Insights from Path-Driven Higher-Order Network Analysis
Chen Zhang, Timothy LaRock, Alben Rome Bagabaldo, Jürgen Hackl
https://arxiv.org/abs/2508.06234
Execution-Feedback Driven Test Generation from SWE Issues
Toufique Ahmed, Jatin Ganhotra, Avraham Shinnar, Martin Hirzel
https://arxiv.org/abs/2508.06365 https://
Probing the Cosmic Distance Duality Relation via Non-Parametric Reconstruction for High Redshifts
Felipe Avila, Fernanda Oliveira, Camila Franco, Maria Lopes, Rodrigo Holanda, Rafael C. Nunes, Armando Bernui
https://arxiv.org/abs/2509.07848
Particle dynamics in TOI-178 planetary system
J. Boskovic, R. Sfair, C. M. Schäfer
https://arxiv.org/abs/2509.07930 https://arxiv.org/pdf/2509.07930…
UTM Performance Under Stressing Scenarios
Ian Jessen
https://arxiv.org/abs/2509.08124 https://arxiv.org/pdf/2509.08124
Beyond Distribution Shifts: Adaptive Hyperspectral Image Classification at Test Time
Xia Yue, Anfeng Liu, Ning Chen, Chenjia Huang, Hui Liu, Zhou Huang, Leyuan Fang
https://arxiv.org/abs/2509.08436
Development and performance test of p-type Silicon pad array detector
Sawan, G. Tambave, S. Das, A. Chaudhry, R. Gupta, V. K. S. Kashyap, B. Mohanty, M. M. Mondal, S. Mathur, A. Puri, K. P. Sharma, R. Sharma, R. Singh
https://arxiv.org/abs/2508.06100
Validity Verification of the New TOEFL Writing Task Based on Classical Test Theory
Yinyu Zhang
https://arxiv.org/abs/2509.05347 https://arxiv.org/pdf/2509.…
Artificial Intelligence as an Opportunity for the Science of Consciousness: A Dual-Resolution Framework
Shahar Dror, Dafna Bergerbest, Moti Salti
https://arxiv.org/abs/2509.07001
A Liouville theorem for the $2$-Hessian equation on the Heisenberg group
Wei Zhang, Qi Zhou
https://arxiv.org/abs/2509.08415 https://arxiv.org/pdf/2509.084…
An on-sky investigation into factors limiting the performance of Keck-NIRC2 for conducting infrared high-contrast imaging
Rachel Bowens-Rubin, Maïssa Salama, Jayke S. Nguyen, William Thompson, Philip Hinz
https://arxiv.org/abs/2509.07138
Is Liller 1 a building block of the Galactic bulge? - Evidence with APOGEE
Anna Liptrott, Ricardo P. Schiavon, Andrew C. Mason, Sebastian Kamann, Borja Anguiano, Roger E. Cohen, José G. Fernández-Trincado, Danny Horta, Steven R. Majewski, Dante Minniti, David M. Nataf, Michael J. W. O'Connor, Dominic Wearne
https://arxiv.or…
A two-axes shear cell for rheo-optics
Chiara Marraffa, Stefano Aime
https://arxiv.org/abs/2509.08114 https://arxiv.org/pdf/2509.08114
DogFit: Domain-guided Fine-tuning for Efficient Transfer Learning of Diffusion Models
Yara Bahram, Mohammadhadi Shateri, Eric Granger
https://arxiv.org/abs/2508.05685 https://…
Replaced article(s) found for math.ST. https://arxiv.org/list/math.ST/new
[1/1]:
- A Necessary and Sufficient Condition for Size Controllability of Heteroskedasticity Robust Test S...
Benedikt M. Pötscher, David Preinerstorfer
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng, Richard Fan, Shibo Hao, Taylor W. Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P. Xing
Probing the Preferences of a Language Model: Integrating Verbal and Behavioral Tests of AI Welfare
Valen Tagliabue, Leonard Dung
https://arxiv.org/abs/2509.07961 https://…
Automatic Detection of Inauthentic Templated Responses in English Language Assessments
Yashad Samant, Lee Becker, Scott Hellman, Bradley Behan, Sarah Hughes, Joshua Southerland
https://arxiv.org/abs/2509.08355
Automated Discovery of Test Oracles for Database Management Systems Using LLMs
Qiuyang Mang, Runyuan He, Suyang Zhong, Xiaoxuan Liu, Huanchen Zhang, Alvin Cheung
https://arxiv.org/abs/2510.06663
Scanned SQUID Microscope with High-speed Electrical Connectivity
Ian W. Haygood, Bochao Xu, John Biesecker, Michael L. Schneider
https://arxiv.org/abs/2509.07137 https://…
Detection of mean changes in partially observed functional data
Šárka Hudecová, Claudia Kirch
https://arxiv.org/abs/2510.07854 https://arxi…
Accelerating AI Development with Cyber Arenas
William Cashman, Chasen Milner, Michael Houle, Michael Jones, Hayden Jananthan, Jeremy Kepner, Peter Michaleas, Alex Pentland
https://arxiv.org/abs/2509.08200
Benchmarking quantum computers with any quantum algorithm
Stefan K. Seritan, Aditya Dhumuntarao, Aidan Q. Wilber-Gauthier, Kenneth M. Rudinger, Antonio E. Russo, Robin Blume-Kohout, Andrew D. Baczewski, Timothy Proctor
https://arxiv.org/abs/2508.05754
A Comprehensive Review of Reinforcement Learning for Autonomous Driving in the CARLA Simulator
Elahe Delavari, Feeza Khan Khanzada, Jaerock Kwon
https://arxiv.org/abs/2509.08221
DP-SPRT: Differentially Private Sequential Probability Ratio Tests
Thomas Michel, Debabrota Basu, Emilie Kaufmann
https://arxiv.org/abs/2508.06377 https://…
Time-Varying Volatility of Bank Betas
Matt Brigida
https://arxiv.org/abs/2510.07671 https://arxiv.org/pdf/2510.07671
New test of modified gravity with gravitational wave experiments
N. M. Jiménez Cruz, Flavio C. Sánchez, Gianmassimo Tasinato
https://arxiv.org/abs/2509.08273 https:/…
Largevars: An R Package for Testing Large VARs for the Presence of Cointegration
Anna Bykhovskaya, Vadim Gorin, Eszter Kiss
https://arxiv.org/abs/2509.06295 https://
The $n^{th}$ centered moments of a large orthogonal family of automorphic $L$-functions
Vorrapan Chandee, Yoonbok Lee, Xiannan Li
https://arxiv.org/abs/2510.07647 https://
Enhancing Software Vulnerability Detection Through Adaptive Test Input Generation Using Genetic Algorithm
Yanusha Mehendran, Maolin Tang, Yi Lu
https://arxiv.org/abs/2508.05923 …
Scalar Quasinormal modes in Reissner–Nordström black holes: implications for Weak Gravity Conjecture
Giorgio Di Russo, Anna Tokareva
https://arxiv.org/abs/2510.06813 htt…
Revisiting the Question of Information Content of EXAFS Spectra through a Bayesian Approach
Lucy Haddad, Diego Gianolio, Andrei Sapelkin
https://arxiv.org/abs/2509.07950 https:/…
Neutrinoless double beta decay in a supersymmetric left-right model
Vivek Banerjee, Sasmita Mishra
https://arxiv.org/abs/2510.08090 https://arxiv.org/pdf/2…
Classification of 24-hour movement behaviors from wrist-worn accelerometer data: from handcrafted features to deep learning techniques
Alireza Sameh, Mehrdad Rostami, Mourad Oussalah, Vahid Farrahi
https://arxiv.org/abs/2509.08606
IRS-Assisted IoT Activity Detection Under Asynchronous Transmission and Heterogeneous Powers: Detectors and Performance Analysis
Amirhossein Taherpour, Somayeh Khani, Abbas Taherpour, Tamer Khattab
https://arxiv.org/abs/2508.05959
ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
Qin Liu, Jacob Dineen, Yuxi Huang, Sheng Zhang, Hoifung Poon, Ben Zhou, Muhao Chen
https://arxiv.org/abs/2510.08569
Detection of supernova magnitude fluctuations induced by large-scale structure
A. Nguyen, C. Blake, R. J. Turner, V. Aronica, J. Bautista, J. Aguilar, S. Ahlen, S. BenZvi, D. Bianchi, D. Brooks, A. Carr, T. Claybaugh, A. Cuceu, A. de la Macorra, B. Dey, P. Doel, K. Douglass, S. Ferraro, J. E. Forero-Romero, E. Gaztañaga, S. Gontcho A Gontcho, G. Gutierrez, J. Guy, K. Honscheid, C. Howlett, D. Huterer, M. Ishak, R. Joyce, R. Kehoe, A. G. Kim, A. Kremin, O. Lahav, M. Landriau, L. Le Gu…
Can LLMs effectively provide game-theoretic-based scenarios for cybersecurity?
Daniele Proverbio, Alessio Buscemi, Alessandro Di Stefano, The Anh Han, German Castignani, Pietro Liò
https://arxiv.org/abs/2508.05670
Weakly-Driven Quantum Walks for Memory-Constrained Pauli Channel Learning
Yuan-Zhuo Wang, Yi-Ran Xiao, Ming-Yang Li, Shengjun Wu, Zeng-Bing Chen
https://arxiv.org/abs/2509.07702
Scalable Offline Metrics for Autonomous Driving
Animikh Aich, Adwait Kulkarni, Eshed Ohn-Bar
https://arxiv.org/abs/2510.08571 https://arxiv.org/pdf/2510.08…
A Honest Cross-Validation Estimator for Prediction Performance
Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian
https://arxiv.org/abs/2510.07649 https://
Numerical effects on the stripping of dark matter and stars in IllustrisTNG galaxy groups and clusters
Mark R. Lovell (ICC Durham, Durham Physics, University of Iceland), Annalisa Pillepich (MPIA), Christoph Engler (MPIA), Dylan Nelson (Heidelberg), Rahul Ramesh (Heidelberg), Volker Springel (MPA), Lars Hernquist (ITP Harvard)
https://arxi…
Validity and Power of Heavy-Tailed Combination Tests under Asymptotic Dependence
Lin Gui, Tiantian Mao, Jingshu Wang, Ruodu Wang
https://arxiv.org/abs/2508.05818 https://…
Replaced article(s) found for physics.ins-det. https://arxiv.org/list/physics.ins-det/new
[1/1]:
- The CMS Phase-2 Fast Beam Condition Monitor prototype test with beam
G. Auzinger, et al.
GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
Wen Ye, Zhaocheng Liu, Yuwei Gui, Tingyu Yuan, Yunyue Su, Bowen Fang, Chaoyang Zhao, Qiang Liu, Liang Wang
https://arxiv.org/abs/2510.07217
Black Hole Spectroscopy and Tests of General Relativity with GW250114
The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration
https://arxiv.org/abs/2509.08099
Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR
Zeyu Sun, Jingjing Liang, Weiyi Wang, Chenyao Suo, Junjie Chen, Fanjiang Xu
https://arxiv.org/abs/2510.07815
Beyond Real Data: Synthetic Data through the Lens of Regularization
Amitis Shidani, Tyler Farghly, Yang Sun, Habib Ganjgahi, George Deligiannidis
https://arxiv.org/abs/2510.08095
Carnegie Supernova Project: Fast-Declining Type Ia Supernovae as Cosmological Distance Indicators
M. M. Phillips, Syed A. Uddin, Christopher R. Burns, Nicholas B. Suntzeff, C. Ashall, E. Baron, L. Galbany, P. Hoeflich, E. Y. Hsiao, Nidia Morrell, S. E. Persson, Maximilian Stritzinger, Carlos Contreras, Wendy L. Freedman, Kevin Krisciunas, S. Kumar, J. Lu, Anthony L. Piro, M. Shahbandeh
Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski
https://arxiv.org/abs/2510.07257 h…
A galactic tug-of-war: how (not) to simultaneously fit the Milky Way satellite luminosity function and the mass-metallicity relation
Sownak Bose, Alis J. Deason
https://arxiv.org/abs/2509.07066
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models
Zhangyue Yin, Qiushi Sun, Zhiyuan Zeng, Zhiyuan Yu, Qipeng Guo, Xuanjing Huang, Xipeng Qiu
https://arxiv.org/abs/2510.06014
Decisive Evidence for the First Overtone Mode in the Ringdown Signal of GW231028
Hai-Tian Wang
https://arxiv.org/abs/2509.08657 https://arxiv.org/pdf/2509.…
WISE: A Weighted Similarity Aggregation Test for Serial Independence
Qihua Zhu, Mingshuo Liu, Yuefeng Han, Doudou Zhou
https://arxiv.org/abs/2509.05678 https://
TTRV: Test-Time Reinforcement Learning for Vision Language Models
Akshit Singh, Shyam Marjit, Wei Lin, Paul Gavrikov, Serena Yeung-Levy, Hilde Kuehne, Rogerio Feris, Sivan Doveh, James Glass, M. Jehanzeb Mirza
https://arxiv.org/abs/2510.06783
Disclosing Submillimeter Galaxy Formation: Mergers or Secular Evolution?
Siu-Wang Chan, Yiping Ao, Qinghua Tan
https://arxiv.org/abs/2509.07913 https://arx…
New Lorentzian Taub-NUT and Euclidean Eguchi-Hanson Solutions in $f(R)$ gravity
J. G. Fenwick, A. M. Ghezelbash
https://arxiv.org/abs/2509.08033 https://ar…
On a surprising behavior of the likelihood ratio test in non-parametric mixture models
Yan Zhang, Stanislav Volgushev
https://arxiv.org/abs/2509.05610 https://
Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Wenxun Wu, Yuanyang Li, Guhan Chen, Linyue Wang, Hongyang Chen
https://arxiv.org/abs/2510.07038
Proofs of No Intrusion
Vipul Goyal, Justin Raizes
https://arxiv.org/abs/2510.06432 https://arxiv.org/pdf/2510.06432…
The Majority is not always right: RL training for solution aggregation
Wenting Zhao, Pranjal Aggarwal, Swarnadeep Saha, Asli Celikyilmaz, Jason Weston, Ilia Kulikov
https://arxiv.org/abs/2509.06870
Zero-shot 3D-Aware Trajectory-Guided image-to-video generation via Test-Time Training
Ruicheng Zhang, Jun Zhou, Zunnan Xu, Zihao Liu, Jiehui Huang, Mingyang Zhang, Yu Sun, Xiu Li
https://arxiv.org/abs/2509.06723
LATTA: Langevin-Anchored Test-Time Adaptation for Enhanced Robustness and Stability
Harshil Vejendla
https://arxiv.org/abs/2510.05530 https://arxiv.org/pdf…
Combining TSL and LLM to Automate REST API Testing: A Comparative Study
Thiago Barradas, Aline Paes, Vânia de Oliveira Neves
https://arxiv.org/abs/2509.05540 https://
Investigating the origin of the Milky Way streams. A revised look at their orbital pole distribution in light of precession effects
Elena Asencio, Pavel Kroupa, Ingo Thies
https://arxiv.org/abs/2508.05733
Verifier-free Test-Time Sampling for Vision Language Action Models
Suhyeok Jang, Dongyoung Kim, Changyeon Kim, Youngsuk Kim, Jinwoo Shin
https://arxiv.org/abs/2510.05681 https:/…
Is it Gaussian? Testing bosonic quantum states
Filippo Girardi, Freek Witteveen, Francesco Anna Mele, Lennart Bittel, Salvatore F. E. Oliviero, David Gross, Michael Walter
https://arxiv.org/abs/2510.07305
Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework
Jie Chen, Jinhao Jiang, Yingqian Min, Zican Dong, Shijie Wang, Wayne Xin Zhao, Ji-Rong Wen
https://arxiv.org/abs/2509.05007
Specification Tests for the Error–Law in Vector Multiplicative Errors Models
Šárka Hudecová, Simos G. Meintanis
https://arxiv.org/abs/2509.06732 https://…
PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs
Manuel Frank, Haithem Afli
https://arxiv.org/abs/2510.06730 https://
MOSAIC: Minimax-Optimal Sparsity-Adaptive Inference for Change Points in Dynamic Networks
Yingying Fan, Jingyuan Liu, Jinchi Lv, Ao Sun
https://arxiv.org/abs/2509.06303 https://…
Adapt in the Wild: Test-Time Entropy Minimization with Sharpness and Feature Regularization
Shuaicheng Niu, Guohao Chen, Deyu Chen, Yifan Zhang, Jiaxiang Wu, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Chunyan Miao, Mingkui Tan
https://arxiv.org/abs/2509.04977
High-Performance Imaging in a Dilution Refrigerator
Timo Eikelmann, Mara Brinkmann, Leonie Eggers, Tuncay Ulas, Donika Imeri, Konstantin Beck, Lasse Jens Irrgang, Sunil Kumar Mahato, Rikhav Shah, Ralf Riedinger
https://arxiv.org/abs/2510.07054
GenAI-based test case generation and execution in SDV platform
Denesa Zyberaj, Lukasz Mazur, Nenad Petrovic, Pankhuri Verma, Pascal Hirmer, Dirk Slama, Xiangwei Cheng, Alois Knoll
https://arxiv.org/abs/2509.05112
AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling
Xiaogeng Liu, Chaowei Xiao
https://arxiv.org/abs/2510.05379 https://
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Weihao Zeng, Keqing He, Chuqiao Kuang, Xiaoguang Li, Junxian He
https://arxiv.org/abs/2510.06135 htt…
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Yong Du, Yuchen Yan, Fei Tang, Zhengxi Lu, Chang Zong, Weiming Lu, Shengpei Jiang, Yongliang Shen
https://arxiv.org/abs/2508.05615
NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering
Alexander Murphy, Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh
https://arxiv.org/abs/2510.05635 h…
A Parametrized Test of General Relativity for LISA Massive Black Hole Binary Inspirals
Manuel Piarulli, Sylvain Marsat, Elise M. Sänger, Alessandra Buonanno, Jan Steinhoff, Nicola Tamanini
https://arxiv.org/abs/2510.06330
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He
https://arxiv.org/abs/2510.06217
ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions
Matteo Bortoletto, Constantin Ruhdorfer, Andreas Bulling
https://arxiv.org/abs/2509.05066 https://
Embezzlement as a "Self-Test" for Infinite Copies of Entangled States
Li Liu
https://arxiv.org/abs/2509.05036 https://arxiv.org/pdf/2509.05036
When vacuum breaks: a self-consistency test for astrophysical environments in extreme mass ratio inspirals
Lorenzo Copparoni, Rohit S. Chandramouli, Enrico Barausse
https://arxiv.org/abs/2510.06948
Test Case Generation from Bug Reports via Large Language Models: A Cognitive Layered Evaluation Framework
Irtaza Sajid Qureshi, Zhen Ming (Jack) Jiang
https://arxiv.org/abs/2510.05365
Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime
Andreas Maurer, Erfan Mirzaei, Massimiliano Pontil
https://arxiv.org/abs/2510.06028 https…
MatheMagic: Generating Dynamic Mathematics Benchmarks Robust to Memorization
Dayyán O'Brien, Barry Haddow, Emily Allaway, Pinzhen Chen
https://arxiv.org/abs/2510.05962