2025-09-17 09:23:00
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
Mukai Li, Linfeng Song, Zhenwen Liang, Jiahao Xu, Shansan Gong, Qi Liu, Haitao Mi, Dong Yu
https://arxiv.org/abs/2509.12603
Dynamics of test particles, QPOs and thermodynamics of charged Euler-Heisenberg AdS black holes with a cloud of strings and dark matter
Faizuddin Ahmed, Ahmad Al-Badawi, İzzet Sakallı
https://arxiv.org/abs/2509.12264
LTA-thinker: Latent Thought-Augmented Training Framework for Large Language Models on Complex Reasoning
Jiaqi Wang, Binquan Ji, Haibo Luo, Yiyang Qi, Ruiting Li, Huiyan Wang, Yuantao Han, Cangyi Yang, Jiaxu Zhang, Feiliang Ren
https://arxiv.org/abs/2509.12875
ROOM: A Physics-Based Continuum Robot Simulator for Photorealistic Medical Datasets Generation
Salvatore Esposito, Matías Mattamala, Daniel Rebain, Francis Xiatian Zhang, Kevin Dhaliwal, Mohsen Khadem, Subramanian Ramamoorthy
https://arxiv.org/abs/2509.13177
The Adaptation Paradox: Agency vs. Mimicry in Companion Chatbots
T. James Brandt, Cecilia Xi Wang
https://arxiv.org/abs/2509.12525 https://arxiv.org/pdf/25…
A novel pointing technique for the enhancement of Tropospheric Delay Calibration System performances
David Bernacchia, Riccardo Lasagni Manghi, Marco Zannoni, Paolo Tortora, Jose Villalvilla, Javier De Vicente, Paolo Cappuccio, Luciano Iess
https://arxiv.org/abs/2509.13199
On the Hardness of Order Finding and Equivalence Testing for ROABPs
C. Ramya, Pratik Shastri
https://arxiv.org/abs/2509.13238 https://arxiv.org/pdf/2509.13…
Zeeman Doppler Imaging of $\tau$ Ceti: The Weakest Magnetic Field Detected in a Sun-like Star
Federica Chiti, Oleg Kochukhov, Jennifer L. van Saders, Travis S. Metcalfe
https://arxiv.org/abs/2509.12310 …
A Statistical Test for Comparing the Linkage and Admixture Model Based on Central Limit Theorems
Carola Sophia Heinzel
https://arxiv.org/abs/2509.12734 https://
An efficient splitting iteration for a CDA-accelerated solver for incompressible flow problems
Victoria L. Fisher, Leo G. Rebholz, Duygu Vargun
https://arxiv.org/abs/2509.12547 …
A Martingale Kernel Two-Sample Test
Anirban Chatterjee, Aaditya Ramdas
https://arxiv.org/abs/2510.11853 https://arxiv.org/pdf/2510.11853
Evolution of low surface brightness ultra-thin galaxies: The role of dark matter halo and bar formation on disk thickness
K. Aditya, Arunima Banerjee
https://arxiv.org/abs/2509.12966
BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)
Tomas Ruiz, Siyao Peng, Barbara Plank, Carsten Schwemmer
https://arxiv.org/abs/2510.12516
Learning-To-Measure: In-context Active Feature Acquisition
Yuta Kobayashi, Zilin Jing, Jiayu Yao, Hongseok Namkoong, Shalmali Joshi
https://arxiv.org/abs/2510.12624 https://
Efficient Real-World Deblurring using Single Images: AIM 2025 Challenge Report
Daniel Feijoo, Paula Garrido-Mellado, Marcos V. Conde, Jaesung Rim, Alvaro Garcia, Sunghyun Cho, Radu Timofte
https://arxiv.org/abs/2510.12788
Data-driven Methods of Extracting Text Structure and Information Transfer
Shinichi Honna, Taichi Murayama, Akira Matsui
https://arxiv.org/abs/2509.12999 https://
Perfect fluid dark matter: a viability test with galaxy rotation curves
Jan Kuncewicz
https://arxiv.org/abs/2509.12268 https://arxiv.org/pdf/2509.12268
DIPLODOCUS II: Implementation of transport equations and test cases relevant to micro-scale physics of jetted astrophysical sources
Christopher N. Everett, Marc Klinger-Plaisier, Garret Cotter
https://arxiv.org/abs/2510.12505
Agentic Property-Based Testing: Finding Bugs Across the Python Ecosystem
Muhammad Maaz, Liam DeVoe, Zac Hatfield-Dodds, Nicholas Carlini
https://arxiv.org/abs/2510.09907 https:/…
ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
https://arxiv.org/abs/2510.10281 htt…
Selection Procedures in Competitive Admission
Nathan Hancart
https://arxiv.org/abs/2510.12653 https://arxiv.org/pdf/2510.12653
On Korovkin-type theorems including exponential test functions on infinite intervals through power series convergence
Dilek Söylemez, Mehmet Ünver
https://arxiv.org/abs/2510.12568
Data-Model Co-Evolution: Growing Test Sets to Refine LLM Behavior
Minjae Lee, Minsuk Kahng
https://arxiv.org/abs/2510.12728 https://arxiv.org/pdf/2510.1272…
The resilience of the sailboat stable region
Rafael Sfair, Tiago F. L. L. Pinheiro, Giovana Ramon, Ernesto Vieira
https://arxiv.org/abs/2510.11855 https://…
Hierarchical summaries for primordial non-Gaussianities
M. S. Cagliari, A. Bairagi, B. Wandelt
https://arxiv.org/abs/2510.12715 https://arxiv.org/pdf/2510.…
A Variational Physics-Informed Neural Network Framework Using Petrov-Galerkin Method for Solving Singularly Perturbed Boundary Value Problems
Vijay Kumar, Gautam Singh
https://arxiv.org/abs/2509.12271 …
Beyond Test Scores: How Academic Rank Shapes Long-Term Outcomes
Emilia Del Bono, Angus Holford, Tommaso Sartori
https://arxiv.org/abs/2510.11973 https://ar…
Unitary representations attached to parabolic subgroups: the case of abelian unipotent radical
Dan Ciubotaru
https://arxiv.org/abs/2510.11862 https://arxiv…
If you're using #lazyblorg as your static website generator: I've updated the project today.
It now uses "uv" for dependency management, script invocation and unit test execution. Furthermore, I adapted the code to match the #pandoc version of Debian 13 Trixie.
Although you ne…
Constraints on the early growth of massive black holes from PTA and JWST with L-GalaxiesBH
Silvia Bonoli, David Izquierdo-Villalba, Daniele Spinoso, Monica Colpi, Alberto Sesana, Markos Polkas, Volker Springel
https://arxiv.org/abs/2509.12325
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[13/14]:
- Class-Invariant Test-Time Augmentation for Domain Generalization
Zhicheng Lin, Xiaolin Wu, Xi Zhang
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/3]:
- ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
Quantum criticality at the end of a pseudogap phase in superconducting infinite-layer nickelates
C. Iorio-Duval, E. Beauchesne-Blanchet, F. Perreault, J. L. Santana González, W. Sun, Y. F. Nie, A. Gourgout, G. Grissonnanche
https://arxiv.org/abs/2510.12786
Resource-sensitive but language-blind: Community size and not grammatical complexity better predicts the accuracy of Large Language Models in a novel Wug Test
Nikoleta Pantelidou, Evelina Leivada, Paolo Morosi
https://arxiv.org/abs/2510.12463
The mass of $^{101}$Sn and Bayesian extrapolations to the proton drip line
Christian M. Ireland, Georg Bollen, Scott E. Campbell, Xiangcheng Chen, Hannah Erington, Nadeesha D. Gamage, Kyle Godbey, Alicen M. Houff, Christopher Izzo, Bailey Knight, Sudhanva Lalit, Erich Leistenschneider, E. Marilena Lykiardopoulou, Franziska M. Maier, Witold Nazarewicz, Rodney Orford, William S. Porter, Caleb Quick, Ante Ravlic, Matthew Redshaw, Paul-Gerhard Reinhard, Ryan Ringle, Stefan Schwarz, Chandan…
Random Forest Classification of MBTA Gravitational-Wave Triggers for Low-Latency Detection
Lorenzo Mobilia, Gianluca Maria Guidi
https://arxiv.org/abs/2509.12882 https://…
Non-traditional data in pandemic preparedness and response: identifying and addressing first and last-mile challenges
Mattia Mazzoli, Irma Varela-Lasheras, Sonia Namorado, Constantino Pereira Caetano, Andreia Leite, Lisa Hermans, Niel Hens, Polen Türkmen, Kyriaki Kalimeri, Leo Ferres, Ciro Cattuto, Daniela Paolotti, Stefaan Verhulst
https://
A Kolmogorov-Smirnov-Type Test for Dependently Double-Truncated Data
Anne-Marie Toparkus, Rafael Weissbach
https://arxiv.org/abs/2510.11517 https://arxiv.o…
Search-based Hyperparameter Tuning for Python Unit Test Generation
Stephan Lukasczyk, Gordon Fraser
https://arxiv.org/abs/2510.08716 https://arxiv.org/pdf/…
General mean-field BSDEs with integrable terminal values
Weimin Jiang, Juan Li, Yan Shen
https://arxiv.org/abs/2510.11067 https://arxiv.org/pdf/2510.11067
Crosslisted article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[8/17]:
- MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
Chen, Lei, Zhang, Ke, Zhu, Chen, Lu, Huang, Feng, He, Sun, Wu, Wang
The double neutron star PSR J1946+2052 I. Masses and tests of general relativity
Lingqi Meng, Paulo C. C. Freire, Kevin Stovall, Norbert Wex, Xueli Miao, Weiwei Zhu, Michael Kramer, James M. Cordes, Huanchen Hu, Jinchen Jiang, Emilie Parent, Lijing Shao, Ingrid H. Stairs, Mengyao Xue, Adam Brazier, Fernando Camilo, David J. Champion, Shami Chatterjee, Fronefield Crawford, Ziyao Fang, Qiuyang Fu, Yanjun Guo, Jason W. T. Hessels, Maura MacLaughlin, Chenchen Miao, Jiarui Niu, Ziwei Wu, Ju…
PAC Learnability in the Presence of Performativity
Ivan Kirev, Lyuben Baltadzhiev, Nikola Konstantinov
https://arxiv.org/abs/2510.08335 https://arxiv.org/p…
CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense
Yang Zhuochen, Fok Kar Wai, Thing Vrizlynn
https://arxiv.org/abs/2510.11137 https://arx…
D-TPT: Dimensional Entropy Maximization for Calibrating Test-Time Prompt Tuning in Vision-Language Models
Jisu Han, Wonjun Hwang
https://arxiv.org/abs/2510.09473 https://…
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Jens Tuyls, Dylan J. Foster, Akshay Krishnamurthy, Jordan T. Ash
https://arxiv.org/abs/2510.11686
Gauging the Competition: Understanding Social Comparison and Anxiety through Eye-tracking in Virtual Reality Group Interview
Shi-Ting Ni, Kairong Fang, Yuyang Wang, Pan Hui
https://arxiv.org/abs/2510.12590
Fast radio bursts shed light on direct gravity test on cosmological scales
Shuren Zhou, Pengjie Zhang
https://arxiv.org/abs/2510.11022 https://arxiv.org/pd…
Automated Discovery of Test Oracles for Database Management Systems Using LLMs
Qiuyang Mang, Runyuan He, Suyang Zhong, Xiaoxuan Liu, Huanchen Zhang, Alvin Cheung
https://arxiv.org/abs/2510.06663
PSA for users that regularly test #Fedora Beta as well as proposed updates once the new version was released:
Do not enable updates-testing[1] by modifying /etc/yum.repos.d/fedora-updates-testing.repo; instead do it like this:
$ sudo dnf config-manager setopt updates-testing.enabled=true
Otherwise updates-testing will be disabled shortly before the release of a new version (t…
Constraint-Guided Unit Test Generation for Machine Learning Libraries
Lukas Krodinger, Altin Hajdari, Stephan Lukasczyk, Gordon Fraser
https://arxiv.org/abs/2510.09108 https://
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[2/9]:
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei
Bridging Research and Practice in Simulation-based Testing of Industrial Robot Navigation Systems
Sajad Khatiri, Francisco Eli Vina Barrientos, Maximilian Wulf, Paolo Tonella, Sebastiano Panichella
https://arxiv.org/abs/2510.09396
Probing the geological setting of exoplanets through atmospheric analysis: using Mars as a test case
Monica Rainer, Evandro Balbi, Francesco Borsa, Paola Cianfarra, Avet Harutyunyan, Silvano Tosi
https://arxiv.org/abs/2510.09305
Titans Revisited: A Lightweight Reimplementation and Critical Analysis of a Test-Time Memory Model
Gavriel Di Nepi, Federico Siciliano, Fabrizio Silvestri
https://arxiv.org/abs/2510.09551
The Importance of Being Adaptable: An Exploration of the Power and Limitations of Domain Adaptation for Simulation-Based Inference with Galaxy Clusters
Michelle Ntampaka, A. Ciprijanovic, Ana Maria Delgado, John Soltis, John F. Wu, Mikaeel Yunus, John ZuHone
https://arxiv.org/abs/2510.09748
Simultaneous Frequentist Calibration of Confidence Regions for Multiple Functionals in Constrained Inverse Problems
Pau Batlle, Pratik Patil, Michael Stanley, Javier Ruiz Lupon, Houman Owhadi, Mikael Kuusela
https://arxiv.org/abs/2510.11708
How precisely can we measure the ages of subgiant and giant stars?
Cheyanne Shariat, Kareem El-Badry, Soumyadeep Bhattacharjee
https://arxiv.org/abs/2510.08675 https://
Benchmarking foundation models for hyperspectral image classification: Application to cereal crop type mapping
Walid Elbarz, Mohamed Bourriz, Hicham Hajji, Hamd Ait Abdelali, François Bourzeix
https://arxiv.org/abs/2510.11576
Probing cosmic curvature with Alcock-Paczynski data
Yungui Gong, Qing Gao, Xuchen Lu, Zhu Yi
https://arxiv.org/abs/2510.11555 https://arxiv.org/pdf/2510.11…
Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation
Sondos Mahmoud Bsharat, Zhiqiang Shen
https://arxiv.org/abs/2510.09599 https://arxi…
How Students Use Generative AI for Software Testing: An Observational Study
Baris Ardic, Quentin Le Dilavrec, Andy Zaidman
https://arxiv.org/abs/2510.10551 https://
Accretion onto Reissner-Nordström naked singularities
Tomasz Krajewski, Włodek Kluźniak
https://arxiv.org/abs/2510.10043 https://arxiv.or…
LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models
Linghan Huang, Peizhou Zhao, Huaming Chen
https://arxiv.org/abs/2510.10179 https://
LATTA: Langevin-Anchored Test-Time Adaptation for Enhanced Robustness and Stability
Harshil Vejendla
https://arxiv.org/abs/2510.05530 https://arxiv.org/pdf…
Verifier-free Test-Time Sampling for Vision Language Action Models
Suhyeok Jang, Dongyoung Kim, Changyeon Kim, Youngsuk Kim, Jinwoo Shin
https://arxiv.org/abs/2510.05681 https:/…
The Gravitational Wave Memory from Binary Neutron Star Mergers
Jamie Bamber, Antonios Tsokaros, Milton Ruiz, Stuart L. Shapiro, Marc Favata, Matthew Karlson, Fabrizio Venturi Piñas
https://arxiv.org/abs/2510.09742
When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents
Lingfei Qian, Xueqing Peng, Yan Wang, Vincent Jim Zhang, Huan He, Hanley Smith, Yi Han, Yueru He, Haohang Li, Yupeng Cao, Yangyang Yu, Alejandro Lopez-Lira, Peng Lu, Jian-Yun Nie, Guojun Xiong, Jimin Huang, Sophia Ananiadou
https://arxiv.org/abs/2510.11695
LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?
Kaijian Zou, Aaron Xiong, Yunxiang Zhang, Frederick Zhang, Yueqi Ren, Jirong Yang, Ayoung Lee, Shitanshu Bhushan, Lu Wang
https://arxiv.org/abs/2510.09595
AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling
Xiaogeng Liu, Chaowei Xiao
https://arxiv.org/abs/2510.05379 https://
Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration
Mohanakrishnan Hariharan, Satish Arvapalli, Seshu Barma, Evangeline Sheela
https://arxiv.org/abs/2510.10824
Detection of mean changes in partially observed functional data
Šárka Hudecová, Claudia Kirch
https://arxiv.org/abs/2510.07854 https://arxi…
Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski
https://arxiv.org/abs/2510.07257 h…
LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition
Yushuo Zheng, Zicheng Zhang, Xiongkuo Min, Huiyu Duan, Guangtao Zhai
https://arxiv.org/abs/2510.08928 h…
Extending CSST Emulator to post-DESI era
Zhao Chen, Yu Yu
https://arxiv.org/abs/2510.09503 https://arxiv.org/pdf/2510.09503
GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
Wen Ye, Zhaocheng Liu, Yuwei Gui, Tingyu Yuan, Yunyue Su, Bowen Fang, Chaoyang Zhao, Qiang Liu, Liang Wang
https://arxiv.org/abs/2510.07217
Chaos of charged particles near a renormalized group improved Kerr black hole in an external magnetic field
Junjie Lu, Xin Wu
https://arxiv.org/abs/2510.08954 https://
Proofs of No Intrusion
Vipul Goyal, Justin Raizes
https://arxiv.org/abs/2510.06432 https://arxiv.org/pdf/2510.06432…
Extension of Wald-Wolfowitz Runs Test for Regression Validity Testing with Repeated Measures of Independent Variable
Bo-Yao Lian, Nelson G. Chen
https://arxiv.org/abs/2510.05861
Euclid preparation. Cosmology Likelihood for Observables in Euclid (CLOE). 4: Validation and Performance
Collaboration, Martinelli, Pezzotta, Sciotti, Blot, Bonici, Camera, Cañas-Herrera, Cardone, Carrilho, Casas, Davini, Di Domizio, Farrens, Goh, Beauchamps, Ilić, Joudaki, Keil, Le Brun, Moretti, Pettorino, Sánchez, Sakr, Tanidis, Tutusaus, Ajani, Crocce, Giocoli, Legrand, Lembo, Lesci, Girones, Nouri-Zonoz, Pamuk, Tsedrik, Bel, Carbone, Duncan, Kilbinger, Lacasa, Lattan…
NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering
Alexander Murphy, Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh
https://arxiv.org/abs/2510.05635 h…
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models
Zhangyue Yin, Qiushi Sun, Zhiyuan Zeng, Zhiyuan Yu, Qipeng Guo, Xuanjing Huang, Xipeng Qiu
https://arxiv.org/abs/2510.06014
TTRV: Test-Time Reinforcement Learning for Vision Language Models
Akshit Singh, Shyam Marjit, Wei Lin, Paul Gavrikov, Serena Yeung-Levy, Hilde Kuehne, Rogerio Feris, Sivan Doveh, James Glass, M. Jehanzeb Mirza
https://arxiv.org/abs/2510.06783
A new composite Mann-Whitney test for two-sample survival comparisons with right-censored data
Abid Hussain, Touqeer Ahmad
https://arxiv.org/abs/2510.05353 https://
Particles with precessing spin in Kerr spacetime: analytic solutions for eccentric orbits and homoclinic motion near the equatorial plane
Gabriel Andres Piovano
https://arxiv.org/abs/2510.09597
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Weihao Zeng, Keqing He, Chuqiao Kuang, Xiaoguang Li, Junxian He
https://arxiv.org/abs/2510.06135 htt…
Test Case Generation from Bug Reports via Large Language Models: A Cognitive Layered Evaluation Framework
Irtaza Sajid Qureshi, Zhen Ming (Jack) Jiang
https://arxiv.org/abs/2510.05365
Effects of magnetic fields on spinning test particles orbiting Kerr-Bertotti-Robinson black holes
Yu-Kun Zhang, Shao-Wen Wei
https://arxiv.org/abs/2510.07914 https://
The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models
Konrad Löhr, Shuzhou Yuan, Michael Färber
https://arxiv.org/abs/2510.08236
Test-Time Defense Against Adversarial Attacks via Stochastic Resonance of Latent Ensembles
Dong Lao, Yuxiang Zhang, Haniyeh Ehsani Oskouie, Yangchao Wu, Alex Wong, Stefano Soatto
https://arxiv.org/abs/2510.03224
ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
Qin Liu, Jacob Dineen, Yuxi Huang, Sheng Zhang, Hoifung Poon, Ben Zhou, Muhao Chen
https://arxiv.org/abs/2510.08569
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He
https://arxiv.org/abs/2510.06217
Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime
Andreas Maurer, Erfan Mirzaei, Massimiliano Pontil
https://arxiv.org/abs/2510.06028 https…
Self-Reflective Generation at Test Time
Jian Mu, Qixin Zhang, Zhiyong Wang, Menglin Yang, Shuang Qiu, Chengwei Qin, Zhongxiang Dai, Yao Shu
https://arxiv.org/abs/2510.02919 http…
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Jihoon Lee, Hoyeon Moon, Kevin Zhai, Arun Kumar Chithanar, Anit Kumar Sahu, Soummya Kar, Chul Lee, Souradip Chakraborty, Amrit Singh Bedi
https://arxiv.org/abs/2510.05040
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Yi Lu, Jianing Wang, Linsen Guo, Wei He, Hongyin Tang, Tao Gui, Xuanjing Huang, Xuezhi Cao, Wei Wang, Xunliang Cai
https://arxiv.org/abs/2510.08189
Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models
Runchu Tian, Junxia Cui, Xueqiang Xu, Feng Yao, Jingbo Shang
https://arxiv.org/abs/2510.05090
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment
Nevan Wichers, Aram Ebtekar, Ariana Azarbal, Victor Gillioz, Christine Ye, Emil Ryd, Neil Rathi, Henry Sleight, Alex Mallen, Fabien Roger, Samuel Marks
https://arxiv.org/abs/2510.05024
On the Role of Temperature Sampling in Test-Time Scaling
Yuheng Wu, Azalia Mirhoseini, Thierry Tambe
https://arxiv.org/abs/2510.02611 https://arxiv.org/pdf…
Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Wenxun Wu, Yuanyang Li, Guhan Chen, Linyue Wang, Hongyang Chen
https://arxiv.org/abs/2510.07038