2025-09-18 08:16:41
Thermodynamic Split Conjecture and an Observational Test for Cosmological Entropy
Oem Trivedi
https://arxiv.org/abs/2509.13689 https://arxiv.org/pdf/2509.1…
BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)
Tomas Ruiz, Siyao Peng, Barbara Plank, Carsten Schwemmer
https://arxiv.org/abs/2510.12516
An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software
Sina Gogani-Khiabani (University of Illinois Chicago), Ashutosh Trivedi (University of Colorado Boulder), Diptikalyan Saha (IBM Research), Saeid Tizpaz-Niari (University of Illinois Chicago)
https://arxiv.org/abs/2509.13471
Data-Model Co-Evolution: Growing Test Sets to Refine LLM Behavior
Minjae Lee, Minsuk Kahng
https://arxiv.org/abs/2510.12728 https://arxiv.org/pdf/2510.1272…
Generalized Covariance Estimator under Misspecification and Constraints
Aryan Manafi Neyazi
https://arxiv.org/abs/2509.13492 https://arxiv.org/pdf/2509.134…
A Martingale Kernel Two-Sample Test
Anirban Chatterjee, Aaditya Ramdas
https://arxiv.org/abs/2510.11853 https://arxiv.org/pdf/2510.11853
Learning-To-Measure: In-context Active Feature Acquisition
Yuta Kobayashi, Zilin Jing, Jiayu Yao, Hongseok Namkoong, Shalmali Joshi
https://arxiv.org/abs/2510.12624 https://
Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus
Lamia Lamrani, Benoît Collins, Jean-Philippe Bouchaud
https://arxiv.org/abs/2509.13923
Efficient Real-World Deblurring using Single Images: AIM 2025 Challenge Report
Daniel Feijoo, Paula Garrido-Mellado, Marcos V. Conde, Jaesung Rim, Alvaro Garcia, Sunghyun Cho, Radu Timofte
https://arxiv.org/abs/2510.12788
Correcting exponentiality test for binned earthquake magnitudes
Angela Stallone, Ilaria Spassiani
https://arxiv.org/abs/2512.13599 https://arxiv.org/pdf/25…
DIPLODOCUS II: Implementation of transport equations and test cases relevant to micro-scale physics of jetted astrophysical sources
Christopher N. Everett, Marc Klinger-Plaisier, Garret Cotter
https://arxiv.org/abs/2510.12505
ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
https://arxiv.org/abs/2510.10281 htt…
Titans Revisited: A Lightweight Reimplementation and Critical Analysis of a Test-Time Memory Model
Gavriel Di Nepi, Federico Siciliano, Fabrizio Silvestri
https://arxiv.org/abs/2510.09551
Resource-sensitive but language-blind: Community size and not grammatical complexity better predicts the accuracy of Large Language Models in a novel Wug Test
Nikoleta Pantelidou, Evelina Leivada, Paolo Morosi
https://arxiv.org/abs/2510.12463
On Korovkin-type theorems including exponential test functions on infinite intervals through power series convergence
Dilek Söylemez, Mehmet Ünver
https://arxiv.org/abs/2510.12568
The resilience of the sailboat stable region
Rafael Sfair, Tiago F. L. L. Pinheiro, Giovana Ramon, Ernesto Vieira
https://arxiv.org/abs/2510.11855 https://…
Hierarchical summaries for primordial non-Gaussianities
M. S. Cagliari, A. Bairagi, B. Wandelt
https://arxiv.org/abs/2510.12715 https://arxiv.org/pdf/2510.…
Unitary representations attached to parabolic subgroups: the case of abelian unipotent radical
Dan Ciubotaru
https://arxiv.org/abs/2510.11862 https://arxiv…
If you're using #lazyblorg as your static website generator: I've updated the project today.
It now uses "uv" for dependency management, script invocation and unit test execution. Furthermore, I adapted the code to match the #pandoc version of Debian 13 Trixie.
Although you ne…
Quantum criticality at the end of a pseudogap phase in superconducting infinite-layer nickelates
C. Iorio-Duval, E. Beauchesne-Blanchet, F. Perreault, J. L. Santana González, W. Sun, Y. F. Nie, A. Gourgout, G. Grissonnanche
https://arxiv.org/abs/2510.12786
Gauging the Competition: Understanding Social Comparison and Anxiety through Eye-tracking in Virtual Reality Group Interview
Shi-Ting Ni, Kairong Fang, Yuyang Wang, Pan Hui
https://arxiv.org/abs/2510.12590
Non-traditional data in pandemic preparedness and response: identifying and addressing first and last-mile challenges
Mattia Mazzoli, Irma Varela-Lasheras, Sonia Namorado, Constantino Pereira Caetano, Andreia Leite, Lisa Hermans, Niel Hens, Polen Türkmen, Kyriaki Kalimeri, Leo Ferres, Ciro Cattuto, Daniela Paolotti, Stefaan Verhulst
https://
Bridging Research and Practice in Simulation-based Testing of Industrial Robot Navigation Systems
Sajad Khatiri, Francisco Eli Vina Barrientos, Maximilian Wulf, Paolo Tonella, Sebastiano Panichella
https://arxiv.org/abs/2510.09396
Selection Procedures in Competitive Admission
Nathan Hancart
https://arxiv.org/abs/2510.12653 https://arxiv.org/pdf/2510.12653
Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Jens Tuyls, Dylan J. Foster, Akshay Krishnamurthy, Jordan T. Ash
https://arxiv.org/abs/2510.11686
D-TPT: Dimensional Entropy Maximization for Calibrating Test-Time Prompt Tuning in Vision-Language Models
Jisu Han, Wonjun Hwang
https://arxiv.org/abs/2510.09473 https://…
The Importance of Being Adaptable: An Exploration of the Power and Limitations of Domain Adaptation for Simulation-Based Inference with Galaxy Clusters
Michelle Ntampaka, A. Ciprijanovic, Ana Maria Delgado, John Soltis, John F. Wu, Mikaeel Yunus, John ZuHone
https://arxiv.org/abs/2510.09748
General mean-field BSDEs with integrable terminal values
Weimin Jiang, Juan Li, Yan Shen
https://arxiv.org/abs/2510.11067 https://arxiv.org/pdf/2510.11067
Beyond Test Scores: How Academic Rank Shapes Long-Term Outcomes
Emilia Del Bono, Angus Holford, Tommaso Sartori
https://arxiv.org/abs/2510.11973 https://ar…
A Kolmogorov-Smirnov-Type Test for Dependently Double-Truncated Data
Anne-Marie Toparkus, Rafael Weissbach
https://arxiv.org/abs/2510.11517 https://arxiv.o…
Crosslisted article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[8/17]:
- MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
Chen, Lei, Zhang, Ke, Zhu, Chen, Lu, Huang, Feng, He, Sun, Wu, Wang
PSA for users who regularly test #Fedora Beta as well as proposed updates once the new version is released:
Do not enable updates-testing[1] by modifying /etc/yum.repos.d/fedora-updates-testing.repo; instead do it like this:
$ sudo dnf config-manager setopt updates-testing.enabled=true
Otherwise updates-testing will be disabled shortly before the release of a new version (t…
Chaos of charged particles near a renormalized group improved Kerr black hole in an external magnetic field
Junjie Lu, Xin Wu
https://arxiv.org/abs/2510.08954 https://
Search-based Hyperparameter Tuning for Python Unit Test Generation
Stephan Lukasczyk, Gordon Fraser
https://arxiv.org/abs/2510.08716 https://arxiv.org/pdf/…
The double neutron star PSR J1946+2052 I. Masses and tests of general relativity
Lingqi Meng, Paulo C. C. Freire, Kevin Stovall, Norbert Wex, Xueli Miao, Weiwei Zhu, Michael Kramer, James M. Cordes, Huanchen Hu, Jinchen Jiang, Emilie Parent, Lijing Shao, Ingrid H. Stairs, Mengyao Xue, Adam Brazier, Fernando Camilo, David J. Champion, Shami Chatterjee, Fronefield Crawford, Ziyao Fang, Qiuyang Fu, Yanjun Guo, Jason W. T. Hessels, Maura MacLaughlin, Chenchen Miao, Jiarui Niu, Ziwei Wu, Ju…
The mass of $^{101}$Sn and Bayesian extrapolations to the proton drip line
Christian M. Ireland, Georg Bollen, Scott E. Campbell, Xiangcheng Chen, Hannah Erington, Nadeesha D. Gamage, Kyle Godbey, Alicen M. Houff, Christopher Izzo, Bailey Knight, Sudhanva Lalit, Erich Leistenschneider, E. Marilena Lykiardopoulou, Franziska M. Maier, Witold Nazarewicz, Rodney Orford, William S. Porter, Caleb Quick, Ante Ravlic, Matthew Redshaw, Paul-Gerhard Reinhard, Ryan Ringle, Stefan Schwarz, Chandan…
CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense
Yang Zhuochen, Fok Kar Wai, Thing Vrizlynn
https://arxiv.org/abs/2510.11137 https://arx…
PAC Learnability in the Presence of Performativity
Ivan Kirev, Lyuben Baltadzhiev, Nikola Konstantinov
https://arxiv.org/abs/2510.08335 https://arxiv.org/p…
How precisely can we measure the ages of subgiant and giant stars?
Cheyanne Shariat, Kareem El-Badry, Soumyadeep Bhattacharjee
https://arxiv.org/abs/2510.08675 https://
Simultaneous Frequentist Calibration of Confidence Regions for Multiple Functionals in Constrained Inverse Problems
Pau Batlle, Pratik Patil, Michael Stanley, Javier Ruiz Lupon, Houman Owhadi, Mikael Kuusela
https://arxiv.org/abs/2510.11708
Guess your neighbor's input: Quantum advantage in Feige's game
Simon Schmidt, Sigurd A. L. Storgaard, Michael Walter, Yuming Zhao
https://arxiv.org/abs/2510.08484 https:…
Fast radio bursts shed light on direct gravity test on cosmological scales
Shuren Zhou, Pengjie Zhang
https://arxiv.org/abs/2510.11022 https://arxiv.org/pd…
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[13/14]:
- Class-Invariant Test-Time Augmentation for Domain Generalization
Zhicheng Lin, Xiaolin Wu, Xi Zhang
Crosslisted article(s) found for cs.CV. https://arxiv.org/list/cs.CV/new
[2/3]:
- ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation
Sondos Mahmoud Bsharat, Zhiqiang Shen
https://arxiv.org/abs/2510.09599 https://arxi…
Constraint-Guided Unit Test Generation for Machine Learning Libraries
Lukas Krodinger, Altin Hajdari, Stephan Lukasczyk, Gordon Fraser
https://arxiv.org/abs/2510.09108 https://
Probing the geological setting of exoplanets through atmospheric analysis: using Mars as a test case
Monica Rainer, Evandro Balbi, Francesco Borsa, Paola Cianfarra, Avet Harutyunyan, Silvano Tosi
https://arxiv.org/abs/2510.09305
LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?
Kaijian Zou, Aaron Xiong, Yunxiang Zhang, Frederick Zhang, Yueqi Ren, Jirong Yang, Ayoung Lee, Shitanshu Bhushan, Lu Wang
https://arxiv.org/abs/2510.09595
Benchmarking foundation models for hyperspectral image classification: Application to cereal crop type mapping
Walid Elbarz, Mohamed Bourriz, Hicham Hajji, Hamd Ait Abdelali, François Bourzeix
https://arxiv.org/abs/2510.11576
The Gravitational Wave Memory from Binary Neutron Star Mergers
Jamie Bamber, Antonios Tsokaros, Milton Ruiz, Stuart L. Shapiro, Marc Favata, Matthew Karlson, Fabrizio Venturi Piñas
https://arxiv.org/abs/2510.09742
When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents
Lingfei Qian, Xueqing Peng, Yan Wang, Vincent Jim Zhang, Huan He, Hanley Smith, Yi Han, Yueru He, Haohang Li, Yupeng Cao, Yangyang Yu, Alejandro Lopez-Lira, Peng Lu, Jian-Yun Nie, Guojun Xiong, Jimin Huang, Sophia Ananiadou
https://arxiv.org/abs/2510.11695
Probing cosmic curvature with Alcock-Paczynski data
Yungui Gong, Qing Gao, Xuchen Lu, Zhu Yi
https://arxiv.org/abs/2510.11555 https://arxiv.org/pdf/2510.11…
Verifier-free Test-Time Sampling for Vision Language Action Models
Suhyeok Jang, Dongyoung Kim, Changyeon Kim, Youngsuk Kim, Jinwoo Shin
https://arxiv.org/abs/2510.05681 https:/…
How Students Use Generative AI for Software Testing: An Observational Study
Baris Ardic, Quentin Le Dilavrec, Andy Zaidman
https://arxiv.org/abs/2510.10551 https://
Accretion onto Reissner-Nordström naked singularities
Tomasz Krajewski, Włodek Kluźniak
https://arxiv.org/abs/2510.10043 https://arxiv.or…
Replaced article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[2/9]:
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei
LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition
Yushuo Zheng, Zicheng Zhang, Xiongkuo Min, Huiyu Duan, Guangtao Zhai
https://arxiv.org/abs/2510.08928 h…
Beyond Real Data: Synthetic Data through the Lens of Regularization
Amitis Shidani, Tyler Farghly, Yang Sun, Habib Ganjgahi, George Deligiannidis
https://arxiv.org/abs/2510.08095
LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models
Linghan Huang, Peizhou Zhao, Huaming Chen
https://arxiv.org/abs/2510.10179 https://
Is it Gaussian? Testing bosonic quantum states
Filippo Girardi, Freek Witteveen, Francesco Anna Mele, Lennart Bittel, Salvatore F. E. Oliviero, David Gross, Michael Walter
https://arxiv.org/abs/2510.07305
Particles with precessing spin in Kerr spacetime: analytic solutions for eccentric orbits and homoclinic motion near the equatorial plane
Gabriel Andres Piovano
https://arxiv.org/abs/2510.09597
Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski
https://arxiv.org/abs/2510.07257 h…
Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration
Mohanakrishnan Hariharan, Satish Arvapalli, Seshu Barma, Evangeline Sheela
https://arxiv.org/abs/2510.10824
AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling
Xiaogeng Liu, Chaowei Xiao
https://arxiv.org/abs/2510.05379 https://
Detection of mean changes in partially observed functional data
Šárka Hudecová, Claudia Kirch
https://arxiv.org/abs/2510.07854 https://arxi…
Scalable Offline Metrics for Autonomous Driving
Animikh Aich, Adwait Kulkarni, Eshed Ohn-Bar
https://arxiv.org/abs/2510.08571 https://arxiv.org/pdf/2510.08…
Agentic Property-Based Testing: Finding Bugs Across the Python Ecosystem
Muhammad Maaz, Liam DeVoe, Zac Hatfield-Dodds, Nicholas Carlini
https://arxiv.org/abs/2510.09907 https:/…
Extending CSST Emulator to post-DESI era
Zhao Chen, Yu Yu
https://arxiv.org/abs/2510.09503 https://arxiv.org/pdf/2510.09503
LATTA: Langevin-Anchored Test-Time Adaptation for Enhanced Robustness and Stability
Harshil Vejendla
https://arxiv.org/abs/2510.05530 https://arxiv.org/pdf…
High-Performance Imaging in a Dilution Refrigerator
Timo Eikelmann, Mara Brinkmann, Leonie Eggers, Tuncay Ulas, Donika Imeri, Konstantin Beck, Lasse Jens Irrgang, Sunil Kumar Mahato, Rikhav Shah, Ralf Riedinger
https://arxiv.org/abs/2510.07054
A Honest Cross-Validation Estimator for Prediction Performance
Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian
https://arxiv.org/abs/2510.07649 https://
GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
Wen Ye, Zhaocheng Liu, Yuwei Gui, Tingyu Yuan, Yunyue Su, Bowen Fang, Chaoyang Zhao, Qiang Liu, Liang Wang
https://arxiv.org/abs/2510.07217
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models
Zhangyue Yin, Qiushi Sun, Zhiyuan Zeng, Zhiyuan Yu, Qipeng Guo, Xuanjing Huang, Xipeng Qiu
https://arxiv.org/abs/2510.06014
Effects of magnetic fields on spinning test particles orbiting Kerr-Bertotti-Robinson black holes
Yu-Kun Zhang, Shao-Wen Wei
https://arxiv.org/abs/2510.07914 https://
Proofs of No Intrusion
Vipul Goyal, Justin Raizes
https://arxiv.org/abs/2510.06432 https://arxiv.org/pdf/2510.06432…
Euclid preparation. Cosmology Likelihood for Observables in Euclid (CLOE). 4: Validation and Performance
Collaboration, Martinelli, Pezzotta, Sciotti, Blot, Bonici, Camera, Cañas-Herrera, Cardone, Carrilho, Casas, Davini, Di Domizio, Farrens, Goh, Beauchamps, Ilić, Joudaki, Keil, Le Brun, Moretti, Pettorino, Sánchez, Sakr, Tanidis, Tutusaus, Ajani, Crocce, Giocoli, Legrand, Lembo, Lesci, Girones, Nouri-Zonoz, Pamuk, Tsedrik, Bel, Carbone, Duncan, Kilbinger, Lacasa, Lattan…
Extension of Wald-Wolfowitz Runs Test for Regression Validity Testing with Repeated Measures of Independent Variable
Bo-Yao Lian, Nelson G. Chen
https://arxiv.org/abs/2510.05861
The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models
Konrad Löhr, Shuzhou Yuan, Michael Färber
https://arxiv.org/abs/2510.08236
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Weihao Zeng, Keqing He, Chuqiao Kuang, Xiaoguang Li, Junxian He
https://arxiv.org/abs/2510.06135 htt…
TTRV: Test-Time Reinforcement Learning for Vision Language Models
Akshit Singh, Shyam Marjit, Wei Lin, Paul Gavrikov, Serena Yeung-Levy, Hilde Kuehne, Rogerio Feris, Sivan Doveh, James Glass, M. Jehanzeb Mirza
https://arxiv.org/abs/2510.06783
A new composite Mann-Whitney test for two-sample survival comparisons with right-censored data
Abid Hussain, Touqeer Ahmad
https://arxiv.org/abs/2510.05353 https://
ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
Qin Liu, Jacob Dineen, Yuxi Huang, Sheng Zhang, Hoifung Poon, Ben Zhou, Muhao Chen
https://arxiv.org/abs/2510.08569
NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering
Alexander Murphy, Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh
https://arxiv.org/abs/2510.05635 h…
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He
https://arxiv.org/abs/2510.06217
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
Omri Uzan, Asaf Yehudai, Roi Pony, Eyal Shnarch, Ariel Gera
https://arxiv.org/abs/2510.05038 htt…
Test-Time Defense Against Adversarial Attacks via Stochastic Resonance of Latent Ensembles
Dong Lao, Yuxiang Zhang, Haniyeh Ehsani Oskouie, Yangchao Wu, Alex Wong, Stefano Soatto
https://arxiv.org/abs/2510.03224
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Yi Lu, Jianing Wang, Linsen Guo, Wei He, Hongyin Tang, Tao Gui, Xuanjing Huang, Xuezhi Cao, Wei Wang, Xunliang Cai
https://arxiv.org/abs/2510.08189
Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime
Andreas Maurer, Erfan Mirzaei, Massimiliano Pontil
https://arxiv.org/abs/2510.06028 https…
Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR
Zeyu Sun, Jingjing Liang, Weiyi Wang, Chenyao Suo, Junjie Chen, Fanjiang Xu
https://arxiv.org/abs/2510.07815
PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs
Manuel Frank, Haithem Afli
https://arxiv.org/abs/2510.06730 https://
MatheMagic: Generating Dynamic Mathematics Benchmarks Robust to Memorization
Dayyán O'Brien, Barry Haddow, Emily Allaway, Pinzhen Chen
https://arxiv.org/abs/2510.05962
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Jihoon Lee, Hoyeon Moon, Kevin Zhai, Arun Kumar Chithanar, Anit Kumar Sahu, Soummya Kar, Chul Lee, Souradip Chakraborty, Amrit Singh Bedi
https://arxiv.org/abs/2510.05040
Self-Reflective Generation at Test Time
Jian Mu, Qixin Zhang, Zhiyong Wang, Menglin Yang, Shuang Qiu, Chengwei Qin, Zhongxiang Dai, Yao Shu
https://arxiv.org/abs/2510.02919 http…
Test Case Generation from Bug Reports via Large Language Models: A Cognitive Layered Evaluation Framework
Irtaza Sajid Qureshi, Zhen Ming (Jack) Jiang
https://arxiv.org/abs/2510.05365
Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Wenxun Wu, Yuanyang Li, Guhan Chen, Linyue Wang, Hongyang Chen
https://arxiv.org/abs/2510.07038
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment
Nevan Wichers, Aram Ebtekar, Ariana Azarbal, Victor Gillioz, Christine Ye, Emil Ryd, Neil Rathi, Henry Sleight, Alex Mallen, Fabien Roger, Samuel Marks
https://arxiv.org/abs/2510.05024
Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models
Runchu Tian, Junxia Cui, Xueqiang Xu, Feng Yao, Jingbo Shang
https://arxiv.org/abs/2510.05090
On the Role of Temperature Sampling in Test-Time Scaling
Yuheng Wu, Azalia Mirhoseini, Thierry Tambe
https://arxiv.org/abs/2510.02611 https://arxiv.org/pdf…
UnitTenX: Generating Tests for Legacy Packages with AI Agents Powered by Formal Verification
Yiannis Charalambous, Claudionor N. Coelho Jr, Luis Lamb, Lucas C. Cordeiro
https://arxiv.org/abs/2510.05441
Large Language Model-Based Uncertainty-Adjusted Label Extraction for Artificial Intelligence Model Development in Upper Extremity Radiography
Hanna Kreutzer, Anne-Sophie Caselitz, Thomas Dratsch, Daniel Pinto dos Santos, Christiane Kuhl, Daniel Truhn, Sven Nebelung
https://arxiv.org/abs/2510.05664 …