Tootfinder

Opt-in global Mastodon full text search. Join the index!

@arXiv_hepth_bot@mastoxiv.page
2025-09-18 08:16:41

Thermodynamic Split Conjecture and an Observational Test for Cosmological Entropy
Oem Trivedi
arxiv.org/abs/2509.13689 arxiv.org/pdf/2509.1…

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:40:51

BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)
Tomas Ruiz, Siyao Peng, Barbara Plank, Carsten Schwemmer
arxiv.org/abs/2510.12516

@arXiv_csSE_bot@mastoxiv.page
2025-09-18 08:23:21

An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software
Sina Gogani-Khiabani (University of Illinois Chicago), Ashutosh Trivedi (University of Colorado Boulder), Diptikalyan Saha (IBM Research), Saeid Tizpaz-Niari (University of Illinois Chicago)
arxiv.org/abs/2509.13471

@arXiv_csHC_bot@mastoxiv.page
2025-10-15 10:02:51

Data-Model Co-Evolution: Growing Test Sets to Refine LLM Behavior
Minjae Lee, Minsuk Kahng
arxiv.org/abs/2510.12728 arxiv.org/pdf/2510.1272…

@arXiv_econEM_bot@mastoxiv.page
2025-09-18 07:38:11

Generalized Covariance Estimator under Misspecification and Constraints
Aryan Manafi Neyazi
arxiv.org/abs/2509.13492 arxiv.org/pdf/2509.134…

@arXiv_statME_bot@mastoxiv.page
2025-10-15 08:55:31

A Martingale Kernel Two-Sample Test
Anirban Chatterjee, Aaditya Ramdas
arxiv.org/abs/2510.11853 arxiv.org/pdf/2510.11853

@arXiv_csLG_bot@mastoxiv.page
2025-10-15 10:45:41

Learning-To-Measure: In-context Active Feature Acquisition
Yuta Kobayashi, Zilin Jing, Jiayu Yao, Hongseok Namkoong, Shalmali Joshi
arxiv.org/abs/2510.12624

@arXiv_qfinST_bot@mastoxiv.page
2025-09-18 08:07:01

Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus
Lamia Lamrani, Benoît Collins, Jean-Philippe Bouchaud
arxiv.org/abs/2509.13923

@arXiv_csCV_bot@mastoxiv.page
2025-10-15 10:54:11

Efficient Real-World Deblurring using Single Images: AIM 2025 Challenge Report
Daniel Feijoo, Paula Garrido-Mellado, Marcos V. Conde, Jaesung Rim, Alvaro Garcia, Sunghyun Cho, Radu Timofte
arxiv.org/abs/2510.12788

@arXiv_physicsgeoph_bot@mastoxiv.page
2025-12-16 10:12:52

Correcting exponentiality test for binned earthquake magnitudes
Angela Stallone, Ilaria Spassiani
arxiv.org/abs/2512.13599 arxiv.org/pdf/25…

@arXiv_astrophHE_bot@mastoxiv.page
2025-10-15 09:14:01

DIPLODOCUS II: Implementation of transport equations and test cases relevant to micro-scale physics of jetted astrophysical sources
Christopher N. Everett, Marc Klinger-Plaisier, Garret Cotter
arxiv.org/abs/2510.12505

@arXiv_csCR_bot@mastoxiv.page
2025-10-14 11:48:48

ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh
arxiv.org/abs/2510.10281

@arXiv_csAI_bot@mastoxiv.page
2025-10-13 10:08:10

Titans Revisited: A Lightweight Reimplementation and Critical Analysis of a Test-Time Memory Model
Gavriel Di Nepi, Federico Siciliano, Fabrizio Silvestri
arxiv.org/abs/2510.09551

@arXiv_csCL_bot@mastoxiv.page
2025-10-15 10:38:31

Resource-sensitive but language-blind: Community size and not grammatical complexity better predicts the accuracy of Large Language Models in a novel Wug Test
Nikoleta Pantelidou, Evelina Leivada, Paolo Morosi
arxiv.org/abs/2510.12463

@arXiv_mathFA_bot@mastoxiv.page
2025-10-15 09:49:21

On Korovkin-type theorems including exponential test functions on infinite intervals through power series convergence
Dilek Söylemez, Mehmet Ünver
arxiv.org/abs/2510.12568

@arXiv_astrophEP_bot@mastoxiv.page
2025-10-15 08:56:02

The resilience of the sailboat stable region
Rafael Sfair, Tiago F. L. L. Pinheiro, Giovana Ramon, Ernesto Vieira
arxiv.org/abs/2510.11855

@arXiv_astrophCO_bot@mastoxiv.page
2025-10-15 09:37:11

Hierarchical summaries for primordial non-Gaussianities
M. S. Cagliari, A. Bairagi, B. Wandelt
arxiv.org/abs/2510.12715 arxiv.org/pdf/2510.…

@arXiv_mathRT_bot@mastoxiv.page
2025-10-15 08:45:52

Unitary representations attached to parabolic subgroups: the case of abelian unipotent radical
Dan Ciubotaru
arxiv.org/abs/2510.11862 arxiv…

@publicvoit@graz.social
2025-12-13 23:14:30

If you're using #lazyblorg as your static website generator: I've updated the project today.
It now uses "uv" for dependency management, script invocation and unit test execution. Furthermore, I adapted the code to match the #pandoc version of Debian 13 Trixie.
Although you ne…

@arXiv_condmatstrel_bot@mastoxiv.page
2025-10-15 09:15:31

Quantum criticality at the end of a pseudogap phase in superconducting infinite-layer nickelates
C. Iorio-Duval, E. Beauchesne-Blanchet, F. Perreault, J. L. Santana González, W. Sun, Y. F. Nie, A. Gourgout, G. Grissonnanche
arxiv.org/abs/2510.12786

@arXiv_csHC_bot@mastoxiv.page
2025-10-15 09:56:52

Gauging the Competition: Understanding Social Comparison and Anxiety through Eye-tracking in Virtual Reality Group Interview
Shi-Ting Ni, Kairong Fang, Yuyang Wang, Pan Hui
arxiv.org/abs/2510.12590

@arXiv_csCY_bot@mastoxiv.page
2025-10-13 09:09:10

Non-traditional data in pandemic preparedness and response: identifying and addressing first and last-mile challenges
Mattia Mazzoli, Irma Varela-Lasheras, Sonia Namorado, Constantino Pereira Caetano, Andreia Leite, Lisa Hermans, Niel Hens, Polen Türkmen, Kyriaki Kalimeri, Leo Ferres, Ciro Cattuto, Daniela Paolotti, Stefaan Verhulst

@arXiv_csRO_bot@mastoxiv.page
2025-10-13 10:04:10

Bridging Research and Practice in Simulation-based Testing of Industrial Robot Navigation Systems
Sajad Khatiri, Francisco Eli Vina Barrientos, Maximilian Wulf, Paolo Tonella, Sebastiano Panichella
arxiv.org/abs/2510.09396

@arXiv_econTH_bot@mastoxiv.page
2025-10-15 08:33:02

Selection Procedures in Competitive Admission
Nathan Hancart
arxiv.org/abs/2510.12653 arxiv.org/pdf/2510.12653

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 13:41:38

Representation-Based Exploration for Language Models: From Test-Time to Post-Training
Jens Tuyls, Dylan J. Foster, Akshay Krishnamurthy, Jordan T. Ash
arxiv.org/abs/2510.11686

@arXiv_csCV_bot@mastoxiv.page
2025-10-13 10:35:30

D-TPT: Dimensional Entropy Maximization for Calibrating Test-Time Prompt Tuning in Vision-Language Models
Jisu Han, Wonjun Hwang
arxiv.org/abs/2510.09473

@arXiv_astrophIM_bot@mastoxiv.page
2025-10-14 09:39:38

The Importance of Being Adaptable: An Exploration of the Power and Limitations of Domain Adaptation for Simulation-Based Inference with Galaxy Clusters
Michelle Ntampaka, A. Ciprijanovic, Ana Maria Delgado, John Soltis, John F. Wu, Mikaeel Yunus, John ZuHone
arxiv.org/abs/2510.09748

@arXiv_mathPR_bot@mastoxiv.page
2025-10-14 10:49:18

General mean-field BSDEs with integrable terminal values
Weimin Jiang, Juan Li, Yan Shen
arxiv.org/abs/2510.11067 arxiv.org/pdf/2510.11067

@arXiv_econGN_bot@mastoxiv.page
2025-10-15 07:46:31

Beyond Test Scores: How Academic Rank Shapes Long-Term Outcomes
Emilia Del Bono, Angus Holford, Tommaso Sartori
arxiv.org/abs/2510.11973 ar…

@arXiv_statME_bot@mastoxiv.page
2025-10-14 11:16:39

A Kolmogorov-Smirnov-Type Test for Dependently Double-Truncated Data
Anne-Marie Toparkus, Rafael Weissbach
arxiv.org/abs/2510.11517 arxiv.o…

@arXiv_csAI_bot@mastoxiv.page
2025-10-14 17:28:38

Crosslisted article(s) found for cs.AI. arxiv.org/list/cs.AI/new
[8/17]:
- MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
Chen, Lei, Zhang, Ke, Zhu, Chen, Lu, Huang, Feng, He, Sun, Wu, Wang

@knurd42@social.linux.pizza
2025-10-12 09:45:59

PSA for users who regularly test #Fedora Beta as well as proposed updates once the new version has been released:
Do not enable updates-testing[1] by modifying /etc/yum.repos.d/fedora-updates-testing.repo; instead do it like this:
$ sudo dnf config-manager setopt updates-testing.enabled=true
Otherwise updates-testing will be disabled shortly before the release of a new version (t…

Screenshot from the top of the linked page

@arXiv_grqc_bot@mastoxiv.page
2025-10-13 08:59:20

Chaos of charged particles near a renormalized group improved Kerr black hole in an external magnetic field
Junjie Lu, Xin Wu
arxiv.org/abs/2510.08954

@arXiv_csSE_bot@mastoxiv.page
2025-10-13 09:20:10

Search-based Hyperparameter Tuning for Python Unit Test Generation
Stephan Lukasczyk, Gordon Fraser
arxiv.org/abs/2510.08716 arxiv.org/pdf/…

@arXiv_astrophHE_bot@mastoxiv.page
2025-10-15 09:18:11

The double neutron star PSR J1946+2052 I. Masses and tests of general relativity
Lingqi Meng, Paulo C. C. Freire, Kevin Stovall, Norbert Wex, Xueli Miao, Weiwei Zhu, Michael Kramer, James M. Cordes, Huanchen Hu, Jinchen Jiang, Emilie Parent, Lijing Shao, Ingrid H. Stairs, Mengyao Xue, Adam Brazier, Fernando Camilo, David J. Champion, Shami Chatterjee, Fronefield Crawford, Ziyao Fang, Qiuyang Fu, Yanjun Guo, Jason W. T. Hessels, Maura MacLaughlin, Chenchen Miao, Jiarui Niu, Ziwei Wu, Ju…

@arXiv_nuclex_bot@mastoxiv.page
2025-10-15 08:15:32

The mass of ¹⁰¹Sn and Bayesian extrapolations to the proton drip line
Christian M. Ireland, Georg Bollen, Scott E. Campbell, Xiangcheng Chen, Hannah Erington, Nadeesha D. Gamage, Kyle Godbey, Alicen M. Houff, Christopher Izzo, Bailey Knight, Sudhanva Lalit, Erich Leistenschneider, E. Marilena Lykiardopoulou, Franziska M. Maier, Witold Nazarewicz, Rodney Orford, William S. Porter, Caleb Quick, Ante Ravlic, Matthew Redshaw, Paul-Gerhard Reinhard, Ryan Ringle, Stefan Schwarz, Chandan…

@arXiv_csCR_bot@mastoxiv.page
2025-10-14 12:12:18

CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense
Yang Zhuochen, Fok Kar Wai, Thing Vrizlynn
arxiv.org/abs/2510.11137 arx…

@arXiv_statML_bot@mastoxiv.page
2025-10-10 09:37:19

PAC Learnability in the Presence of Performativity
Ivan Kirev, Lyuben Baltadzhiev, Nikola Konstantinov
arxiv.org/abs/2510.08335 arxiv.org/p…

@arXiv_astrophSR_bot@mastoxiv.page
2025-10-13 08:02:50

How precisely can we measure the ages of subgiant and giant stars?
Cheyanne Shariat, Kareem El-Badry, Soumyadeep Bhattacharjee
arxiv.org/abs/2510.08675

@arXiv_mathST_bot@mastoxiv.page
2025-10-14 08:24:58

Simultaneous Frequentist Calibration of Confidence Regions for Multiple Functionals in Constrained Inverse Problems
Pau Batlle, Pratik Patil, Michael Stanley, Javier Ruiz Lupon, Houman Owhadi, Mikael Kuusela
arxiv.org/abs/2510.11708

@arXiv_quantph_bot@mastoxiv.page
2025-10-10 11:19:49

Guess your neighbor's input: Quantum advantage in Feige's game
Simon Schmidt, Sigurd A. L. Storgaard, Michael Walter, Yuming Zhao
arxiv.org/abs/2510.08484

@arXiv_astrophCO_bot@mastoxiv.page
2025-10-14 09:56:18

Fast radio bursts shed light on direct gravity test on cosmological scales
Shuren Zhou, Pengjie Zhang
arxiv.org/abs/2510.11022 arxiv.org/pd…

@arXiv_csLG_bot@mastoxiv.page
2025-10-14 22:19:32

Replaced article(s) found for cs.LG. arxiv.org/list/cs.LG/new
[13/14]:
- Class-Invariant Test-Time Augmentation for Domain Generalization
Zhicheng Lin, Xiaolin Wu, Xi Zhang

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 16:14:50

Crosslisted article(s) found for cs.CV. arxiv.org/list/cs.CV/new
[2/3]:
- ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wanga, Kuo-Hui Yeh

@arXiv_csCL_bot@mastoxiv.page
2025-10-13 10:43:40

Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation
Sondos Mahmoud Bsharat, Zhiqiang Shen
arxiv.org/abs/2510.09599 arxi…

@arXiv_csSE_bot@mastoxiv.page
2025-10-13 09:55:00

Constraint-Guided Unit Test Generation for Machine Learning Libraries
Lukas Krodinger, Altin Hajdari, Stephan Lukasczyk, Gordon Fraser
arxiv.org/abs/2510.09108

@arXiv_astrophEP_bot@mastoxiv.page
2025-10-13 08:53:40

Probing the geological setting of exoplanets through atmospheric analysis: using Mars as a test case
Monica Rainer, Evandro Balbi, Francesco Borsa, Paola Cianfarra, Avet Harutyunyan, Silvano Tosi
arxiv.org/abs/2510.09305

@arXiv_csAI_bot@mastoxiv.page
2025-10-13 10:11:10

LiveOIBench: Can Large Language Models Outperform Human Contestants in Informatics Olympiads?
Kaijian Zou, Aaron Xiong, Yunxiang Zhang, Frederick Zhang, Yueqi Ren, Jirong Yang, Ayoung Lee, Shitanshu Bhushan, Lu Wang
arxiv.org/abs/2510.09595

@arXiv_csCV_bot@mastoxiv.page
2025-10-14 13:45:18

Benchmarking foundation models for hyperspectral image classification: Application to cereal crop type mapping
Walid Elbarz, Mohamed Bourriz, Hicham Hajji, Hamd Ait Abdelali, François Bourzeix
arxiv.org/abs/2510.11576

@arXiv_grqc_bot@mastoxiv.page
2025-10-14 08:21:08

The Gravitational Wave Memory from Binary Neutron Star Mergers
Jamie Bamber, Antonios Tsokaros, Milton Ruiz, Stuart L. Shapiro, Marc Favata, Matthew Karlson, Fabrizio Venturi Piñas
arxiv.org/abs/2510.09742

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 13:18:28

When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents
Lingfei Qian, Xueqing Peng, Yan Wang, Vincent Jim Zhang, Huan He, Hanley Smith, Yi Han, Yueru He, Haohang Li, Yupeng Cao, Yangyang Yu, Alejandro Lopez-Lira, Peng Lu, Jian-Yun Nie, Guojun Xiong, Jimin Huang, Sophia Ananiadou
arxiv.org/abs/2510.11695

@arXiv_astrophCO_bot@mastoxiv.page
2025-10-14 10:41:19

Probing cosmic curvature with Alcock-Paczynski data
Yungui Gong, Qing Gao, Xuchen Lu, Zhu Yi
arxiv.org/abs/2510.11555 arxiv.org/pdf/2510.11…

@arXiv_csRO_bot@mastoxiv.page
2025-10-08 10:05:09

Verifier-free Test-Time Sampling for Vision Language Action Models
Suhyeok Jang, Dongyoung Kim, Changyeon Kim, Youngsuk Kim, Jinwoo Shin
arxiv.org/abs/2510.05681

@arXiv_csSE_bot@mastoxiv.page
2025-10-14 10:33:58

How Students Use Generative AI for Software Testing: An Observational Study
Baris Ardic, Quentin Le Dilavrec, Andy Zaidman
arxiv.org/abs/2510.10551

@arXiv_astrophHE_bot@mastoxiv.page
2025-10-14 10:17:48

Accretion onto Reissner-Nordström naked singularities
Tomasz Krajewski, Włodek Kluźniak
arxiv.org/abs/2510.10043 arxiv.or…

@arXiv_csCL_bot@mastoxiv.page
2025-10-14 21:37:08

Replaced article(s) found for cs.CL. arxiv.org/list/cs.CL/new
[2/9]:
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei

@arXiv_csAI_bot@mastoxiv.page
2025-10-13 09:20:30

LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition
Yushuo Zheng, Zicheng Zhang, Xiongkuo Min, Huiyu Duan, Guangtao Zhai
arxiv.org/abs/2510.08928

@arXiv_statML_bot@mastoxiv.page
2025-10-10 09:26:09

Beyond Real Data: Synthetic Data through the Lens of Regularization
Amitis Shidani, Tyler Farghly, Yang Sun, Habib Ganjgahi, George Deligiannidis
arxiv.org/abs/2510.08095

@arXiv_csSE_bot@mastoxiv.page
2025-10-14 09:59:28

LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models
Linghan Huang, Peizhou Zhao, Huaming Chen
arxiv.org/abs/2510.10179

@arXiv_quantph_bot@mastoxiv.page
2025-10-09 10:58:01

Is it Gaussian? Testing bosonic quantum states
Filippo Girardi, Freek Witteveen, Francesco Anna Mele, Lennart Bittel, Salvatore F. E. Oliviero, David Gross, Michael Walter
arxiv.org/abs/2510.07305

@arXiv_grqc_bot@mastoxiv.page
2025-10-13 09:42:30

Particles with precessing spin in Kerr spacetime: analytic solutions for eccentric orbits and homoclinic motion near the equatorial plane
Gabriel Andres Piovano
arxiv.org/abs/2510.09597

@arXiv_csLG_bot@mastoxiv.page
2025-10-09 10:55:11

Test-Time Graph Search for Goal-Conditioned Reinforcement Learning
Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski
arxiv.org/abs/2510.07257

@arXiv_csSE_bot@mastoxiv.page
2025-10-14 10:41:48

Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration
Mohanakrishnan Hariharan, Satish Arvapalli, Seshu Barma, Evangeline Sheela
arxiv.org/abs/2510.10824

@arXiv_csCR_bot@mastoxiv.page
2025-10-08 09:18:49

AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling
Xiaogeng Liu, Chaowei Xiao
arxiv.org/abs/2510.05379

@arXiv_statME_bot@mastoxiv.page
2025-10-10 09:05:49

Detection of mean changes in partially observed functional data
Šárka Hudecová, Claudia Kirch
arxiv.org/abs/2510.07854 arxi…

@arXiv_csRO_bot@mastoxiv.page
2025-10-10 10:20:59

Scalable Offline Metrics for Autonomous Driving
Animikh Aich, Adwait Kulkarni, Eshed Ohn-Bar
arxiv.org/abs/2510.08571 arxiv.org/pdf/2510.08…

@arXiv_csSE_bot@mastoxiv.page
2025-10-14 08:07:37

Agentic Property-Based Testing: Finding Bugs Across the Python Ecosystem
Muhammad Maaz, Liam DeVoe, Zac Hatfield-Dodds, Nicholas Carlini
arxiv.org/abs/2510.09907

@arXiv_astrophCO_bot@mastoxiv.page
2025-10-13 09:48:30

Extending CSST Emulator to post-DESI era
Zhao Chen, Yu Yu
arxiv.org/abs/2510.09503 arxiv.org/pdf/2510.09503

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:40:39

LATTA: Langevin-Anchored Test-Time Adaptation for Enhanced Robustness and Stability
Harshil Vejendla
arxiv.org/abs/2510.05530 arxiv.org/pdf…

@arXiv_quantph_bot@mastoxiv.page
2025-10-09 10:39:41

High-Performance Imaging in a Dilution Refrigerator
Timo Eikelmann, Mara Brinkmann, Leonie Eggers, Tuncay Ulas, Donika Imeri, Konstantin Beck, Lasse Jens Irrgang, Sunil Kumar Mahato, Rikhav Shah, Ralf Riedinger
arxiv.org/abs/2510.07054

@arXiv_statML_bot@mastoxiv.page
2025-10-10 08:29:48

A Honest Cross-Validation Estimator for Prediction Performance
Tianyu Pan, Vincent Z. Yu, Viswanath Devanarayan, Lu Tian
arxiv.org/abs/2510.07649

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:47:01

GenPilot: A Multi-Agent System for Test-Time Prompt Optimization in Image Generation
Wen Ye, Zhaocheng Liu, Yuwei Gui, Tingyu Yuan, Yunyue Su, Bowen Fang, Chaoyang Zhao, Qiang Liu, Liang Wang
arxiv.org/abs/2510.07217

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 10:28:19

ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models
Zhangyue Yin, Qiushi Sun, Zhiyuan Zeng, Zhiyuan Yu, Qipeng Guo, Xuanjing Huang, Xipeng Qiu
arxiv.org/abs/2510.06014

@arXiv_grqc_bot@mastoxiv.page
2025-10-10 09:47:09

Effects of magnetic fields on spinning test particles orbiting Kerr-Bertotti-Robinson black holes
Yu-Kun Zhang, Shao-Wen Wei
arxiv.org/abs/2510.07914

@arXiv_csCR_bot@mastoxiv.page
2025-10-09 08:57:21

Proofs of No Intrusion
Vipul Goyal, Justin Raizes
arxiv.org/abs/2510.06432 arxiv.org/pdf/2510.06432

@arXiv_astrophCO_bot@mastoxiv.page
2025-10-13 09:28:10

Euclid preparation. Cosmology Likelihood for Observables in Euclid (CLOE). 4: Validation and Performance
Euclid Collaboration, Martinelli, Pezzotta, Sciotti, Blot, Bonici, Camera, Cañas-Herrera, Cardone, Carrilho, Casas, Davini, Di Domizio, Farrens, Goh, Beauchamps, Ilić, Joudaki, Keil, Le Brun, Moretti, Pettorino, Sánchez, Sakr, Tanidis, Tutusaus, Ajani, Crocce, Giocoli, Legrand, Lembo, Lesci, Girones, Nouri-Zonoz, Pamuk, Tsedrik, Bel, Carbone, Duncan, Kilbinger, Lacasa, Lattan…

@arXiv_statME_bot@mastoxiv.page
2025-10-08 09:30:39

Extension of Wald-Wolfowitz Runs Test for Regression Validity Testing with Repeated Measures of Independent Variable
Bo-Yao Lian, Nelson G. Chen
arxiv.org/abs/2510.05861

@arXiv_csLG_bot@mastoxiv.page
2025-10-10 11:04:09

The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models
Konrad Löhr, Shuzhou Yuan, Michael Färber
arxiv.org/abs/2510.08236

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 10:34:39

Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Weihao Zeng, Keqing He, Chuqiao Kuang, Xiaoguang Li, Junxian He
arxiv.org/abs/2510.06135

@arXiv_csCV_bot@mastoxiv.page
2025-10-09 10:26:11

TTRV: Test-Time Reinforcement Learning for Vision Language Models
Akshit Singh, Shyam Marjit, Wei Lin, Paul Gavrikov, Serena Yeung-Levy, Hilde Kuehne, Rogerio Feris, Sivan Doveh, James Glass, M. Jehanzeb Mirza
arxiv.org/abs/2510.06783

@arXiv_statME_bot@mastoxiv.page
2025-10-08 08:49:19

A new composite Mann-Whitney test for two-sample survival comparisons with right-censored data
Abid Hussain, Touqeer Ahmad
arxiv.org/abs/2510.05353

@arXiv_csCL_bot@mastoxiv.page
2025-10-10 11:10:59

ArenaBencher: Automatic Benchmark Evolution via Multi-Model Competitive Evaluation
Qin Liu, Jacob Dineen, Yuxi Huang, Sheng Zhang, Hoifung Poon, Ben Zhou, Muhao Chen
arxiv.org/abs/2510.08569

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:45:59

NEO: No-Optimization Test-Time Adaptation through Latent Re-Centering
Alexander Murphy, Michal Danilowski, Soumyajit Chatterjee, Abhirup Ghosh
arxiv.org/abs/2510.05635

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 10:37:39

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He
arxiv.org/abs/2510.06217

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:20:52

Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization
Omri Uzan, Asaf Yehudai, Roi Pony, Eyal Shnarch, Ariel Gera
arxiv.org/abs/2510.05038

@arXiv_csCV_bot@mastoxiv.page
2025-10-06 10:14:19

Test-Time Defense Against Adversarial Attacks via Stochastic Resonance of Latent Ensembles
Dong Lao, Yuxiang Zhang, Haniyeh Ehsani Oskouie, Yangchao Wu, Alex Wong, Stefano Soatto
arxiv.org/abs/2510.03224

@arXiv_csAI_bot@mastoxiv.page
2025-10-10 10:31:19

R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Yi Lu, Jianing Wang, Linsen Guo, Wei He, Hongyin Tang, Tao Gui, Xuanjing Huang, Xuezhi Cao, Wei Wang, Xunliang Cai
arxiv.org/abs/2510.08189

@arXiv_csLG_bot@mastoxiv.page
2025-10-08 10:54:59

Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime
Andreas Maurer, Erfan Mirzaei, Massimiliano Pontil
arxiv.org/abs/2510.06028

@arXiv_csSE_bot@mastoxiv.page
2025-10-10 09:09:09

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR
Zeyu Sun, Jingjing Liang, Weiyi Wang, Chenyao Suo, Junjie Chen, Fanjiang Xu
arxiv.org/abs/2510.07815

@arXiv_csCL_bot@mastoxiv.page
2025-10-09 10:21:31

PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs
Manuel Frank, Haithem Afli
arxiv.org/abs/2510.06730

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 10:27:09

MatheMagic: Generating Dynamic Mathematics Benchmarks Robust to Memorization
Dayyán O'Brien, Barry Haddow, Emily Allaway, Pinzhen Chen
arxiv.org/abs/2510.05962

@arXiv_csLG_bot@mastoxiv.page
2025-10-07 13:06:22

Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Jihoon Lee, Hoyeon Moon, Kevin Zhai, Arun Kumar Chithanar, Anit Kumar Sahu, Soummya Kar, Chul Lee, Souradip Chakraborty, Amrit Singh Bedi
arxiv.org/abs/2510.05040

@arXiv_csCL_bot@mastoxiv.page
2025-10-06 10:18:59

Self-Reflective Generation at Test Time
Jian Mu, Qixin Zhang, Zhiyong Wang, Menglin Yang, Shuang Qiu, Chengwei Qin, Zhongxiang Dai, Yao Shu
arxiv.org/abs/2510.02919

@arXiv_csSE_bot@mastoxiv.page
2025-10-08 08:38:39

Test Case Generation from Bug Reports via Large Language Models: A Cognitive Layered Evaluation Framework
Irtaza Sajid Qureshi, Zhen Ming (Jack) Jiang
arxiv.org/abs/2510.05365

@arXiv_csAI_bot@mastoxiv.page
2025-10-09 09:58:01

Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Wenxun Wu, Yuanyang Li, Guhan Chen, Linyue Wang, Hongyang Chen
arxiv.org/abs/2510.07038

@arXiv_csLG_bot@mastoxiv.page
2025-10-07 13:06:02

Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment
Nevan Wichers, Aram Ebtekar, Ariana Azarbal, Victor Gillioz, Christine Ye, Emil Ryd, Neil Rathi, Henry Sleight, Alex Mallen, Fabien Roger, Samuel Marks
arxiv.org/abs/2510.05024

@arXiv_csCL_bot@mastoxiv.page
2025-10-07 12:23:42

Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models
Runchu Tian, Junxia Cui, Xueqiang Xu, Feng Yao, Jingbo Shang
arxiv.org/abs/2510.05090

@arXiv_csAI_bot@mastoxiv.page
2025-10-06 08:41:39

On the Role of Temperature Sampling in Test-Time Scaling
Yuheng Wu, Azalia Mirhoseini, Thierry Tambe
arxiv.org/abs/2510.02611 arxiv.org/pdf…

@arXiv_csSE_bot@mastoxiv.page
2025-10-08 08:59:09

UnitTenX: Generating Tests for Legacy Packages with AI Agents Powered by Formal Verification
Yiannis Charalambous, Claudionor N. Coelho Jr, Luis Lamb, Lucas C. Cordeiro
arxiv.org/abs/2510.05441

@arXiv_csAI_bot@mastoxiv.page
2025-10-08 10:03:59

Large Language Model-Based Uncertainty-Adjusted Label Extraction for Artificial Intelligence Model Development in Upper Extremity Radiography
Hanna Kreutzer, Anne-Sophie Caselitz, Thomas Dratsch, Daniel Pinto dos Santos, Christiane Kuhl, Daniel Truhn, Sven Nebelung
arxiv.org/abs/2510.05664