
2025-09-19 10:36:41
TDRM: Smooth Reward Models with Temporal Difference for LLM RL and Inference
Dan Zhang, Min Cai, Jonathan Li, Ziniu Hu, Yisong Yue, Yuxiao Dong, Jie Tang
https://arxiv.org/abs/2509.15110
On small, local language models: 'In a world increasingly dominated by massive models and opaque APIs, we believe there’s still room for small, transparent, controllable systems. Models you can fine-tune, understand and run on your own terms' https://www.turing.ac.uk…
Can Large Models Teach Student Models to Solve Mathematical Problems Like Human Beings? A Reasoning Distillation Method via Multi-LoRA Interaction
Xinhe Li, Jiajun Liu, Peng Wang
https://arxiv.org/abs/2508.13037
OpenAI and Apollo Research trained o3 and o4-mini versions to not engage in "scheming", or secretly pursuing undesirable goals, reducing "covert actions" ~30X (Radhika Rajkumar/ZDNET)
https://www.zdnet.com/article/ai-models-kn
Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment
Ankur Samanta, Akshayaa Magesh, Youliang Yu, Runzhe Wu, Ayush Jain, Daniel Jiang, Boris Vidolov, Paul Sajda, Yonathan Efroni, Kaveh Hassani
https://arxiv.org/abs/2509.15172
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
Rui Shao, Wei Li, Lingsen Zhang, Renshan Zhang, Zhiyang Liu, Ran Chen, Liqiang Nie
https://arxiv.org/abs/2508.13073
Early Approaches to Adversarial Fine-Tuning for Prompt Injection Defense: A 2022 Study of GPT-3 and Contemporary Models
Gustavo Sandoval, Denys Fenchenko, Junyao Chen
https://arxiv.org/abs/2509.14271
Lepton models from non-holomorphic $A^{\prime}_{5}$ modular flavor symmetry
Cai-Chang Li, Gui-Jun Ding
https://arxiv.org/abs/2509.15183
Mind & Motion: Opportunities and Applications of Integrating Biomechanics and Cognitive Models in HCI
Arthur Fleig, Florian Fischer, Markus Klar, Patrick Ebel, Miroslav Bachinski, Per Ola Kristensson, Roderick Murray-Smith, Antti Oulasvirta
https://arxiv.org/abs/2508.13788
Calibration-Aware Prompt Learning for Medical Vision-Language Models
Abhishek Basu, Fahad Shamshad, Ashshak Sharifdeen, Karthik Nandakumar, Muhammad Haris Khan
https://arxiv.org/abs/2509.15226
I've already seen AI lying and omitting things... The only thing that surprises me is that we use it for anything at all, or rather, that we let ourselves be used by it for anything at all.
@… https://mas.to/@carnage4life/115228191
On Instantons in Gross-Neveu and Gross-Neveu-Yukawa models
A. Imaanpur, S. E. Sadati
https://arxiv.org/abs/2508.12080 https://arxiv.org/pdf/2508.12080
Accelerating Edge Inference for Distributed MoE Models with Latency-Optimized Expert Placement
Tian Wu, Liming Wang, Zijian Wen, Xiaoxi Zhang, Jingpu Duan, Xianwei Zhang, Jinhang Zuo
https://arxiv.org/abs/2508.12851
Real-Time, Population-Based Reconstruction of 3D Bone Models via Very-Low-Dose Protocols
Yiqun Lin, Haoran Sun, Yongqing Li, Rabia Aslam, Lung Fung Tse, Tiange Cheng, Chun Sing Chui, Wing Fung Yau, Victorine R. Le Meur, Meruyert Amangeldy, Kiho Cho, Yinyu Ye, James Zou, Wei Zhao, Xiaomeng Li
https://arxiv.org/abs/2508.13947
What Matters in LLM-Based Feature Extractor for Recommender? A Systematic Analysis of Prompts, Models, and Adaptation
Kainan Shi (Xi'an Jiaotong University), Peilin Zhou (Hong Kong University of Science and Technology), Ge Wang (Xi'an Jiaotong University), Han Ding (Xi'an Jiaotong University), Fei Wang (Xi'an Jiaotong University)
https://
HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware
Yuannuo Feng, Wenyong Zhou, Yuexi Lyu, Hanjie Liu, Zhengwu Liu, Ngai Wong, Wang Kang
https://arxiv.org/abs/2508.11935
Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset
Zhipeng Xue, Xiaoting Zhang, Zhipeng Gao, Xing Hu, Shan Gao, Xin Xia, Shanping Li
https://arxiv.org/abs/2508.11958
Statistical Inference for Subgraph Frequencies of Exchangeable Hyperedge Models
Ayoushman Bhattacharya, Nilanjan Chakraborty, Robert Lunde
https://arxiv.org/abs/2508.13258
Exploring Self-Supervised Audio Models for Generalized Anomalous Sound Detection
Bing Han, Anbai Jiang, Xinhu Zheng, Wei-Qiang Zhang, Jia Liu, Pingyi Fan, Yanmin Qian
https://arxiv.org/abs/2508.12230
In case you don't want any Microsoft AI to be trained ("improved") on your personal data and content from LinkedIn, you should probably set the corresponding switch in your LinkedIn privacy settings to "Off". The default is "On".
#generativeAI #dataprivacy
From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models
Ziyan Kuang, Feiyu Zhu, Maowei Jiang, Yanzhao Lai, Zelin Wang, Zhitong Wang, Meikang Qiu, Jiajia Huang, Min Peng, Qianqian Xie, Sophia Ananiadou
https://arxiv.org/abs/2508.13491
Self-Improving Embodied Foundation Models
Seyed Kamyar Seyed Ghasemipour, Ayzaan Wahid, Jonathan Tompson, Pannag Sanketi, Igor Mordatch
https://arxiv.org/abs/2509.15155
Large Language Models in the Data Science Lifecycle: A Systematic Mapping Study
Sai Sanjna Chintakunta, Nathalia Nascimento, Everton Guimaraes
https://arxiv.org/abs/2508.11698
Uncertainty-Aware PCA for Arbitrarily Distributed Data Modeled by Gaussian Mixture Models
Daniel Klötzl, Ozan Tastekin, David Hägele, Marina Evers, Daniel Weiskopf
https://arxiv.org/abs/2508.13990
HARNESS: Lightweight Distilled Arabic Speech Foundation Models
Vrunda N. sukhadia, Shammur Absar Chowdhury
https://arxiv.org/abs/2509.14689
GLAMs and humanities folk deserve better AI models, number 5654 in a series: 'the subjectivity of a task or instance negatively affects the performance of a model – an observation that is particularly worrying for humanities research'.
From Kaspar Beelen's report, Small Language Models for libraries and computational humanities
AI instead of models: Otto saves on product photography
With its own AI tool, Otto aims to bring new collections to market faster and more cheaply. Otto is following a general trend in the industry.
https://www…
Sources: Apple is making all four iPhone 17 models in India ahead of debut, a first, as it expands to five factories to cut reliance on China for US-bound units (Bloomberg)
https://www.bloomberg.com/news/articles/2025-08-19/…
Playing telephone with generative models: "verification disability," "compelled reliance," and accessibility in data visualization
Frank Elavsky, Cindy Xiong Bearfield
https://arxiv.org/abs/2508.12192
RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation
Tianyi Niu, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal
https://arxiv.org/abs/2508.13968
ChronoLLM: Customizing Language Models for Physics-Based Simulation Code Generation
Jingquan Wang, Andrew Negrut, Harry Zhang, Khailanii Slaton, Shu Wang, Radu Serban, Jinlong Wu, Dan Negrut
https://arxiv.org/abs/2508.13975
Degenerate kinks and kink-instantons in two-dimensional scalar field models with $\mathcal{N}=1$ and $\mathcal{N}=2$ supersymmetry
Evgenii Ievlev, Mikhail Shifman
https://arxiv.org/abs/2509.14324
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Yujun Zhou, Zhenwen Liang, Haolin Liu, Wenhao Yu, Kishan Panaganti, Linfeng Song, Dian Yu, Xiangliang Zhang, Haitao Mi, Dong Yu
https://arxiv.org/abs/2509.15194
Beyond Data Privacy: New Privacy Risks for Large Language Models
Yuntao Du, Zitao Li, Ninghui Li, Bolin Ding
https://arxiv.org/abs/2509.14278
Stringy Constraints on Modular Flavor Models
Keiya Ishiguro, Takafumi Kai, Tatsuo Kobayashi, Hajime Otsuka
https://arxiv.org/abs/2508.12392
FOUNDER: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making
Yucen Wang, Rui Yu, Shenghua Wan, Le Gan, De-Chuan Zhan
https://arxiv.org/abs/2507.12496
Identification and Estimation of Multi-order Tensor Factor Models
Zetai Cen
https://arxiv.org/abs/2508.13418 https://arxiv.org/pdf/2508.13418
CodeLSI: Leveraging Foundation Models for Automated Code Generation with Low-Rank Optimization and Domain-Specific Instruction Tuning
Huy Le, Phong Nguyen, Hao Do, Tuan Nguyen, Thien Pham, Anh Nguyen-Duc, Tho Quan
https://arxiv.org/abs/2509.14373
Compressed Models are NOT Trust-equivalent to Their Large Counterparts
Rohit Raj Rai, Chirag Kothari, Siddhesh Shelke, Amit Awekar
https://arxiv.org/abs/2508.13533
7Bench: a Comprehensive Benchmark for Layout-guided Text-to-image Models
Elena Izzo, Luca Parolari, Davide Vezzaro, Lamberto Ballan
https://arxiv.org/abs/2508.12919
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory
Ming Li, Nan Zhang, Chenrui Fan, Hong Jiao, Yanbin Fu, Sydney Peters, Qingshu Xu, Robert Lissitz, Tianyi Zhou
https://arxiv.org/abs/2509.14662
Towards a Larger Model via One-Shot Federated Learning on Heterogeneous Client Models
Wenxuan Ye, Xueli An, Onur Ayan, Junfan Wang, Xueqiang Yan, Georg Carle
https://arxiv.org/abs/2508.13625
Sources: Anthropic refused federal law enforcement requests to use its AI models for some tasks, such as the surveillance of US citizens, irking the White House (Reed Albergotti/Semafor)
https://www.semafor.com/article/09/17/2025
Word Meanings in Transformer Language Models
Jumbly Grindrod, Peter Grindrod
https://arxiv.org/abs/2508.12863 https://arxiv.org/pdf/2508.12863
CARGO: A Framework for Confidence-Aware Routing of Large Language Models
Amine Barrak, Yosr Fourati, Michael Olchawa, Emna Ksontini, Khalil Zoghlami
https://arxiv.org/abs/2509.14899
Consiglieres in the Shadow: Understanding the Use of Uncensored Large Language Models in Cybercrimes
Zilong Lin, Zichuan Li, Xiaojing Liao, XiaoFeng Wang
https://arxiv.org/abs/2508.12622
CAST: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models
Catherine Glossop, William Chen, Arjun Bhorkar, Dhruv Shah, Sergey Levine
https://arxiv.org/abs/2508.13446
VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models
Huanchen Wang, Wencheng Zhang, Zhiqiang Wang, Zhicong Lu, Yuxin Ma
https://arxiv.org/abs/2509.14571
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning
Yeongbin Seo, Dongha Lee, Jaehyung Kim, Jinyoung Yeo
https://arxiv.org/abs/2509.15188
Transplant-Ready? Evaluating AI Lung Segmentation Models in Candidates with Severe Lung Disease
Jisoo Lee, Michael R. Harowicz, Yuwen Chen, Hanxue Gu, Isaac S. Alderete, Lin Li, Maciej A. Mazurowski, Matthew G. Hartwig
https://arxiv.org/abs/2509.15083
Rationality Check! Benchmarking the Rationality of Large Language Models
Zhilun Zhou, Jing Yi Wang, Nicholas Sukiennik, Chen Gao, Fengli Xu, Yong Li, James Evans
https://arxiv.org/abs/2509.14546
Strengthening Programming Comprehension in Large Language Models through Code Generation
Xiaoning Ren, Qiang Hu, Wei Ma, Yan Li, Yao Zhang, Lingxiao Jiang, Yinxing Xue
https://arxiv.org/abs/2508.12620
Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting
Aarushi Mahajan, Wayne Burleson
https://arxiv.org/abs/2509.15170
Evaluating Large Language Models for Cross-Lingual Retrieval
Longfei Zuo, Pingjun Hong, Oliver Kraus, Barbara Plank, Robert Litschko
https://arxiv.org/abs/2509.14749
Designing Latent Safety Filters using Pre-Trained Vision Models
Ihab Tabbara, Yuxuan Yang, Ahmad Hamzeh, Maxwell Astafyev, Hussein Sibai
https://arxiv.org/abs/2509.14758
Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models
Haobo Yang, Minghao Guo, Dequan Yang, Wenyu Wang
https://arxiv.org/abs/2509.15156
PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models
Pengcheng Huang, Shuhao Liu, Zhenghao Liu, Yukun Yan, Shuo Wang, Zulong Chen, Tong Xiao
https://arxiv.org/abs/2508.13021
Efficient Conformal Prediction for Regression Models under Label Noise
Yahav Cohen, Jacob Goldberger, Tom Tirer
https://arxiv.org/abs/2509.15120
Improving Detection of Watermarked Language Models
Dara Bahri, John Wieting
https://arxiv.org/abs/2508.13131 https://arxiv.org/pdf/2508.13131
ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models
Zhichen Lou, Kechun Xu, Zhongxiang Zhou, Rong Xiong
https://arxiv.org/abs/2508.11918
Evolution of Kernels: Automated RISC-V Kernel Optimization with Large Language Models
Siyuan Chen, Zhichao Lu, Qingfu Zhang
https://arxiv.org/abs/2509.14265
Enhancing Targeted Adversarial Attacks on Large Vision-Language Models through Intermediate Projector Guidance
Yiming Cao, Yanjie Li, Kaisheng Liang, Yuni Lai, Bin Xiao
https://arxiv.org/abs/2508.13739
Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models
Wei Song, Haonan Zhong, Ziqi Ding, Jingling Xue, Yuekang Li
https://arxiv.org/abs/2508.12566
V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models
Qidong Wang, Junjie Hu, Ming Jiang
https://arxiv.org/abs/2509.14837
CrafterDojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter
Junyeong Park, Hyeonseo Cho, Sungjin Ahn
https://arxiv.org/abs/2508.13530
Automating Modelica Module Generation Using Large Language Models: A Case Study on Building Control Description Language
Hanlong Wan, Xing Lu, Yan Chen, Karthik Devaprasad, Laura Hinkle
https://arxiv.org/abs/2509.14623
Driving Style Recognition Like an Expert Using Semantic Privileged Information from Large Language Models
Zhaokun Chen, Chaopeng Zhang, Xiaohan Li, Wenshuo Wang, Gentiane Venture, Junqiang Xi
https://arxiv.org/abs/2508.13881
Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models
Jianshu Zeng, Yuxuan Liu, Yutong Feng, Chenxuan Miao, Zixiang Gao, Jiwang Qu, Jianzhang Zhang, Bin Wang, Kun Yuan
https://arxiv.org/abs/2508.12945
MME-SCI: A Comprehensive and Challenging Science Benchmark for Multimodal Large Language Models
Jiacheng Ruan, Dan Jiang, Xian Gao, Ting Liu, Yuzhuo Fu, Yangyang Kang
https://arxiv.org/abs/2508.13938
Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics
Benjamin Sterling, Yousef El-Laham, Mónica F. Bugallo
https://arxiv.org/abs/2509.14225
Reinforced Context Order Recovery for Adaptive Reasoning and Planning
Long Ma, Fangwei Zhong, Yizhou Wang
https://arxiv.org/abs/2508.13070
A Study on Thinking Patterns of Large Reasoning Models in Code Generation
Kevin Halim, Sin G. Teo, Ruitao Feng, Zhenpeng Chen, Yang Gu, Chong Wang, Yang Liu
https://arxiv.org/abs/2509.13758
Ensemble of Pre-Trained Models for Long-Tailed Trajectory Prediction
Divya Thuremella, Yi Yang, Simon Wanna, Lars Kunze, Daniele De Martini
https://arxiv.org/abs/2509.13914
G$^2$RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
Yongxin Guo, Wenbo Deng, Zhenglin Cheng, Xiaoying Tang
https://arxiv.org/abs/2508.13023
Can Current AI Models Count What We Mean, Not What They See? A Benchmark and Systematic Evaluation
Gia Khanh Nguyen, Yifeng Huang, Minh Hoai
https://arxiv.org/abs/2509.13939
Can Large Language Models (LLMs) Describe Pictures Like Children? A Comparative Corpus Study
Hanna Woloszyn, Benjamin Gagl
https://arxiv.org/abs/2508.13769
Ask Good Questions for Large Language Models
Qi Wu, Zhongqi Lu
https://arxiv.org/abs/2508.14025 https://arxiv.org/pdf/2508.14025
ViExam: Are Vision Language Models Better than Humans on Vietnamese Multimodal Exam Questions?
Vy Tuong Dang, An Vo, Quang Tau, Duc Dm, Daeyoung Kim
https://arxiv.org/abs/2508.13680
Leveraging Large Language Models for Predictive Analysis of Human Misery
Bishanka Seal, Rahul Seetharaman, Aman Bansal, Abhilash Nandy
https://arxiv.org/abs/2508.12669
Fair-GPTQ: Bias-Aware Quantization for Large Language Models
Irina Proskurina, Guillaume Metzler, Julien Velcin
https://arxiv.org/abs/2509.15206
Generics and Default Reasoning in Large Language Models
James Ravi Kirkpatrick, Rachel Katharine Sterken
https://arxiv.org/abs/2508.13718
A Comparative Evaluation of Large Language Models for Persian Sentiment Analysis and Emotion Detection in Social Media Texts
Kian Tohidi, Kia Dashtipour, Simone Rebora, Sevda Pourfaramarz
https://arxiv.org/abs/2509.14922
Cross-Modal Knowledge Distillation for Speech Large Language Models
Enzhi Wang, Qicheng Li, Zhiyuan Tang, Yuhang Jia
https://arxiv.org/abs/2509.14930
ALIGN: Word Association Learning for Cross-Cultural Generalization in Large Language Models
Chunhua Liu, Kabir Manandhar Shrestha, Sukai Huang
https://arxiv.org/abs/2508.13426
LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models
Zhiyuan Ning, Tianle Gu, Jiaxin Song, Shixin Hong, Lingyu Li, Huacan Liu, Jie Li, Yixu Wang, Meng Lingyu, Yan Teng, Yingchun Wang
https://arxiv.org/abs/2508.12733
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models
Jinyi Han, Xinyi Wang, Haiquan Zhao, Tingyun Li, Zishang Jiang, Sihang Jiang, Jiaqing Liang, Xin Lin, Weikang Zhou, Zeye Sun, Fei Yu, Yanghua Xiao
https://arxiv.org/abs/2508.12903
MATA (māta): Mindful Assessment of the Telugu Abilities of Large Language Models
Chalamalasetti Kranti, Sowmya Vajjala
https://arxiv.org/abs/2508.13526
LNE-Blocking: An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models
Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, Hongyuan Lu
https://arxiv.org/abs/2509.15218
CLEAR: A Comprehensive Linguistic Evaluation of Argument Rewriting by Large Language Models
Thomas Huber, Christina Niklaus
https://arxiv.org/abs/2509.15027
LLM-OREF: An Open Relation Extraction Framework Based on Large Language Models
Hongyao Tu, Liang Zhang, Yujie Lin, Xin Lin, Haibo Zhang, Long Zhang, Jinsong Su
https://arxiv.org/abs/2509.15089
SMARTER: A Data-efficient Framework to Improve Toxicity Detection with Explanation via Self-augmenting Large Language Models
Huy Nghiem, Advik Sachdeva, Hal Daumé III
https://arxiv.org/abs/2509.15174
Assessing Historical Structural Oppression Worldwide via Rule-Guided Prompting of Large Language Models
Sreejato Chatterjee, Linh Tran, Quoc Duy Nguyen, Roni Kirson, Drue Hamlin, Harvest Aquino, Hanjia Lyu, Jiebo Luo, Timothy Dye
https://arxiv.org/abs/2509.15216
Patent Language Model Pretraining with ModernBERT
Amirhossein Yousefiramandi, Ciaran Cooney
https://arxiv.org/abs/2509.14926
Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding
Maciej Skorski, Alina Landowska
https://arxiv.org/abs/2508.13804
Large Language Model probabilities cannot distinguish between possible and impossible language
Evelina Leivada, Raquel Montero, Paolo Morosi, Natalia Moskvina, Tamara Serrano, Marcel Aguilar, Fritz Guenther
https://arxiv.org/abs/2509.15114
TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Improvement Opportunities (original title in Turkish: Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları)
M. Ali Bayram, Ali Arda Fincan, Ahmet Semih Gümüş, Banu Diri, Savaş Yıldırım, Öner Aytaş
https://arxiv.org/abs/2508.13044
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
David Heineman, Valentin Hofmann, Ian Magnusson, Yuling Gu, Noah A. Smith, Hannaneh Hajishirzi, Kyle Lo, Jesse Dodge
https://arxiv.org/abs/2508.13144
ToolRM: Outcome Reward Models for Tool-Calling Large Language Models
Mayank Agarwal, Ibrahim Abdelaziz, Kinjal Basu, Merve Unuvar, Luis A. Lastras, Yara Rizk, Pavan Kapanipathi
https://arxiv.org/abs/2509.11963