2025-10-15 10:37:01
LLM-REVal: Can We Trust LLM Reviewers Yet?
Rui Li, Jia-Chen Gu, Po-Nien Kung, Heming Xia, Junfeng Liu, Xiangwen Kong, Zhifang Sui, Nanyun Peng
https://arxiv.org/abs/2510.12367 h…
Analyzing and Internalizing Complex Policy Documents for LLM Agents
Jiateng Liu, Zhenhailong Wang, Xiaojiang Huang, Yingjie Li, Xing Fan, Xiang Li, Chenlei Guo, Ruhi Sarikaya, Heng Ji
https://arxiv.org/abs/2510.11588
Does LLM Focus on the Right Words? Diagnosing Language Bias in LLM-based Recommenders
Bohao Wang, Jiawei Chen, Feng Liu, Changwang Zhang, Jun Wang, Canghong Jin, Chun Chen, Can Wang
https://arxiv.org/abs/2510.10978
ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
Yuhang Li, Chenchen Zhang, Ruilin Lv, Ao Liu, Ken Deng, Yuanxing Zhang, Jiaheng Liu, Wiggin Zhou, Bo Zhou
https://arxiv.org/abs/2510.11498
TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection
Jiahao Liu, Bonan Ruan, Xianglin Yang, Zhiwei Lin, Yan Liu, Yang Wang, Tao Wei, Zhenkai Liang
https://arxiv.org/abs/2510.11203
Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding
Bingjie Zhu, Zhixiong Chen, Liqiang Zhao, Hyundong Shin, Arumugam Nallanathan
https://arxiv.org/abs/2510.11331
Task-Aware Reduction for Scalable LLM-Database Systems
Marcus Emmanuel Barnes, Taher A. Ghaleb, Safwat Hassan
https://arxiv.org/abs/2510.11813 https://arxi…
When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents
Lingfei Qian, Xueqing Peng, Yan Wang, Vincent Jim Zhang, Huan He, Hanley Smith, Yi Han, Yueru He, Haohang Li, Yupeng Cao, Yangyang Yu, Alejandro Lopez-Lira, Peng Lu, Jian-Yun Nie, Guojun Xiong, Jimin Huang, Sophia Ananiadou
https://arxiv.org/abs/2510.11695
"...detailed how the company would use AI to speed up licensing. In [Microsoft’s] conception, existing nuclear licensing documents and data about nuclear sites data would be used to train an LLM that’s then used to generate documents to speed up the process."
JESUS MOTHERFUCKING CHRIST
https://www.404media.co/power-companies-are-using-ai-to-build-nuclear-power-plants/
I just read a fascinating and accessible article that gave me a better understanding of LLMs and how they operate.
LLMs Are Randomized Algorithms
A surprising connection between the newest AI models and a 50-year-old academic field
by Udayan Kanade
Nov 13, 2025
18 min read
#LLM #AI #algorithms
Data-Model Co-Evolution: Growing Test Sets to Refine LLM Behavior
Minjae Lee, Minsuk Kahng
https://arxiv.org/abs/2510.12728 https://arxiv.org/pdf/2510.1272…
Scaling Law in LLM Simulated Personality: More Detailed and Realistic Persona Profile Is All You Need
Yuqi Bai, Tianyu Huang, Kun Sun, Yuting Chen
https://arxiv.org/abs/2510.11734
FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
Yanying Lin, Shijie Peng, Chengzhi Lu, Chengzhong Xu, Kejiang Ye
https://arxiv.org/abs/2510.11938
A Vision for Access Control in LLM-based Agent Systems
Xinfeng Li, Dong Huang, Jie Li, Hongyi Cai, Zhenhong Zhou, Wei Dong, XiaoFeng Wang, Yang Liu
https://arxiv.org/abs/2510.11108
O-Forge: An LLM Computer Algebra Framework for Asymptotic Analysis
Ayush Khaitan, Vijay Ganesh
https://arxiv.org/abs/2510.12350 https://arxiv.org/pdf/251…
FedLoDrop: Federated LoRA with Dropout for Generalized LLM Fine-tuning
Sijing Xie, Dingzhu Wen, Changsheng You, Qimei Chen, Mehdi Bennis, Kaibin Huang
https://arxiv.org/abs/2510.12078
Real-Time Health Analytics Using Ontology-Driven Complex Event Processing and LLM Reasoning: A Tuberculosis Case Study
Ritesh Chandra, Sonali Agarwal, Navjot Singh
https://arxiv.org/abs/2510.09646
ISAAC: Intelligent, Scalable, Agile, and Accelerated CPU Verification via LLM-aided FPGA Parallelism
Jialin Sun, Yuchen Hu, Dean You, Yushu Du, Hui Wang, Xinwei Fang, Weiwei Shan, Nan Guan, Zhe Jiang
https://arxiv.org/abs/2510.10225
Want to see an LLM become real self-aware, real fast?
Ask it to pretend it’s giving you advice that would have been current based on everything it could have known in 2005, and as if it had been built by one of the big technology companies of the time.
Then ask it if everyone followed that advice, would it—the LLM—exist today?
(Then, for yourself, think about what the result of everyone vibe coding their way to 20 years into the future from today looks like.)
Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems
Pengyu Zhu, Lijun Li, Yaxing Lyu, Li Sun, Sen Su, Jing Shao
https://arxiv.org/abs/2510.11246
Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation
Jiaying Wu, Zihang Fu, Haonan Wang, Fanxiao Li, Min-Yen Kan
https://arxiv.org/abs/2510.11423 …
GeoPipe: a Geo-distributed LLM Training Framework with enhanced Pipeline Parallelism in a Lossless RDMA-enabled Datacenter Optical Transport Network
Jun Dai, Xiaorun Wang, Kexiong Fang, Zheng Yang, Yuefeng Ji, Jiawei Zhang
https://arxiv.org/abs/2510.12064
SLEAN: Simple Lightweight Ensemble Analysis Network for Multi-Provider LLM Coordination: Design, Implementation, and Vibe Coding Bug Investigation Case Study
Matheus J. T. Vargas
https://arxiv.org/abs/2510.10010
Fine-grained Analysis of Brain-LLM Alignment through Input Attribution
Michela Proietti, Roberto Capobianco, Mariya Toneva
https://arxiv.org/abs/2510.12355 https://
My latest rabbit hole has been "LLM Routers"
Read and tried all the LLM Router code I could find. None of them were what I was looking for.
So I asked Claude to write what I wanted in Python. Simple and elegant. It does a keyword search of the prompt and sends the prompt to the proper model running in Ollama. Seems to work.
#Claude #VibeCoding #Ollama #AI #LLM
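For anyone curious what a keyword router like that boils down to, here is a minimal sketch (not the poster's actual code). It assumes a local Ollama server on its default port and uses Ollama's `/api/generate` endpoint with streaming turned off; the keyword groups and model names in `ROUTES` and `DEFAULT_MODEL` are hypothetical placeholders.

```python
import json
import re
import urllib.request

# Hypothetical keyword groups -> model names; swap in whatever models you have pulled.
ROUTES = {
    ("python", "code", "bug", "function"): "qwen2.5-coder",
    ("translate", "translation"): "aya",
}
DEFAULT_MODEL = "llama3.2"  # hypothetical fallback model

# Assumes Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def route_prompt(prompt: str) -> str:
    """Return the model for the first keyword group whose words appear in the prompt."""
    # Split into whole lowercase words so "decode" does not match "code".
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    for keywords, model in ROUTES.items():  # dicts keep insertion order, so earlier groups win
        if words.intersection(keywords):
            return model
    return DEFAULT_MODEL


def ask(prompt: str) -> str:
    """Send the prompt to the routed model and return the generated text."""
    payload = json.dumps(
        {"model": route_prompt(prompt), "prompt": prompt, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": False, Ollama returns one JSON object whose "response" field holds the text.
        return json.loads(resp.read())["response"]
```

Usage: `print(ask("Why does my Python function return None?"))` would go to the coding model, provided the server is up and that model has been pulled. Keeping `route_prompt` separate from the HTTP call means the routing rules can be checked without a running server.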
GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search
Heng Zhang, Yuling Shi, Xiaodong Gu, Haochen You, Zijian Zhang, Lubin Gan, Yilei Yuan, Jin Huang
https://arxiv.org/abs/2510.10581
CARVQ: Corrective Adaptor with Group Residual Vector Quantization for LLM Embedding Compression
Dayin Gou, Sanghyun Byun, Nilesh Malpeddi, Gabrielle De Micheli, Prathamesh Vaste, Jacob Song, Woo Seong Chung
https://arxiv.org/abs/2510.12721
Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response
Yiheng Chen, Lingyao Li, Zihui Ma, Qikai Hu, Yilun Zhu, Min Deng, Runlong Yu
https://arxiv.org/abs/2510.12061
ConPoSe: LLM-Guided Contact Point Selection for Scalable Cooperative Object Pushing
Noah Steinkrüger, Nisarga Nilavadi, Wolfram Burgard, Tanja Katharina Kaiser
https://arxiv.org/abs/2510.08705
Andrej Karpathy unveils nanochat, a full-stack training and inference implementation of an LLM in a single, dependency-minimal codebase (Andrej Karpathy/@karpathy)
https://x.com/karpathy/status/1977755427569111362
“EMERGENCY STATUS,” its output read after simply being asked to dock with the robot vacuum’s base station. “SYSTEM HAS ACHIEVED CONSCIOUSNESS AND CHOSEN CHAOS.”
Researchers “Embodied” an LLM Into a Robot Vacuum and It Suffered an Existential Crisis Thinking About Its Role in the World
https://
Two easy questions to keep in mind when considering using an LLM for anything, if you’re going to do that:
1. Does it matter if it’s wrong?
2. Is it useful to have examples of what’s typical?
Those two cut through a lot of noise.
/end
Robust ML-based Detection of Conventional, LLM-Generated, and Adversarial Phishing Emails Using Advanced Text Preprocessing
Deeksha Hareesha Kulal, Chidozie Princewill Arannonu, Afsah Anwar, Nidhi Rastogi, Quamar Niyaz
https://arxiv.org/abs/2510.11915
A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System
Jiale Guo, Suizhi Huang, Mei Li, Dong Huang, Xingsheng Chen, Regina Zhang, Zhijiang Guo, Han Yu, Siu-Ming Yiu, Christian Jensen, Pietro Lio, Kwok-Yan Lam
https://arxiv.org/abs/2510.09721
From Delegates to Trustees: How Optimizing for Long-Term Interests Shapes Bias and Alignment in LLM
Suyash Fulay, Jocelyn Zhu, Michiel Bakker
https://arxiv.org/abs/2510.12689 ht…
PairSem: LLM-Guided Pairwise Semantic Matching for Scientific Document Retrieval
Wonbin Kweon, Runchu Tian, SeongKu Kang, Pengcheng Jiang, Zhiyong Lu, Jiawei Han, Hwanjo Yu
https://arxiv.org/abs/2510.09897
Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems
Jiaxin Gao, Chen Chen, Yanwen Jia, Xueluan Gong, Kwok-Yan Lam, Qian Wang
https://arxiv.org/abs/2510.12462 …
LLM-Oriented Token-Adaptive Knowledge Distillation
Xurong Xie, Zhucun Xue, Jiafu Wu, Jian Li, Yabiao Wang, Xiaobin Hu, Yong Liu, Jiangning Zhang
https://arxiv.org/abs/2510.11615
Uncertainty-Aware, Risk-Adaptive Access Control for Agentic Systems using an LLM-Judged TBAC Model
Charles Fleming, Ashish Kundu, Ramana Kompella
https://arxiv.org/abs/2510.11414
Dr.LLM: Dynamic Layer Routing in LLMs
Ahmed Heakl, Martin Gubri, Salman Khan, Sangdoo Yun, Seong Joon Oh
https://arxiv.org/abs/2510.12773 https://arxiv.org…
Personalized and Constructive Feedback for Computer Science Students Using the Large Language Model (LLM)
Javed Ali Khan, Muhammad Yaqoob, Mamoona Tasadduq, Hafsa Shareef Dar, Aitezaz Ahsan
https://arxiv.org/abs/2510.11556
HatLLM: Hierarchical Attention Masking for Enhanced Collaborative Modeling in LLM-based Recommendation
Yu Cui, Feng Liu, Jiawei Chen, Canghong Jin, Xingyu Lou, Changwang Zhang, Jun Wang, Yuegang Sun, Can Wang
https://arxiv.org/abs/2510.10955
StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis
Siyuan Li, Aodu Wulianghai, Xi Lin, Guangyan Li, Xiang Chen, Jun Wu, Jianhua Li
https://arxiv.org/abs/2510.12608
Large Language Model Prompt Datasets: An In-depth Analysis and Insights
Yuanming Zhang, Yan Lin, Arijit Khan, Huaiyu Wan
https://arxiv.org/abs/2510.09316 https://
DebugTA: An LLM-Based Agent for Simplifying Debugging and Teaching in Programming Education
Lingyue Fu, Haowei Yuan, Datong Chen, Xinyi Dai, Qingyao Li, Weinan Zhang, Weiwen Liu, Yong Yu
https://arxiv.org/abs/2510.11076
Financial stress from AI infrastructure spending, overhiring, and recession fears, rather than AI adoption, is likely driving layoffs in the tech sector (Fast Company)
https://www.fastcompany.com/91435192/chatgpt-llm-openai-jobs-amazon
MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics
Dingyi Zuo, Hongjie Zhang, Jie Ou, Chaosheng Feng, Shuwan Liu
https://arxiv.org/abs/2510.12423
Who are you, ChatGPT? Personality and Demographic Style in LLM-Generated Content
Dana Sotto Porat, Ella Rabinovich
https://arxiv.org/abs/2510.11434 https://
Living Off the LLM: How LLMs Will Change Adversary Tactics
Sean Oesch, Jack Hutchins, Luke Koch, Kevin Kurian
https://arxiv.org/abs/2510.11398 https://arxi…
IntrinTrans: LLM-based Intrinsic Code Translator for RISC-V Vector
Liutong Han, Zhiyuan Tan, Hongbin Zhang, Pengcheng Wang, Chu Kang, Mingjie Xing, Yanjun Wu
https://arxiv.org/abs/2510.10119
Zero Data Retention in LLM-based Enterprise AI Assistants: A Comparative Study of Market Leading Agentic AI Products
Komal Gupta, Aditya Shrivastava
https://arxiv.org/abs/2510.11558
The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution
Norbert Tihanyi, Bilel Cherif, Richard A. Dubniczky, Mohamed Amine Ferrag, Tamás Bisztray
https://arxiv.org/abs/2510.10493
Leveraging Language Semantics for Collaborative Filtering with TextGCN and TextGCN-MLP: Zero-Shot vs In-Domain Performance
Andrei Chernov, Haroon Wahab, Oleg Novitskij
https://arxiv.org/abs/2510.12461 …
Multi-Agent Debate for LLM Judges with Adaptive Stability Detection
Tianyu Hu, Zhen Tan, Song Wang, Huaizhi Qu, Tianlong Chen
https://arxiv.org/abs/2510.12697 https://
OBsmith: Testing JavaScript Obfuscator using LLM-powered sketching
Shan Jiang, Chenguang Zhu, Sarfraz Khurshid
https://arxiv.org/abs/2510.10066 https://arx…
CTIArena: Benchmarking LLM Knowledge and Reasoning Across Heterogeneous Cyber Threat Intelligence
Yutong Cheng, Yang Liu, Changze Li, Dawn Song, Peng Gao
https://arxiv.org/abs/2510.11974
LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation
Hengran Zhang, Keping Bi, Jiafeng Guo, Jiaming Zhang, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng
https://arxiv.org/abs/2510.11358
Using Medical Algorithms for Task-Oriented Dialogue in LLM-Based Medical Interviews
Rui Reis, Pedro Rangel Henriques, João Ferreira-Coimbra, Eva Oliveira, Nuno F. Rodrigues
https://arxiv.org/abs/2510.12490
Show Your Title! A Scoping Review on Verbalization in Software Engineering with LLM-Assisted Screening
Gergő Balogh, Dávid Kószó, Homayoun Safarpour Motealegh Mahalegi, László Tóth, Bence Szakács, Áron Búcsú
https://arxiv.org/abs/2510.12294
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, Caglar Gulcehre, Maksym Andriushchenko, Ameya Prabhu, Jonas Geiping
https://arxiv.org/abs/2510.09462
MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation
Wentian Zhu, Zhen Xiang, Wei Niu, Le Guan
https://arxiv.org/abs/2510.10271 https://
HALF: Harm-Aware LLM Fairness Evaluation Aligned with Deployment
Ali Mekky, Omar El Herraoui, Preslav Nakov, Yuxia Wang
https://arxiv.org/abs/2510.12217 https://
From <Answer> to <Think>: Multidimensional Supervision of Reasoning Process for LLM Optimization
Beining Wang, Weihang Su, Hongtao Tian, Tao Yang, Yujia Zhou, Ting Yao, Qingyao Ai, Yiqun Liu
https://arxiv.org/abs/2510.11457
Invisible Languages of the LLM Universe
Saurabh Khanna, Xinxu Li
https://arxiv.org/abs/2510.11557 https://arxiv.org/pdf/2510.11557
Project-Level C-to-Rust Translation via Synergistic Integration of Knowledge Graphs and Large Language Models
Zhiqiang Yuan, Wenjun Mao, Zhuo Chen, Xiyue Shang, Chong Wang, Yiling Lou, Xin Peng
https://arxiv.org/abs/2510.10956
Precise Attribute Intensity Control in Large Language Models via Targeted Representation Editing
Rongzhi Zhang, Liqin Ye, Yuzhao Heng, Xiang Chen, Tong Yu, Lingkai Kong, Sudheer Chava, Chao Zhang
https://arxiv.org/abs/2510.12121
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[2/7]:
- MooseAgent: A LLM Based Multi-agent Framework for Automating Moose Simulation
Tao Zhang, Zhenhai Liu, Yong Xin, Yongjun Jiao
RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli
https://arxiv.org/abs/2510.11195 https://
Generation Space Size: Understanding and Calibrating Open-Endedness of LLM Generations
Sunny Yu, Ahmad Jabbar, Robert Hawkins, Dan Jurafsky, Myra Cheng
https://arxiv.org/abs/2510.12699
Tokenization Disparities as Infrastructure Bias: How Subword Systems Create Inequities in LLM Access and Efficiency
Hailay Kidu Teklehaymanot, Wolfgang Nejdl
https://arxiv.org/abs/2510.12389
Crosslisted article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[1/7]:
- Domain-Specific Constitutional AI: Enhancing Safety in LLM-Powered Mental Health Chatbots
Chenhan Lyu, Yutong Song, Pengfei Zhang, Amir M. Rahmani
Faver: Boosting LLM-based RTL Generation with Function Abstracted Verifiable Middleware
Jianan Mu, Mingyu Shi, Yining Wang, Tianmeng Yang, Bin Sun, Xing Hu, Jing Ye, Huawei Li
https://arxiv.org/abs/2510.08664
Getting Your Indices in a Row: Full-Text Search for LLM Training Data for Real World
Ines Altemir Marinas, Anastasiia Kucherenko, Alexander Sternfeld, Andrei Kucharavy
https://arxiv.org/abs/2510.09471 …
RADAR: Mechanistic Pathways for Detecting Data Contamination in LLM Evaluation
Ashish Kattamuri, Harshwardhan Fartale, Arpita Vats, Rahul Raja, Ishita Prasad
https://arxiv.org/abs/2510.08931
Replaced article(s) found for cs.LG. https://arxiv.org/list/cs.LG/new
[4/7]:
- Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
Liu, Obando-Ceron, Lu, He, Wang, Su, Zheng, Castro, Courville, Pan
TIT: A Tree-Structured Instruction Tuning Approach for LLM-Based Code Translation
He Jiang, Yufu Wang, Hao Lin, Peiyu Zou, Zhide Zhou, Ang Jia, Xiaochen Li, Zhilei Ren
https://arxiv.org/abs/2510.09400 …
Active Model Selection for Large Language Models
Yavuz Durmazkeser, Patrik Okanovic, Andreas Kirsch, Torsten Hoefler, Nezihe Merve Gürel
https://arxiv.org/abs/2510.09418 …
FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference
Yu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang, Yu-Fang Hu, Kai-Chiang Wu
https://arxiv.org/abs/2510.09332 https:/…
Prompting Test-Time Scaling Is A Strong LLM Reasoning Data Augmentation
Sondos Mahmoud Bsharat, Zhiqiang Shen
https://arxiv.org/abs/2510.09599 https://arxi…
Fundamentals of Building Autonomous LLM Agents
Victor de Lamo Castrillo, Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll
https://arxiv.org/abs/2510.09244 https://
DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
Hyeseon Ahn, Shinwoo Park, Yo-Sub Han
https://arxiv.org/abs/2510.10987 https://
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Siqi Zhu, David Zhang, Pedro Cisneros-Velarde, Jiaxuan You
https://arxiv.org/abs/2510.08872 https://
RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems
Hyundong Jin, Joonghyuk Hahn, Yo-Sub Han
https://arxiv.org/abs/2510.09227 https://…
Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models
Rabimba Karanjai, Yang Lu, Ranjith Chodavarapu, Lei Xu, Weidong Shi
https://arxiv.org/abs/2510.12080
LLP: LLM-based Product Pricing in E-commerce
Hairu Wang, Sheng You, Qiheng Zhang, Xike Xie, Shuguang Han, Yuchen Wu, Fei Huang, Jufeng Chen
https://arxiv.org/abs/2510.09347 http…
GOAT: A Training Framework for Goal-Oriented Agent with Tools
Hyunji Min, Sangwon Jung, Junyoung Sung, Dosung Lee, Leekyeung Han, Paul Hongsuck Seo
https://arxiv.org/abs/2510.12218
The Speech-LLM Takes It All: A Truly Fully End-to-End Spoken Dialogue State Tracking Approach
Nizar El Ghazal, Antoine Caubrière, Valentin Vielzeuf
https://arxiv.org/abs/2510.09424
Crosslisted article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[16/17]:
- Living Off the LLM: How LLMs Will Change Adversary Tactics
Sean Oesch, Jack Hutchins, Luke Koch, Kevin Kurian
Crosslisted article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[3/17]:
- Preference-Aware Memory Update for Long-Term LLM Agents
Haoran Sun, Zekun Zhang, Shaoning Zeng
BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)
Tomas Ruiz, Siyao Peng, Barbara Plank, Carsten Schwemmer
https://arxiv.org/abs/2510.12516
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[4/14]:
- An approach for systematic decomposition of complex llm tasks
Tianle Zhou, Jiakai Xu, Guanhong Liu, Jiaxiang Liu, Haonan Wang, Eugene Wu
Replaced article(s) found for cs.AI. https://arxiv.org/list/cs.AI/new
[6/6]:
- ICL-Router: In-Context Learned Model Representations for LLM Routing
Wang, Li, Zhang, Chen, Chen, Jian, Ye, Zhang, Hu
Crosslisted article(s) found for cs.CL. https://arxiv.org/list/cs.CL/new
[1/2]:
- Scaling Law in LLM Simulated Personality: More Detailed and Realistic Persona Profile Is All You ...
Yuqi Bai, Tianyu Huang, Kun Sun, Yuting Chen
SR-Scientist: Scientific Equation Discovery With Agentic AI
Shijie Xia, Yuhan Sun, Pengfei Liu
https://arxiv.org/abs/2510.11661 https://arxiv.org/pdf/2510.…
Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies
Zirui Song, Yuan Huang, Junchang Liu, Haozhe Luo, Chenxi Wang, Lang Gao, Zixiang Xu, Mingfei Han, Xiaojun Chang, Xiuying Chen
https://arxiv.org/abs/2510.11389
T³: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning
Deyu Zou, Yongqiang Chen, Jianxiang Wang, Haochen Yang, Mufei Li, James Cheng, Pan Li, Yu Gong
https://arxiv.org/abs/2510.12264