
2025-06-18 08:05:39
A very nice article about LLM architecture; a bit too complicated for me, but probably not for others.
https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison
New study on the effects of LLM use (in this case on essay writing):
https://arxiv.org/abs/2506.08872
Quote:
"LLM users also struggled to accurately quote their own work. While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four month…
Scientists have found: anyone who uses ChatGPT or other bullshit generators turns stupid within a short time.
#LLM
LLM Found Transmitting Behavioral Traits to 'Student' LLM Via Hidden Signals in Data - Slashdot
https://slashdot.org/story/25/08/17/0331217/llm-found-transmitting-behavioral-traits-to-student-llm-via-hidden-signals-in-data
[Thread] An OpenAI researcher says the company's latest experimental reasoning LLM achieved gold medal-level performance on the 2025 International Math Olympiad (Alexander Wei/@alexwei_)
https://x.com/alexwei_/status/1946477742855532918
Detecting LLM-generated Code with Subtle Modification by Adversarial Training
Xin Yin, Xinrui Li, Chao Ni, Xiaodan Xu, Xiaohu Yang
https://arxiv.org/abs/2507.13123
Watermarking LLM-Generated Datasets in Downstream Tasks
Yugeng Liu, Tianshuo Cong, Michael Backes, Zheng Li, Yang Zhang
https://arxiv.org/abs/2506.13494 ht…
ProfiLLM: An LLM-Based Framework for Implicit Profiling of Chatbot Users
Shahaf David, Yair Meidan, Ido Hersko, Daniel Varnovitzky, Dudu Mimran, Yuval Elovici, Asaf Shabtai
https://arxiv.org/abs/2506.13980
SO close to 10x programmer! #agiAnytimeSoon https://infosec.exchange/@adamshostack/115050433640095929
📝🗃️ 𝗿𝗱𝗼𝗰𝗱𝘂𝗺𝗽: Dump ‘R’ Package Source, Documentation, and Vignettes into One File for use in LLMs #rstats #LLM is on CRAN https://www.ekotov.pro/rdocdum…
An LLM Agent-Based Complex Semantic Table Annotation Approach
Yilin Geng, Shujing Wang, Chuan Wang, Keqing He, Yanfei Lv, Ying Wang, Zaiwen Feng, Xiaoying Bai
https://arxiv.org/abs/2508.12868
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
Aaron Councilman, David Fu, Aryan Gupta, Chengxiao Wang, David Grove, Yu-Xiong Wang, Vikram Adve
https://arxiv.org/abs/2507.13290
Software Engineer Will Larson unpacks a lot in this July 2025 post. Key takeaway use cases of agentic AI include:
1. Using an LLM to evaluate a context window and get a result.
2. Using an LLM to suggest tools relevant to the context window, then enrich it with the tool’s response.
3. Managing flow control for tool usage.
4. Doing anything software can do to build better context windows to pass on to LLMs.
"What can agents actually do?"
"Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task"
https://doi.org/10.48550/arXiv.2506.08872
"[…] While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four mont…
To add a single example here (feel free to chime in with your own):
Problem: editing code is sometimes tedious because external APIs require boilerplate.
Solutions:
- Use LLM-generated code. Downsides: energy use, code theft, potential for legal liability, makes mistakes, etc. Upsides: popular among some peers, seems easy to use.
- Pick a better library (not always possible).
- Build internal functions to centralize boilerplate code, then use those (benefits: you get a better understanding of the external API, and a more-unit-testable internal code surface; probably less amortized effort).
- Develop a non-LLM system that actually reasons about code at something like the formal semantics level and suggests boilerplate fill-ins based on rules, while foregrounding which rules it's applying so you can see the logic behind the suggestions (needs research).
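To make the third option concrete, here is a minimal sketch of centralizing boilerplate behind an internal function. `ExternalClient` and `api_call` are hypothetical names standing in for a real third-party API; the point is the shape, not the names.

```python
# Hypothetical stand-in for a third-party API that demands per-call
# boilerplate (setup, unwrapping, error mapping at every call site).
class ExternalClient:
    def request(self, endpoint: str, payload: dict) -> dict:
        return {"status": "ok", "data": {"endpoint": endpoint, **payload}}

def api_call(endpoint: str, **payload) -> dict:
    """Internal wrapper: one place for client setup, response
    unwrapping, and error handling. Call sites shrink to one line,
    and this function is easy to unit-test with a fake client."""
    client = ExternalClient()
    response = client.request(endpoint, payload)
    if response.get("status") != "ok":
        raise RuntimeError(f"{endpoint} failed: {response}")
    return response["data"]
```

A call site then reads `api_call("users/create", name="Ada")` instead of repeating the setup-call-unwrap dance, which is the "more-unit-testable internal code surface" the bullet describes.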
Obviously LLM use in coding goes beyond this single issue, but similar analyses apply to each potential use of LLMs in coding. In all cases there are:
1. Existing practical solutions that require more effort (or in many cases just seem to but are less-effort when amortized).
2. Near-term researchable solutions that directly address the problem and which would be much more desirable in the long term.
Thus, in addition to disastrous LLM effects on the climate, on data laborers, and on the digital commons, they tend to suck us into cheap-seeming but ultimately costly design practices while crowding out better long-term solutions. Next time someone suggests how useful LLMs are for some task, try asking yourself (or them) what an ideal solution for that task would look like, and whether LLM use moves us closer to or farther from a world in which that solution exists.
Trustworthy AI Psychotherapy: Multi-Agent LLM Workflow for Counseling and Explainable Mental Disorder Diagnosis
Mithat Can Ozgun, Jiahuan Pei, Koen Hindriks, Lucia Donatelli, Qingzhi Liu, Xin Sun, Junxiao Wang
https://arxiv.org/abs/2508.11398
"LLM group's participants performed worse than their counterparts in the Brain-only group at all levels: neural, linguistic, scoring."
Brain scans confirmed significantly fewer neural connections for LLM users
Stop using LLMs if you value your brain
https://arxiv.org/pdf/2506.08872…
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings
Harbin Hong, Sebastian Caldas, Liu Leqi
https://arxiv.org/abs/2506.14997
LLM Agent for Hyper-Parameter Optimization
Wanzhe Wang, Jianqiu Peng, Menghao Hu, Weihuang Zhong, Tong Zhang, Shuai Wang, Yixin Zhang, Mingjie Shao, Wanli Ni
https://arxiv.org/abs/2506.15167
ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
Jia Liu, ChangYi He, YingQiao Lin, MingMin Yang, FeiYang Shen, ShaoGuo Liu, TingTing Gao
https://arxiv.org/abs/2508.11356
I just saw an all-caps instruction file that someone uses to 'instruct' an LLM to help with coding, and it's just "don't hallucinate", "check your work", "don't say you did something when you didn't" with multiple exclamation marks.
So, basically the whole 'vibe coding,' or having "AI" "help" with coding just devolves into shouting at your computer.
Which reminded me of something, and then it hit me!
#ai #llm #vibecoding
https://www.youtube.com/watch?v=q8SWMAQYQf0
Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems
Zhongzhi Yu, Mingjie Liu, Michael Zimmer, Yingyan Celine Lin, Yong Liu, Haoxing Ren
https://arxiv.org/abs/2506.13905
osmAG-LLM: Zero-Shot Open-Vocabulary Object Navigation via Semantic Maps and Large Language Models Reasoning
Fujing Xie, Sören Schwertfeger, Hermann Blum
https://arxiv.org/abs/2507.12753
TRACY: Benchmarking Execution Efficiency of LLM-Based Code Translation
Zhihao Gong, Zeyu Sun, Dong Huang, Qingyuan Liang, Jie M. Zhang, Dan Hao
https://arxiv.org/abs/2508.11468 …
LLM-Powered Swarms: A New Frontier or a Conceptual Stretch?
Muhammad Atta Ur Rahman, Melanie Schranz
https://arxiv.org/abs/2506.14496 https://
LLM-Driven Data Generation and a Novel Soft Metric for Evaluating Text-to-SQL in Aviation MRO
Patrick Sutanto, Jonathan Kenrick, Max Lorenz, Joan Santoso
https://arxiv.org/abs/2506.13785
Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection
Chi Wang, Min Gao, Zongwei Wang, Junwei Yin, Kai Shu, Chenghua Lin
https://arxiv.org/abs/2508.12632 …
Explain First, Trust Later: LLM-Augmented Explanations for Graph-Based Crypto Anomaly Detection
Adriana Watson
https://arxiv.org/abs/2506.14933 https://
[Thread] A new US paper shows the best frontier LLM models achieve 0% on hard real-life Programming Contest problems, domains where expert humans still excel (Rohan Paul/@rohanpaul_ai)
https://x.com/rohanpaul_ai/status/1934751145400111572
@… @… That’s one of the reasons I don’t use an LLM!
I did try it once, for coding. It lied to me. So I didn’t use it again.
People keep saying “it’s like an intern”. If an intern repeatedly lies to your face, they are bad at the on…
LLMs and the Model Context Protocol (MCP) are the Yang to the Semantic Web Project's Yin.
We now have a solution to the final hurdle—visualization.
Years of Linked Data work now come alive. I explain this, with demonstrations, in a new newsletter post.
www.linkedin.com/pulse/semant...
#MCP
LLM vs. SAST: A Technical Analysis on Detecting Coding Bugs of GPT4-Advanced Data Analysis
Madjid G. Tehrani, Eldar Sultanow, William J. Buchanan, Mahkame Houmani, Christel H. Djaha Fodja
https://arxiv.org/abs/2506.15212
This GitHub repository conveniently lists and categorizes prime examples of LLM-based agent applications. Each example application has its own repository folder with its source code (Python) and a helpful README.md describing its installation and use.
Categories include:
1. Starter AI Agents
2. Advanced AI Agents
3. Autonomous Game Playing Agents
4. Multi-Agent Teams
5. Voice AI Agents
6. RAG-Based Agents
"awesome-llm-apps"
Inference performance evaluation for LLMs on edge devices with a novel benchmarking framework and metric
Hao Chen, Cong Tian, Zixuan He, Bin Yu, Yepang Liu, Jialun Cao
https://arxiv.org/abs/2508.11269 …
Impact of a Deployed LLM Survey Creation Tool through the IS Success Model
Peng Jiang, Vinicius Cezar Monteiro de Lira, Antonio Maiorino
https://arxiv.org/abs/2506.14809
Tibber has been tweaking its customer-service LLM, and although the reply last time was totally useless and unusable, this time it actually helped a lot. It's just a little disconcerting when I write an email, immediately get a response, and think "ah, there's the acknowledgment of receipt with a ticket number"; no, that is the completely correct answer, and the case is thereby closed. Wild.
LLM-Based Config Synthesis requires Disambiguation
Rajdeep Mondal, Nikolaj Bjorner, Todd Millstein, Alan Tang, George Varghese
https://arxiv.org/abs/2507.12443
Things almost impossible to do without good LLM software (in one minute):
I hear a song on the radio. Google's music search gives me "Robbie Williams - Forbidden Road". But I know the lyrics are somewhat different, and I want to know what movie I have in mind.
Gemini says it is, in fact, a song similar to "I Got a Name"; then my brain clicks and connects it with Quentin Tarantino.
Bingo - it's Django.
Just published 🚀: When LLMs Remember Instead of Reason
#llm
@… This comment to the video seems on spot:
It seems McKinsey aren't aware that "agentic AI" is just an LLM that can utter some magic incantations that do stuff. It's like a difference between a chimpanzee with a typewriter vs a chimpanzee with a typewriter and a gun.
Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-AI Interactions
Junfeng Jiao, Saleh Afroogh, Kevin Chen, Abhejay Murali, David Atkinson, Amit Dhurandhar
https://arxiv.org/abs/2506.13510
ADRD: LLM-Driven Autonomous Driving Based on Rule-based Decision Systems
Fanzhi Zeng, Siqi Wang, Chuzhao Zhu, Li Li
https://arxiv.org/abs/2506.14299 https:…
PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning
Yuhui Shi, Yehan Yang, Qiang Sheng, Hao Mi, Beizhe Hu, Chaoxi Xu, Juan Cao
https://arxiv.org/abs/2506.15683
Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset
Zhipeng Xue, Xiaoting Zhang, Zhipeng Gao, Xing Hu, Shan Gao, Xin Xia, Shanping Li
https://arxiv.org/abs/2508.11958
RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments
Yuchuan Fu, Xiaohan Yuan, Dongxia Wang
https://arxiv.org/abs/2506.15253
StackPilot: Autonomous Function Agents for Scalable and Environment-Free Code Execution
Xinkui Zhao, Yifan Zhang, Zhengyi Zhou, Yueshen Xu
https://arxiv.org/abs/2508.11665 https…
FACET: Teacher-Centred LLM-Based Multi-Agent Systems - Towards Personalized Educational Worksheets
Jana Gonnermann-Müller, Jennifer Haase, Konstantin Fackeldey, Sebastian Pokutta
https://arxiv.org/abs/2508.11401
Watching the frustratingly fruitless fights over the USEFULNESS of LLM-based coding helpers, I've come down to 3 points that explain why people seem to live in different realities:
Most programmers:
1) Write inconsequential remixes of trivial code that has been written many times before.
2) Lack the taste for good design & suck at code review in general (yours truly included).
3) Lack the judgement to differentiate between 1) & FOSS repos of nontrivial code, …
Bias is a Math Problem, AI Bias is a Technical Problem: 10-year Literature Review of AI/LLM Bias Research Reveals Narrow [Gender-Centric] Conceptions of 'Bias', and Academia-Industry Gap
Sourojit Ghosh, Kyra Wilson
https://arxiv.org/abs/2508.11067
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
Rui Bao, Nan Xue, Yaping Sun, Zhiyong Chen
https://arxiv.org/abs/2508.11291 https://
Don't Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning
William F. Shen, Xinchi Qiu, Nicola Cancedda, Nicholas D. Lane
https://arxiv.org/abs/2506.14387
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts
Marc Brinner, Sina Zarriess
https://arxiv.org/abs/2507.13105
Uncovering Intention through LLM-Driven Code Snippet Description Generation
Yusuf Sulistyo Nugroho, Farah Danisha Salam, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, Kenichi Matsumoto
https://arxiv.org/abs/2506.15453
deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses
Georgios Androutsopoulos, Antonio Bianchi
https://arxiv.org/abs/2506.15648
AIM-Bench: Evaluating Decision-making Biases of Agentic LLM as Inventory Manager
Xuhua Zhao, Yuxuan Xie, Caihua Chen, Yuxiang Sun
https://arxiv.org/abs/2508.11416 https://
BIPOLAR: Polarization-based granular framework for LLM bias evaluation
Martin Pavlíček, Tomáš Filip, Petr Sosík
https://arxiv.org/abs/2508.11061 https:…
CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems
Xuran Liu, Nan Xue, Rui Bao, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Shuguang Cui
https://arxiv.org/abs/2508.11287
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
Yanxu Mao, Tiehan Cui, Peipei Liu, Datao You, Hongsong Zhu
https://arxiv.org/abs/2506.15170
Unified Software Engineering agent as AI Software Engineer
Leonhard Applis, Yuntong Zhang, Shanchao Liang, Nan Jiang, Lin Tan, Abhik Roychoudhury
https://arxiv.org/abs/2506.14683 …
NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting
Kuangshi Ai, Kaiyuan Tang, Chaoli Wang
https://arxiv.org/abs/2507.12621
Malicious LLM-Based Conversational AI Makes Users Reveal Personal Information
Xiao Zhan, Juan Carlos Carrillo, William Seymour, Jose Such
https://arxiv.org/abs/2506.11680
An LLM ASP Workflow for Joint Entity-Relation Extraction
Trang Tran, Trung Hoang Le, Huiping Cao, Tran Cao Son
https://arxiv.org/abs/2508.12611 https://a…
LLM Compression: How Far Can We Go in Balancing Size and Performance?
Sahil Sk, Debasish Dhal, Sonal Khosla, Sk Shahid, Sambit Shekhar, Akash Dhaka, Shantipriya Parida, Dilip K. Prasad, Ondřej Bojar
https://arxiv.org/abs/2508.11318
The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries
Weipeng Jiang, Xiaoyu Zhang, Xiaofei Xie, Jiongchi Yu, Yuhan Zhi, Shiqing Ma, Chao Shen
https://arxiv.org/abs/2506.12320
PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection
Wenhao Li, Selvakumar Manickam, Yung-wey Chong, Shankar Karuppayah
https://arxiv.org/abs/2506.15656 …
AviationLLM: An LLM-based Knowledge System for Aviation Training
Jia'ang Wan, Feng Shen, Fujuan Li, Yanjin Sun, Yan Li, Shiwen Zhang
https://arxiv.org/abs/2506.14336
Reference Points in LLM Sentiment Analysis: The Role of Structured Context
Junichiro Niimi
https://arxiv.org/abs/2508.11454 https://arxiv.org/pdf/2508.1145…
SimInterview: Transforming Business Education through Large Language Model-Based Simulated Multilingual Interview Training System
Truong Thanh Hung Nguyen, Tran Diem Quynh Nguyen, Hoang Loc Cao, Thi Cam Thanh Tran, Thi Cam Mai Truong, Hung Cao
https://arxiv.org/abs/2508.11873
LLM-Powered Quantum Code Transpilation
Nazanin Siavash, Armin Moin
https://arxiv.org/abs/2507.12480 https://arxiv.org/pdf/2507.12480
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns
Xin Chen, Junchao Wu, Shu Yang, Runzhe Zhan, Zeyu Wu, Ziyang Luo, Di Wang, Min Yang, Lidia S. Chao, Derek F. Wong
https://arxiv.org/abs/2508.13152
Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack
Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son
https://arxiv.org/abs/2506.14539
Multimodal "Puppeteer": An Exploration of Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality
Yuchong Zhang, Bastian Orthmann, Shichen Ji, Michael Welle, Jonne Van Haastregt, Danica Kragic
https://arxiv.org/abs/2506.13189…
RUM: Rule LLM-Based Comprehensive Assessment on Testing Skills
Yue Wang, Zhenyu Chen, Yuan Zhao, Chunrong Fang, Ziyuan Wang, Song Huang
https://arxiv.org/abs/2508.12922 https://…
SafeConstellations: Steering LLM Safety to Reduce Over-Refusals Through Task-Specific Trajectory
Utsav Maskey, Sumit Yadav, Mark Dras, Usman Naseem
https://arxiv.org/abs/2508.11290
ImpReSS: Implicit Recommender System for Support Conversations
Omri Haller, Yair Meidan, Dudu Mimran, Yuval Elovici, Asaf Shabtai
https://arxiv.org/abs/2506.14231
Hallucination in LLM-Based Code Generation: An Automotive Case Study
Marc Pavel, Nenad Petrovic, Lukasz Mazur, Vahid Zolfaghari, Fengjunjie Pan, Alois Knoll
https://arxiv.org/abs/2508.11257
SpecDetect: Simple, Fast, and Training-Free Detection of LLM-Generated Text via Spectral Analysis
Haitong Luo, Weiyao Zhang, Suhang Wang, Wenji Zou, Chungang Lin, Xuying Meng, Yujun Zhang
https://arxiv.org/abs/2508.11343
An LLM's Apology: Outsourcing Awkwardness in the Age of AI
Twm Stone, Anna Soligo
https://arxiv.org/abs/2506.13685 https://arxiv.…
LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery
Arshia Akhavan, Alireza Hosseinpour, Abbas Heydarnoori, Mehdi Keshani
https://arxiv.org/abs/2508.12232
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes
Tyler Loakman, William Thorne, Chenghua Lin
https://arxiv.org/abs/2507.13335
Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models
Haonan Yin, Shai Vardi, Vidyanand Choudhary
https://arxiv.org/abs/2506.14092 h…
How Does LLM Reasoning Work for Code? A Survey and a Call to Action
Ira Ceka, Saurabh Pujar, Irene Manotas, Gail Kaiser, Baishakhi Ray, Shyam Ramji
https://arxiv.org/abs/2506.13932
DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning
Weize Liu, Yongchi Zhao, Yijia Luo, Mingyu Xu, Jiaheng Liu, Yanan Li, Xiguo Hu, Yuchi Xu, Wenbo Su, Bo Zheng
https://arxiv.org/abs/2508.12726
ORFuzz: Fuzzing the "Other Side" of LLM Safety -- Testing Over-Refusal
Haonan Zhang, Dongxia Wang, Yi Liu, Kexin Chen, Jiashui Wang, Xinlei Ying, Long Liu, Wenhai Wang
https://arxiv.org/abs/2508.11222
Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries
Kawin Mayilvaghanan, Siddhant Gupta, Ayush Kumar
https://arxiv.org/abs/2508.13124
Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality
Yuto Harada, Yusuke Yamauchi, Yusuke Oda, Yohei Oseki, Yusuke Miyao, Yu Takagi
https://arxiv.org/abs/2506.14681
Analyzing Information Sharing and Coordination in Multi-Agent Planning
Tianyue Ou, Saujas Vaduguru, Daniel Fried
https://arxiv.org/abs/2508.12981 https://a…
FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge Graphs
Xueli Pan, Victor de Boer, Jacco van Ossenbruggen
https://arxiv.org/abs/2508.10467 …
The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs
Avinash Baidya, Kamalika Das, Xiang Gao
https://arxiv.org/abs/2506.12266
Personalized LLM Decoding via Contrasting Personal Preference
Hyungjune Bu, Chanjoo Jung, Minjae Kang, Jaehyung Kim
https://arxiv.org/abs/2506.12109 https:…
Beyond Single Models: Enhancing LLM Detection of Ambiguity in Requests through Debate
Ana Davila, Jacinto Colan, Yasuhisa Hasegawa
https://arxiv.org/abs/2507.12370
Simplifications are Absolutists: How Simplified Language Reduces Word Sense Awareness in LLM-Generated Definitions
Lukas Ellinger, Miriam Anschütz, Georg Groh
https://arxiv.org/abs/2507.11981