»Researchers hide LLM prompts in papers to get better reviews:
Researchers are hiding prompts in their papers that are meant to produce better reviews and can expose lazy reviewers.«
Ah, so that's how clever the AI is, doing research about itself within AI? Well, in my opinion this was predictable, since fraud has cropped up in science again and again. At least it is sparking a discussion.
🤖
VFLAIR-LLM: A Comprehensive Framework and Benchmark for Split Learning of LLMs
Zixuan Gu, Qiufeng Fan, Long Sun, Yang Liu, Xiaojun Ye
https://arxiv.org/abs/2508.03097
Pay What LLM Wants: Can LLM Simulate Economics Experiment with 522 Real-human Persona?
Junhyuk Choi, Hyeonchu Park, Haemin Lee, Hyebeen Shin, Hyun Joung Jin, Bugeun Kim
https://arxiv.org/abs/2508.03262
ReFuzzer: Feedback-Driven Approach to Enhance Validity of LLM-Generated Test Programs
Iti Shree, Karine Even-Mendoz, Tomasz Radzik
https://arxiv.org/abs/2508.03603
#LLM folks when someone points out that it's unethical: "it's just a tool, it depends on how you use it!"
LLM folks when "#AI" messes up and they're asked to take responsibility: 👀 [monkey side eyes meme]
Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective
Yubo Wang, Min Tang, Nuo Shen, Shujie Cui, Weiqing Wang
https://arxiv.org/abs/2508.03703
Toward Verifiable Misinformation Detection: A Multi-Tool LLM Agent Framework
Zikun Cui, Tianyi Huang, Chia-En Chiang, Cuiqianhe Du
https://arxiv.org/abs/2508.03092
"#Slopsquatting is a type of #cybersquatting. It is the practice of registering a non-existent software package name that a large language model (#LLM) may hallucinate in its output, whereby someone u…
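A minimal guard against this failure mode can be sketched as follows. The helper names and allowlist are illustrative assumptions; the PyPI JSON metadata endpoint is real:

```python
def registry_url(name: str) -> str:
    """URL of the public PyPI JSON metadata endpoint for a package name."""
    return f"https://pypi.org/pypi/{name}/json"

def looks_suspicious(name: str, allowlist: set) -> bool:
    """Flag LLM-suggested names that are not on a locally maintained allowlist."""
    return name.lower() not in allowlist

# Illustrative allowlist; a real one would cover your actual dependencies.
ALLOWLIST = {"requests", "numpy", "pandas"}

# A hallucinated dependency name gets flagged for manual review; an HTTP 404
# from registry_url() would then confirm the package is unregistered.
print(looks_suspicious("reqeusts-toolkit", ALLOWLIST))  # → True
print(looks_suspicious("requests", ALLOWLIST))          # → False
```

A pre-install hook like this doesn't prove a name is safe, but it does catch the specific slopsquatting case where an LLM invents a package that was never registered.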
R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation
Futian Wang, Yuhan Qiao, Xiao Wang, Fuling Wang, Yuxiang Zhang, Dengdi Sun
https://arxiv.org/abs/2508.03426
Risk Management and Resilience in IT Security: IT Security Day Dortmund
Practical talks on September 16: from current hacking methods and protection against LLM attacks to strategies for greater cyber resilience.
SAGE-HLS: Syntax-Aware AST-Guided LLM for High-Level Synthesis Code Generation
M Zafir Sadik Khan, Nowfel Mashnoor, Mohammad Akyash, Kimia Azar, Hadi Kamali
https://arxiv.org/abs/2508.03558
Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS
Bingyu Yan, Ziyi Zhou, Xiaoming Zhang, Chaozhuo Li, Ruilin Zeng, Yirui Qi, Tianbo Wang, Litian Zhang
https://arxiv.org/abs/2508.03125
Even if “AI” worked (it doesn’t), there’s many reasons why you shouldn’t use it:
1. It’s destroying Internet sites that you love as you use chat bots instead of actually going to sources of information—this will cause them to be less active and eventually shut down.
2. Pollution and water use from server farms cause immediate harm; often—just like other heavy industry—they are built in underprivileged communities, harming poor people without any benefit in return: the big tech companies get tax breaks and don't pay for power, while the workers aren't from the community but commute in.
3. The basic underlying models of any LLM rely on stolen data, even when specific extra data is obtained legally. Chatbots can’t learn to speak English just by reading open source code.
4. You’re fueling a speculation bubble that is costing many people their jobs—because the illusion of “efficiency” is kept up by firing people and counting that as profit.
5. Whenever you use the great cheat machine in the cloud you're robbing yourself of the chance to do real research, writing, or coding: you're literally atrophying your brain and making yourself stupider.
It’s a grift, through and through.
I'm starting a new software consultancy called "LLM," or "Losers Like Machines," specializing in fixing people's AI-generated trash. $2000/hour
Will you see an LLM president in your lifetime?
For context - I didn't write that prompt or feed particularly bad input to prove my point. I was trying out Claude, which so many people tell me is "the good" LLM product and above-and-beyond.
This prompt was written by Anthropic's own team, and featured in their gallery of web apps that can be built with this AI.
THEY THINK THIS IS GOOD!
StyliTruth: Unlocking Stylized yet Truthful LLM Generation via Disentangled Steering
Chenglei Shen, Zhongxiang Sun, Teng Shi, Xiao Zhang, Jun Xu
https://arxiv.org/abs/2508.04530
I am at the point where if I see ✨ in a post or email or message or whatever, that I assume it’s about some fake-AI tool or feature or shill or scam, and then ignore or delete it (sometimes blocking the perpetrator).
Anyway, fair warning that those corporate LLM anus logos may have ruined that emoji (the symbols in the emoji are generally little four-pointed stars).
Learning to Incentivize: LLM-Empowered Contract for AIGC Offloading in Teleoperation
Zijun Zhan, Yaxian Dong, Daniel Mawunyo Doe, Yuqing Hu, Shuai Li, Shaohua Cao, Zhu Han
https://arxiv.org/abs/2508.03464
Hmm, instead of the usual 10 to at most 100 daily hits, almost 3,000 blog hits yesterday (but only 10 visitors). Sounds a lot like an LLM crawler - or what else could it be?
The recent release of Apertus, a fully open suite of large language models (LLMs), is super interesting.
The technical report provides plenty of details about the entire process.
#ai #opensource #llm
I've switched my local play LLM from Gemma3 to Qwen3-Coder-30B-A3B-Instruct-IQ4_NL - it's actually usefully fast on my CPU (AMD 3950x, llama -t 32), about 14 tokens/s; Gemma3 only manages something like 2-3 tokens/s. It's the first fast one I've found that doesn't produce too much gibberish. Not as good at translation though, hmm.
I guess the speed is due to MoE ?
@… Yes, so much yes.
I have started using “a checklist / comparison table where none was necessary” as an LLM heuristic now, because there are so many terrible articles (many written before mainstream LLMs were released) which use a checklist as a replacement for any actual content, and now the machines have been trained to regurgitate them.
My thinking …
My plane still hasn't taken off after an hour, because United Airlines had a *system wide* crash in the software that computes the cargo load and distribution needed for take off.
WHY THE EVER LOVING FUCK DOES SUCH A SYSTEM NEED A NETWORK CONNECTION? YOU COULD HANDLE ALL THE NEEDED COMPUTATION ON A BLOODY PHONE IN THE CAPTAIN'S POCKET!
Are they using a gratuitous LLM or something? What the hell, United?
Students aren't generally looking to cheat, and most think LLM use should not be allowed in college education. Let's not allow salespeople to speak for them.
What I like about the @… built-in LLM answers:
- It's only(!) triggered on request, i.e. when your query ends with a question mark.
- It always admits when it couldn't find good information or there is no clear answer.
- It always cites sources, usually word for word.
I find myself not so much "believing the AI" but rather us…
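The question-mark trigger and the "no clear answer" fallback described above can be approximated by a trivial gate. The function names below are hypothetical, not the engine's actual API:

```python
def should_answer(query: str) -> bool:
    """Only invoke the expensive LLM path on an explicit request signal."""
    return query.rstrip().endswith("?")

def render(answer, sources: list) -> str:
    """Show an answer only when sources back it; otherwise admit defeat."""
    if answer is None or not sources:
        return "No clear answer found."
    return f"{answer} [sources: " + "; ".join(sources) + "]"

print(should_answer("best static site generator?"))  # → True
print(should_answer("static site generators"))       # → False
print(render(None, []))  # → No clear answer found.
```

The design point is that both failure modes, answering when nobody asked and answering without evidence, are handled by dumb deterministic checks outside the model.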
Blueprint First, Model Second: A Framework for Deterministic LLM Workflow
Libin Qiu, Yuhang Ye, Zhirong Gao, Xide Zou, Junfu Chen, Ziming Gui, Weizhi Huang, Xiaobo Xue, Wenkai Qiu, Kun Zhao
https://arxiv.org/abs/2508.02721
Are LLM Agents the New RPA? A Comparative Study with RPA Across Enterprise Workflows
Petr Průcha, Michaela Matoušková, Jan Strnad
https://arxiv.org/abs/2509.04198
I already have a deeply-trained deterministic expert system using carbon-based storage and processing which gives me the answer provided by a LLM ('the commands to run to get an answer') in an instant.
It even knows not to tell me 'netstat' when I'm on a FreeBSD box. https://infosec.exchange/…
@… I imagine they just type “Is that true?” Into the LLM.
Actually, that’s probably expecting too much.
Eye2Recall: Exploring the Design of Enhancing Reminiscence Activities via Eye Tracking-Based LLM-Powered Interaction Experience for Older Adults
Lei Han, Mingnan Wei, Qiongyan Chen, Anqi Wang, Rong Pang, Kefei Liu, Rongrong Chen, David Yip
https://arxiv.org/abs/2508.02232
Claiming that LLMs bring us closer to AGI is like claiming that bullshitting brings one closer to wisdom.
Sure, you need "some" knowledge on different topics to bullshit successfully. Still, what's the point if all that knowledge is buried under an avalanche of lies? You probably can't distinguish what you knew from what you made up anymore.
#AI #LLM
LUST: A Multi-Modal Framework with Hierarchical LLM-based Scoring for Learned Thematic Significance Tracking in Multimedia Content
Anderson de Lima Luiz
https://arxiv.org/abs/2508.04353
Oh wow LLMs are just so terrible. 🤦♂️
I made a #systemd service watcher¹ ( :nixos: #NixOS module²³) which regularly feeds systemd status outputs into an LLM (mistral here) and sends me an email if it thinks it found real problems. Well, now I always get alarmist emails with bullshit warnings, suc…
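A watcher along those lines can be sketched with a deterministic prefilter in front of the model, which is one way to cut down the alarmist false positives. All names and the prompt wording below are my assumptions, not the module's actual code:

```python
import subprocess

def collect_failed() -> str:
    """Ask systemd itself which units are failed (run on the monitored host)."""
    return subprocess.run(
        ["systemctl", "--failed", "--no-pager"],
        capture_output=True, text=True,
    ).stdout

def needs_llm(status_text: str) -> bool:
    """Deterministic prefilter: skip the LLM when nothing looks broken."""
    markers = ("failed", "degraded", "core-dump")
    return any(m in status_text.lower() for m in markers)

def build_prompt(status_text: str) -> str:
    """Prompt wording is an assumption; tighten it to curb false alarms."""
    return (
        "You are triaging systemd output. Reply PROBLEM only for units that "
        "are failed or crash-looping; otherwise reply OK.\n\n" + status_text
    )

# With the prefilter, a healthy host never even reaches the model:
print(needs_llm("0 loaded units listed."))                # → False
print(needs_llm("● nginx.service loaded failed failed"))  # → True
```

If the prefilter trips, `build_prompt(collect_failed())` could go to a local runner (e.g. Ollama's `/api/generate` endpoint for a mistral model), with the email sent only on a PROBLEM verdict.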
I’ve written about design patterns for the securing of LLM agents: #AI
ViLLA-MMBench: A Unified Benchmark Suite for LLM-Augmented Multimodal Movie Recommendation
Fatemeh Nazary, Ali Tourani, Yashar Deldjoo, Tommaso Di Noia
https://arxiv.org/abs/2508.04206

Recommending long-form video content demands joint modeling of visual, audio, and textual modalities, yet most benchmarks address only raw features or narrow fusion. We present ViLLA-MMBench, a reproducible, extensible benchmark for LLM-augmented multimodal movie recommendation. Built on MovieLens and MMTF-14K, it aligns dense item embeddings from three modalities: audio (block-level, i-vector), visual (CNN, AVF), and text. Missing or sparse metadata is automatically enriched using state-of-the…
I appreciate the idea that you can respond to tickets really fast, but when it is obvious some sort of LLM is being used to respond and it isn't answering the question asked, that is worse.
I'm looking at you https://clicks.tech
Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework
Jialin Li, Jinzhe Li, Gengxu Li, Yi Chang, Yuan Wu
https://arxiv.org/abs/2508.03622
Honestly, I don't want to do a PhD's worth of work for every LLM query that random people make. It was exhausting enough doing it just once.
So to summarize this whole adventure:
1. A good 45 minutes was spent to get an answer that we probably could have gotten in 5 minutes in the 2010's, or in maybe 1-2 hours in the 1990's.
2. The time investment wasn't a total waste as we learned a lot along the way that we wouldn't have in the 2010's. Most relevant is the wide range of variation (e.g. a 2x factor depending on fiber intake!).
3. Most of the search engine results were confidently wrong answers that had no relation to reality. We were lucky to get one that had real citations we could start from (but that same article included the bogus 4.91 kcal/gram number). Next time I want to know a random factoid I might just start on Google scholar.
4. At least one page we chased citations through had a note at the top about being frozen due to NIH funding issues. The digital commons is under attack on multiple fronts.
All of this is yet another reason not to support the big LLM companies.
#AI
Quality-of-Service Aware LLM Routing for Edge Computing with Multiple Experts
Jin Yang, Qiong Wu, Zhiying Feng, Zhi Zhou, Deke Guo, Xu Chen
https://arxiv.org/abs/2508.00234
I am once again trying to fine-tune an LLM on 17 years worth of my internet poasts
Automated Validation of LLM-based Evaluators for Software Engineering Artifacts
Ora Nova Fandina, Eitan Farchi, Shmulik Froimovich, Rami Katan, Alice Podolsky, Orna Raz, Avi Ziv
https://arxiv.org/abs/2508.02827
LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations at eBay
Soumik Dey, Benjamin Braun, Naveen Ravipati, Hansi Wu, Binbin Li
https://arxiv.org/abs/2508.03628
AI agents = advanced malware that most of society decided is for some reason totally okay and chill and worth funding if it’s made by one of 3-4 tech giants
#AI #tech #LLM
Automated Algorithmic Discovery for Gravitational-Wave Detection Guided by LLM-Informed Evolutionary Monte Carlo Tree Search
He Wang, Liang Zeng
https://arxiv.org/abs/2508.03661
Risk Management and Resilience in IT Security: IT Security Day Dortmund
On September 16, the program of the one-day conference at FH Dortmund offers engaging content for research and industry: from hacking to LLM attacks.
I keep seeing pull requests clearly generated by LLMs, but what’s really awkward is their inability to create separate branches and PRs for each fix, even after asking the contributor multiple times.
Can we conclude that git is still out of reach for LLMs to really understand?
#git #llm
From Legacy to Standard: LLM-Assisted Transformation of Cybersecurity Playbooks into CACAO Format
Mehdi Akbari Gurabi, Lasse Nitz, Radu-Mihai Castravet, Roman Matzutt, Avikarsha Mandal, Stefan Decker
https://arxiv.org/abs/2508.03342
Yeah, one of the two guys, a smart engineer, told me the LLM had answered his question about why the moon always shows the same face to Earth by claiming the shape of the moon is what locked it in place.
A quick check on Wikipedia confirms that was just wrong. The real reason is tidal locking, which synchronized the Moon's rotation period with its orbital period: https://
Denoising GER: A Noise-Robust Generative Error Correction with LLM for Speech Recognition
Yanyan Liu, Minqiang Xu, Yihao Chen, Liang He, Lei Fang, Sian Fang, Lin Liu
https://arxiv.org/abs/2509.04392
Investigating Gender Bias in LLM-Generated Stories via Psychological Stereotypes
Shahed Masoudian, Gustavo Escobedo, Hannah Strauss, Markus Schedl
https://arxiv.org/abs/2508.03292
Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation
James Mooney, Josef Woldense, Zheng Robert Jia, Shirley Anugrah Hayati, My Ha Nguyen, Vipul Raheja, Dongyeop Kang
https://arxiv.org/abs/2509.03736
Industrial LLM-based Code Optimization under Regulation: A Mixture-of-Agents Approach
Mari Ashiga, Vardan Voskanyan, Fateme Dinmohammadi, Jingzhi Gong, Paul Brookes, Matthew Truscott, Rafail Giavrimis, Mike Basios, Leslie Kanthan, Wei Jie
https://arxiv.org/abs/2508.03329
To whomever praises #Claude #LLM:
ClaudeBot has made 20k requests to bugs.gentoo.org today. 15k of them were repeatedly fetching robots.txt. That surely is a sign of great code quality.
#AI
Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations
Peng Lai, Jianjie Zheng, Sijie Cheng, Yun Chen, Peng Li, Yang Liu, Guanhua Chen
https://arxiv.org/abs/2508.03550
Control at Stake: Evaluating the Security Landscape of LLM-Driven Email Agents
Jiangrong Wu, Yuhong Nan, Jianliang Wu, Zitong Yao, Zibin Zheng
https://arxiv.org/abs/2507.02699
I really can’t think of AI agents as anything other than malware with articles like these:
#AI #tech #LLM
BitsAI-Fix: LLM-Driven Approach for Automated Lint Error Resolution in Practice
Yuanpeng Li, Qi Long, Zhiyuan Yao, Jian Xu, Lintao Xie, Xu He, Lu Geng, Xin Han, Yueyan Chen, Wenbo Duan
https://arxiv.org/abs/2508.03487
GUI-ReRank: Enhancing GUI Retrieval with Multi-Modal LLM-based Reranking
Kristian Kolthoff, Felix Kretzer, Christian Bartelt, Alexander Maedche, Simone Paolo Ponzetto
https://arxiv.org/abs/2508.03298
Efficient Item ID Generation for Large-Scale LLM-based Recommendation
Anushya Subbiah, Vikram Aggarwal, James Pine, Steffen Rendle, Krishna Sayana, Kun Su
https://arxiv.org/abs/2509.03746
EvoEmo: Towards Evolved Emotional Policies for LLM Agents in Multi-Turn Negotiation
Yunbo Long, Liming Xu, Lukas Beckenbauer, Yuhan Liu, Alexandra Brintrup
https://arxiv.org/abs/2509.04310
GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
Yunan Zhang, Shuoran Jiang, Mengchen Zhao, Yuefeng Li, Yang Fan, Xiangping Wu, Qingcai Chen
https://arxiv.org/abs/2508.04676
Tuning LLM-based Code Optimization via Meta-Prompting: An Industrial Perspective
Jingzhi Gong, Rafail Giavrimis, Paul Brookes, Vardan Voskanyan, Fan Wu, Mari Ashiga, Matthew Truscott, Mike Basios, Leslie Kanthan, Jie Xu, Zheng Wang
https://arxiv.org/abs/2508.01443
FaST: Feature-aware Sampling and Tuning for Personalized Preference Alignment with Limited Data
Thibaut Thonet, Germán Kruszewski, Jos Rozen, Pierre Erbacher, Marc Dymetman
https://arxiv.org/abs/2508.04698
LLM-based Relevance Assessment for Web-Scale Search Evaluation at Pinterest
Han Wang, Alex Whitworth, Pak Ming Cheung, Zhenjie Zhang, Krishna Kamath
https://arxiv.org/abs/2509.03764
Is LLM-Generated Code More Maintainable & Reliable than Human-Written Code?
Alfred Santa Molison, Marcia Moraes, Glaucia Melo, Fabio Santos, Wesley K. G. Assuncao
https://arxiv.org/abs/2508.00700
FaMA: LLM-Empowered Agentic Assistant for Consumer-to-Consumer Marketplace
Yineng Yan, Xidong Wang, Jin Seng Cheng, Ran Hu, Wentao Guan, Nahid Farahmand, Hengte Lin, Yue Li
https://arxiv.org/abs/2509.03890
Leveraging LLM-Based Agents for Intelligent Supply Chain Planning
Yongzhi Qi, Jiaheng Yin, Jianshen Zhang, Dongyang Geng, Zhengyu Chen, Hao Hu, Wei Qi, Zuo-Jun Max Shen
https://arxiv.org/abs/2509.03811

In supply chain management, planning is a critical concept. The movement of physical products across different categories, from suppliers to warehouse management, to sales, and logistics transporting them to customers, entails the involvement of many entities. It covers various aspects such as demand forecasting, inventory management, sales operations, and replenishment. How to collect relevant data from an e-commerce platform's perspective, formulate long-term plans, and dynamically adjust the…
Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators
Dani Roytburg, Matthew Bozoukov, Matthew Nguyen, Jou Barzdukas, Simon Fu, Narmeen Oozeer
https://arxiv.org/abs/2509.03647
What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models
Pierre Le Coz, Jia An Liu, Debarun Bhattacharjya, Georgina Curto, Serge Stinckwich
https://arxiv.org/abs/2509.03827
ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory
Matthew Ho, Chen Si, Zhaoxiang Feng, Fangxu Yu, Zhijian Liu, Zhiting Hu, Lianhui Qin
https://arxiv.org/abs/2509.04439
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
Davide Paglieri, Bartłomiej Cupiał, Jonathan Cook, Ulyana Piterbarg, Jens Tuyls, Edward Grefenstette, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel
https://arxiv.org/abs/2509.03581
Efficient Agents: Building Effective Agents While Reducing Cost
Ningning Wang, Xavier Hu, Pai Liu, He Zhu, Yue Hou, Heyuan Huang, Shengyu Zhang, Jian Yang, Jiaheng Liu, Ge Zhang, Changwang Zhang, Jun Wang, Yuchen Eleanor Jiang, Wangchunshu Zhou
https://arxiv.org/abs/2508.02694
Do Role-Playing Agents Practice What They Preach? Belief-Behavior Consistency in LLM-Based Simulations of Human Trust
Amogh Mannekote, Adam Davies, Guohao Li, Kristy Elizabeth Boyer, ChengXiang Zhai, Bonnie J Dorr, Francesco Pinto
https://arxiv.org/abs/2507.02197