Tootfinder

Opt-in global Mastodon full text search. Join the index!

@hynek@mastodon.social
2025-08-28 05:01:49

"We simply don’t know to defend against these attacks. We have zero agentic AI systems that are secure against these attacks.[…] It’s an existential problem that, near as I can tell, most people developing these technologies are just pretending isn’t there."

@arXiv_csSE_bot@mastoxiv.page
2025-08-27 08:22:52

How do Humans and LLMs Process Confusing Code?
Youssef Abdelsalam, Norman Peitek, Anna-Maria Maurer, Mariya Toneva, Sven Apel
arxiv.org/abs/2508.18547

@arXiv_csAI_bot@mastoxiv.page
2025-08-28 07:31:00

Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs
Yao Fu, Xianxuan Long, Runchao Li, Haotian Yu, Mu Sheng, Xiaotian Han, Yu Yin, Pan Li
arxiv.org/abs/2508.19432

@arXiv_csPL_bot@mastoxiv.page
2025-08-27 07:55:33

A Case Study on the Effectiveness of LLMs in Verification with Proof Assistants
Barış Bayazıt, Yao Li, Xujie Si
arxiv.org/abs/2508.18587

@heiseonline@social.heise.de
2025-08-27 13:08:00

Grammar mistakes make prompt injections more likely
LLMs are more susceptible to prompt injections, or to simply having their guardrails overridden, when the prompt contains mistakes.

@arXiv_csCL_bot@mastoxiv.page
2025-07-28 09:51:31

Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks
Kai Liu, Zhan Su, Peijie Dong, Fengran Mo, Jianfei Gao, ShaoTing Zhang, Kai Chen
arxiv.org/abs/2507.19353

@trochee@dair-community.social
2025-06-28 03:38:48

LLMs exhibit "potemkin understanding"!
Hope the methodology here is better than the last LLM-hater arxiv paper that came through
Must read it more carefully...
mathstodon.xyz/@gregeganSF/114

@arXiv_csCR_bot@mastoxiv.page
2025-08-27 09:55:42

LLMs in the SOC: An Empirical Study of Human-AI Collaboration in Security Operations Centres
Ronal Singh, Shahroz Tariq, Fatemeh Jalalvand, Mohan Baruwal Chhetri, Surya Nepal, Cecile Paris, Martin Lochner
arxiv.org/abs/2508.18947

@newsie@darktundra.xyz
2025-06-27 14:11:24

Fine-Tuning LLMs For ‘Good’ Behavior Makes Them More Likely To Say No 404media.co/fine-tuning-llms-c

@arXiv_csCY_bot@mastoxiv.page
2025-08-28 08:13:51

Should LLMs be WEIRD? Exploring WEIRDness and Human Rights in Large Language Models
Ke Zhou, Marios Constantinides, Daniele Quercia
arxiv.org/abs/2508.19269

@arXiv_csHC_bot@mastoxiv.page
2025-06-27 08:38:49

Multimodal LLMs for Visualization Reconstruction and Understanding
Can Liu, Chunlin Da, Xiaoxiao Long, Yuxiao Yang, Yu Zhang, Yong Wang
arxiv.org/abs/2506.21319

@simon_lucy@mastodon.social
2025-07-27 09:42:41

Something provoked me to think about SAM, the 1970s news-story comprehension and retelling system, and its relationship with current LLMs. So I asked Grok; yes, it's on Twitter, but it's easily shareable and open, and the result is interesting.
x.com/Simon_Lucy/status/…

@arXiv_csDC_bot@mastoxiv.page
2025-06-27 09:08:49

ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks
Joshua H. Davis, Daniel Nichols, Ishan Khillan, Abhinav Bhatele
arxiv.org/abs/2506.20938

@arXiv_csIR_bot@mastoxiv.page
2025-07-28 08:52:51

SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation
Qian Dong, Jia Chen, Qingyao Ai, Hongning Wang, Haitao Li, Yi Wu, Yao Hu, Yiqun Liu, Shaoping Ma
arxiv.org/abs/2507.19033

@arXiv_csSE_bot@mastoxiv.page
2025-07-28 08:51:11

SESR-Eval: Dataset for Evaluating LLMs in the Title-Abstract Screening of Systematic Reviews
Aleksi Huotala, Miikka Kuutila, Mika Mäntylä
arxiv.org/abs/2507.19027

@arXiv_csAI_bot@mastoxiv.page
2025-08-27 10:18:53

Reasoning LLMs in the Medical Domain: A Literature Survey
Armin Berger, Sarthak Khanna, David Berghaus, Rafet Sifa
arxiv.org/abs/2508.19097

@arXiv_csLG_bot@mastoxiv.page
2025-08-27 10:36:23

Understanding Tool-Integrated Reasoning
Heng Lin, Zhongwen Xu
arxiv.org/abs/2508.19201 arxiv.org/pdf/2508.19201

@Techmeme@techhub.social
2025-08-27 06:26:52

A profile of Egune AI, a startup building LLMs for the Mongolian language, as it navigates geopolitics, a lack of resources, and the nascent local tech scene (Viola Zhou/Rest of World)
restofworld.org/2025/mongolia-

@arXiv_eessSY_bot@mastoxiv.page
2025-08-28 09:15:41

Large Language Models (LLMs) for Electronic Design Automation (EDA)
Kangwei Xu, Denis Schwachhofer, Jason Blocklove, Ilia Polian, Peter Domanski, Dirk Pflüger, Siddharth Garg, Ramesh Karri, Ozgur Sinanoglu, Johann Knechtel, Zhuorui Zhao, Ulf Schlichtmann, Bing Li
arxiv.org/abs/2508.20030

@inthehands@hachyderm.io
2025-05-28 16:32:31

Whatever LLMs and gen AI may or may not •actually• be good for, whatever jobs they may or may not actually reshape or displace, right now we’re in the middle of a bubble. The sheer amount of money involved makes it almost impossible to think clearly about this, much less have a useful public discussion. Even well-founded hopes and fears for the tech fuel a fire that I very much do not want to fuel.
8/

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 10:21:53

It's All About In-Context Learning! Teaching Extremely Low-Resource Languages to LLMs
Yue Li, Zhixue Zhao, Carolina Scarton
arxiv.org/abs/2508.19089

@pavelasamsonov@mastodon.social
2025-06-28 18:54:21

Techies are always chasing the mythical tool that will let them "focus on the work" and avoid tedious distractions like "talking to people." AI tools are only the latest to promise this impossible dream.
But talking to people IS the work. You can complain about it on the internet, or take responsibility and make your life a lot easier.
#LLM

@arXiv_csCV_bot@mastoxiv.page
2025-08-25 09:44:40

SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
Yicheng Ji, Jun Zhang, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, Huan Li
arxiv.org/abs/2508.16201

@arXiv_csCR_bot@mastoxiv.page
2025-08-28 09:19:41

An Investigation on Group Query Hallucination Attacks
Kehao Miao, Xiaolong Jin
arxiv.org/abs/2508.19321 arxiv.org/pdf/2508.19321

@gedankenstuecke@scholar.social
2025-05-28 07:01:34

«the idea of LLMs as a technology that's largely parasitic on status games in a crumbling society does a great deal to explain the recent spate of AI use mandates in businesses. While they're obviously, transparently bad for materially getting work done, they're amazing status boosters, and if people are willing to take on material harm in order to improve their relative status, this behaviour actually makes considerable sense.»
Keeping up appearances | deadSimpleTech
deadsimpletech.com/blog/keepin

@mia@hcommons.social
2025-05-28 14:24:46

Thoughts on running local, privacy-aware LLMs and other AI/ML via GPT4All, PrivateGPT, OLMo 2 or Ollama?
I have an M3 MacBook Air and limited capacity for faff right now

@arXiv_csCY_bot@mastoxiv.page
2025-06-27 07:35:18

Our Coding Adventure: Using LLMs to Personalise the Narrative of a Tangible Programming Robot for Preschoolers
Martin Ruskov
arxiv.org/abs/2506.20982

@arXiv_csRO_bot@mastoxiv.page
2025-08-27 09:40:52

An LLM-powered Natural-to-Robotic Language Translation Framework with Correctness Guarantees
ZhenDong Chen, ZhanShang Nie, ShiXing Wan, JunYi Li, YongTian Cheng, Shuai Zhao
arxiv.org/abs/2508.19074

@adlerweb@social.adlerweb.info
2025-07-27 18:28:26

LLMs really are more like incompetent-ass-kissing-as-a-service.

@ian@phpc.social
2025-08-27 13:21:52

LLMs are by definition stuck in the past. MCP and other tool use that calls the latest Go tooling can combat this, e.g. suggesting a refactor to take advantage of new Go features. #GophersUnite
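A minimal Go sketch of the kind of refactor such tooling might suggest (a hypothetical example, not taken from the post): a hand-rolled membership loop, as older training data tends to produce, rewritten to use slices.Contains, which has been in the standard library since Go 1.21.

package main

import (
	"fmt"
	"slices"
)

// hasUserOld is the style a model trained on older Go code tends to emit.
func hasUserOld(users []string, name string) bool {
	for _, u := range users {
		if u == name {
			return true
		}
	}
	return false
}

// hasUser is the modernized form a Go-aware tool could suggest instead,
// using slices.Contains from the Go 1.21 standard library.
func hasUser(users []string, name string) bool {
	return slices.Contains(users, name)
}

func main() {
	users := []string{"ana", "bo", "cem"}
	fmt.Println(hasUserOld(users, "bo"), hasUser(users, "bo")) // true true
}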

@andres4ny@social.ridetrans.it
2025-06-26 20:30:52

Ah fuck, people are using LLMs for kernel code. They really are going to fuck over everything, aren't they?
lwn.net/SubscriberLink/1026558

comment from "comex", in a thread discussing a mistake in an LLM-generated commit:

"(Disclaimer: I am not sashal.)

…In other words, you're saying that the patch is buggy because it drops the __read_mostly attribute (which places the data in a different section).

That's a good reminder of how untrustworthy LLMs still are. Even for such a simple patch, the LLM was still able to make a subtle mistake.

To be fair, a human could definitely make the same mistake. And whatever humans revie…
Comment by "adobriyan" showing the commit in question, which replaces a "struct hlist_head event_hash[EVENT_HASHSIZE] __read_mostly" with "DEFINE_HASHTABLE(event_hash, EVENT_HASH_BITS)"
@Mediagazer@mstdn.social
2025-08-28 06:31:04

A test of seven AI chatbots' abilities to identify news photos' location, date, and photographer showed all failed to consistently identify photos' provenance (Columbia Journalism Review)
cjr.org/tow_center/why-ai-mode

@arXiv_csPL_bot@mastoxiv.page
2025-05-28 07:20:49

LEGO-Compiler: Enhancing Neural Compilation Through Translation Composability
Shuoming Zhang, Jiacheng Zhao, Chunwei Xia, Zheng Wang, Yunji Chen, Xiaobing Feng, Huimin Cui
arxiv.org/abs/2505.20356

@aredridel@kolektiva.social
2025-06-27 21:54:18

The popular meaning of "luddite" is a straw-man. It's a sloppy word with a sloppy meaning now, and it's one we'd do well to watch out for.
The actual reality of who the Luddites were is far more interesting: they were at the center of hard-fought struggles against factory owners who disrupted the economies of entire towns and cities with massively terrible results, centralizing power and money and leaving a great number of people, formerly artisans who'd had a hand in their own work, without any control of their work, with many automated out of their jobs. Luddites destroyed automated looms not because they hated technology. They destroyed automated looms because those looms were taking the livelihood they depended on, with no recourse; it was a disaster for a good while, and then millwork was gone from those places, probably forever.
The problem now with LLMs and automated research systems is there's very little way for workers and creators to stick their shoes in the machinery. They've tried (arxiv.org/abs/2407.12281) but mostly failed, since unlike a factory full of textile workers, the equipment is remote, the automation virtual, an intangible software object that few can access in any meaningful way.

@arXiv_csDB_bot@mastoxiv.page
2025-07-28 08:15:01

DBMS-LLM Integration Strategies in Industrial and Business Applications: Current Status and Future Challenges
Zhengtong Yan, Gongsheng Yuan, Qingsong Guo, Jiaheng Lu
arxiv.org/abs/2507.19254

@arXiv_csGT_bot@mastoxiv.page
2025-08-27 08:39:33

Bias-Adjusted LLM Agents for Human-Like Decision-Making via Behavioral Economics
Ayato Kitadai, Yusuke Fukasawa, Nariaki Nishino
arxiv.org/abs/2508.18600

@escap@azapft.is
2025-06-28 15:21:18

I've been playing around intensively with LLMs for a few weeks now, and it's fascinating what that does to the way I work day to day.
It's finally fun again to research things and flesh out ideas. The web via a classic search engine hasn't been fun for quite a while now...
That's not just because of the generated junk, but also because of websites so bloated that they don't fit through the mobile connection on the train ;)
The LLM filters and compresses :)

@arXiv_econGN_bot@mastoxiv.page
2025-05-28 07:22:49

When Experimental Economics Meets Large Language Models: Tactics with Evidence
Shu Wang, Zijun Yao, Shuhuai Zhang, Jianuo Gai, Tracy Xiao Liu, Songfa Zhong
arxiv.org/abs/2505.21371

@theDuesentrieb@social.linux.pizza
2025-06-27 08:09:07

A highly overlooked perspective on the current AI discussion, in which AI and LLMs are treated as synonyms, so hype and money are funneled into highly inefficient pseudo-solutions when sometimes a bunch of if-else statements or a state machine would suffice.
youtube.com/watch?v=pnNW4_1DqF
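To make the point concrete, a minimal Go sketch of a fully specified workflow where a lookup-table state machine is all that is needed (a hypothetical order-status flow, not from the video):

package main

import "fmt"

type State int

const (
	Pending State = iota
	Shipped
	Delivered
)

// next encodes the allowed transitions of a tiny order workflow.
// For a fixed, fully known process like this, a state machine is enough;
// no model is needed.
func next(s State, event string) State {
	switch {
	case s == Pending && event == "ship":
		return Shipped
	case s == Shipped && event == "deliver":
		return Delivered
	default:
		return s // ignore events that are invalid in the current state
	}
}

func main() {
	s := Pending
	for _, ev := range []string{"ship", "deliver"} {
		s = next(s, ev)
	}
	fmt.Println(s == Delivered) // true
}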

@arXiv_csSE_bot@mastoxiv.page
2025-07-28 09:18:31

ReCatcher: Towards LLMs Regression Testing for Code Generation
Altaf Allah Abbassi, Leuson Da Silva, Amin Nikanjam, Foutse Khomh
arxiv.org/abs/2507.19390

@heiseonline@social.heise.de
2025-08-27 16:15:09

A few more of today's most widely shared #News:
Grammar mistakes make prompt injections more likely

@arXiv_csAI_bot@mastoxiv.page
2025-08-27 10:16:03

Can Structured Templates Facilitate LLMs in Tackling Harder Tasks? : An Exploration of Scaling Laws by Difficulty
Zhichao Yang, Zhaoxin Fan, Gen Li, Yuanze Hu, Xinyu Wang, Ye Qiu, Xin Wang, Yifan Sun, Wenjun Wu
arxiv.org/abs/2508.19069

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 10:23:23

Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning
Alan Li, Yixin Liu, Arpan Sarkar, Doug Downey, Arman Cohan
arxiv.org/abs/2508.19202

@arXiv_csIR_bot@mastoxiv.page
2025-07-28 09:18:11

Injecting External Knowledge into the Reasoning Process Enhances Retrieval-Augmented Generation
Minghao Tang, Shiyu Ni, Jiafeng Guo, Keping Bi
arxiv.org/abs/2507.19333

@arXiv_csLG_bot@mastoxiv.page
2025-08-27 10:35:23

APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration
Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang
arxiv.org/abs/2508.19087

@Techmeme@techhub.social
2025-06-28 01:45:54

The term "context engineering" is gaining traction over "prompt engineering" as it better describes the skill of providing LLMs with the necessary information (Simon Willison/Simon Willison's Weblog)
simonwillison.net/2025/Jun/27/

@arXiv_csCR_bot@mastoxiv.page
2025-08-28 09:16:01

Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience
Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Bin Ji, Jun Ma, Xiaodong Liu, Jing Wang, Feilong Bao, Jianfeng Zhang, Baosheng Wang, Jie Yu
arxiv.org/abs/2508.19292

@arXiv_csAI_bot@mastoxiv.page
2025-08-28 08:42:31

ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
Sining Zhoubian, Dan Zhang, Yuxiao Dong, Jie Tang
arxiv.org/abs/2508.19576

@arXiv_csDC_bot@mastoxiv.page
2025-06-27 09:23:59

BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services
Zhaojiacheng Zhou, Hongze Liu, Shijing Yuan, Hanning Zhang, Jiong Lou, Chentao Wu, Jie Li
arxiv.org/abs/2506.21033

@ian@phpc.social
2025-08-27 18:38:13

By default, common LLMs don't use current code features, and even higher-powered ones are a coin flip. Go analysis tools, and the associated MCP server, help push back against this so emitted code gets modernized on the way out of the model. #GophersUnite

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 10:00:33

Filtering for Creativity: Adaptive Prompting for Multilingual Riddle Generation in LLMs
Duy Le, Kent Ziti, Evan Girard-Sun, Sean O'Brien, Vasu Sharma, Kevin Zhu
arxiv.org/abs/2508.18709

@arXiv_csSE_bot@mastoxiv.page
2025-08-28 09:19:11

Leveraging LLMs for Automated Translation of Legacy Code: A Case Study on PL/SQL to Java Transformation
Lola Solovyeva, Eduardo Carneiro Oliveira, Shiyu Fan, Alper Tuncay, Shamil Gareev, Andrea Capiluppi
arxiv.org/abs/2508.19663

@escap@azapft.is
2025-06-28 15:23:14

2025 feels really late for a computer scientist to start digging into LLMs/AI, but somehow I was already at it back in 2009-2012, at university.
It's quite useful to know the "prehistory" and then sit out the first hype bump ;) Not planned, but I have the feeling it worked out well for me.

@arXiv_csAI_bot@mastoxiv.page
2025-08-26 08:42:06

Integrating Time Series into LLMs via Multi-layer Steerable Embedding Fusion for Enhanced Forecasting
Zhuomin Chen, Dan Li, Jiahui Zhou, Shunyu Wu, Haozheng Ye, Jian Lou, See-Kiong Ng
arxiv.org/abs/2508.16059

@arXiv_csPL_bot@mastoxiv.page
2025-07-28 07:43:41

IsaMini: Redesigned Isabelle Proof Language for Machine Learning
Qiyuan Xu, Renxi Wang, Haonan Li, David Sanan, Conrad Watt
arxiv.org/abs/2507.18885

@arXiv_csLG_bot@mastoxiv.page
2025-08-25 09:54:30

Boardwalk: Towards a Framework for Creating Board Games with LLMs
Álvaro Guglielmin Becker, Gabriel Bauer de Oliveira, Lana Bertoldo Rossato, Anderson Rocha Tavares
arxiv.org/abs/2508.16447

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 10:16:23

ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Yibo Li, Miao Xiong, Jiaying Wu, Bryan Hooi
arxiv.org/abs/2508.18847

@arXiv_csSE_bot@mastoxiv.page
2025-07-28 08:54:41

Exploring the Use of LLMs for Requirements Specification in an IT Consulting Company
Liliana Pasquale, Azzurra Ragone, Emanuele Piemontese, Armin Amiri Darban
arxiv.org/abs/2507.19113

@arXiv_csCR_bot@mastoxiv.page
2025-08-28 08:59:51

RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting
Zhan Shi, Yefeng Yuan, Yuhong Liu, Liang Cheng, Yi Fang
arxiv.org/abs/2508.19286

@arXiv_csAI_bot@mastoxiv.page
2025-06-27 07:39:28

Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
Haoang Chi, He Li, Wenjing Yang, Feng Liu, Long Lan, Xiaoguang Ren, Tongliang Liu, Bo Han
arxiv.org/abs/2506.21215

@arXiv_csIR_bot@mastoxiv.page
2025-07-28 09:15:11

Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation
Hengran Zhang, Keping Bi, Jiafeng Guo, Jiaming Zhang, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng
arxiv.org/abs/2507.19102

@heiseonline@social.heise.de
2025-08-20 13:04:00

KI-Update kompakt: LLMs for malware, anti-human bias, Sutton, chatbots
The "KI-Update" delivers a summary of the most important AI developments every weekday.

@arXiv_csSE_bot@mastoxiv.page
2025-08-27 09:06:02

Interleaving Large Language Models for Compiler Testing
Yunbo Ni, Shaohua Li
arxiv.org/abs/2508.18955 arxiv.org/pdf/2508.18955

@arXiv_csCR_bot@mastoxiv.page
2025-08-27 09:43:13

FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation
Shaswata Mitra, Azim Bazarov, Martin Duclos, Sudip Mittal, Aritran Piplai, Md Rayhanur Rahman, Edward Zieglar, Shahram Rahimi
arxiv.org/abs/2508.18684

@arXiv_csCL_bot@mastoxiv.page
2025-08-26 12:13:26

From BERT to LLMs: Comparing and Understanding Chinese Classifier Prediction in Language Models
Ziqi Zhang, Jianfei Ma, Emmanuele Chersoni, Jieshun You, Zhaoxin Feng
arxiv.org/abs/2508.18253

@arXiv_csCR_bot@mastoxiv.page
2025-08-27 08:15:43

A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs
Anders Mølmen Høst, Pierre Lison, Leon Moonen
arxiv.org/abs/2508.18439

@arXiv_csAI_bot@mastoxiv.page
2025-08-26 08:57:26

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Zizhen Li, Chuanhao Li, Yibin Wang, Qi Chen, Diping Song, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Kaipeng Zhang
arxiv.org/abs/2508.16072

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 07:31:50

CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation
Deepon Halder, Thanmay Jayakumar, Raj Dabre
arxiv.org/abs/2506.19952

@arXiv_csAI_bot@mastoxiv.page
2025-08-25 07:40:30

Integrating Time Series into LLMs via Multi-layer Steerable Embedding Fusion for Enhanced Forecasting
Zhuomin Chen, Dan Li, Jiahui Zhou, Shunyu Wu, Haozheng Ye, Jian Lou, See-Kiong Ng
arxiv.org/abs/2508.16059

@arXiv_csSE_bot@mastoxiv.page
2025-07-28 08:37:01

MemoCoder: Automated Function Synthesis using LLM-Supported Agents
Yiping Jia, Zhen Ming Jiang, Shayan Noei, Ying Zou
arxiv.org/abs/2507.18812

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 09:42:42

Thinking Before You Speak: A Proactive Test-time Scaling Approach
Cong Li, Wenchang Chai, Hejun Wu, Yan Pan, Pengxu Wei, Liang Lin
arxiv.org/abs/2508.18648

@arXiv_csCR_bot@mastoxiv.page
2025-08-28 09:49:01

SoK: Large Language Model Copyright Auditing via Fingerprinting
Shuo Shao, Yiming Li, Yu He, Hongwei Yao, Wenyuan Yang, Dacheng Tao, Zhan Qin
arxiv.org/abs/2508.19843

@arXiv_csAI_bot@mastoxiv.page
2025-08-28 09:20:01

Tracking World States with Language Models: State-Based Evaluation Using Chess
Romain Harang, Jason Naradowsky, Yaswitha Gujju, Yusuke Miyao
arxiv.org/abs/2508.19851

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 10:24:13

Generative Interfaces for Language Models
Jiaqi Chen, Yanzhe Zhang, Yutong Zhang, Yijia Shao, Diyi Yang
arxiv.org/abs/2508.19227 arxiv.org/…

@arXiv_csSE_bot@mastoxiv.page
2025-08-27 07:37:02

Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang
arxiv.org/abs/2508.18370

@arXiv_csCR_bot@mastoxiv.page
2025-08-28 09:50:11

Disabling Self-Correction in Retrieval-Augmented Generation via Stealthy Retriever Poisoning
Yanbo Dai, Zhenlan Ji, Zongjie Li, Kuan Li, Shuai Wang
arxiv.org/abs/2508.20083

@arXiv_csAI_bot@mastoxiv.page
2025-08-27 10:13:33

Sense of Self and Time in Borderline Personality. A Comparative Robustness Study with Generative AI
Marcin Moskalewicz, Anna Sterna, Marek Pokropski, Paula Flores
arxiv.org/abs/2508.19008

@arXiv_csCL_bot@mastoxiv.page
2025-07-28 09:58:01

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab
arx…

@arXiv_csSE_bot@mastoxiv.page
2025-08-27 08:32:12

Requirements Development and Formalization for Reliable Code Generation: A Multi-Agent Vision
Xu Lu, Weisong Sun, Yiran Zhang, Ming Hu, Cong Tian, Zhi Jin, Yang Liu
arxiv.org/abs/2508.18675

@arXiv_csAI_bot@mastoxiv.page
2025-08-27 10:08:53

Interactive Evaluation of Large Language Models for Multi-Requirement Software Engineering Tasks
Dimitrios Rontogiannis, Maxime Peyrard, Nicolas Baldwin, Martin Josifoski, Robert West, Dimitrios Gunopulos
arxiv.org/abs/2508.18905

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 09:47:33

Breaking the Trade-Off Between Faithfulness and Expressiveness for Large Language Models
Chenxu Yang, Qingyi Si, Zheng Lin
arxiv.org/abs/2508.18651

@arXiv_csCR_bot@mastoxiv.page
2025-08-25 09:14:40

Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs
Yu Yan, Sheng Sun, Zhe Wang, Yijun Lin, Zenghao Duan, Zhifei Zheng, Min Liu, Zhiyi Yin, Jianping Zhang
arxiv.org/abs/2508.16347

@arXiv_csCL_bot@mastoxiv.page
2025-07-28 09:47:51

Identifying Fine-grained Forms of Populism in Political Discourse: A Case Study on Donald Trump's Presidential Campaigns
Ilias Chalkidis, Stephanie Brandl, Paris Aslanidis
arxiv.org/abs/2507.19303

@arXiv_csSE_bot@mastoxiv.page
2025-06-26 08:22:10

Can LLMs Replace Humans During Code Chunking?
Christopher Glasz, Emily Escamilla, Eric O. Scott, Anand Patel, Jacob Zimmer, Colin Diggs, Michael Doyle, Scott Rosen, Nitin Naik, Justin F. Brunelle, Samruddhi Thaker, Parthav Poudel, Arun Sridharan, Amit Madan, Doug Wendt, William Macke, Thomas Schill
arxiv.org/abs/2506.198…

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:53:49

Structuralist Approach to AI Literary Criticism: Leveraging Greimas Semiotic Square for Large Language Models
Fangzhou Dong, Yifan Zeng, Yingpeng Sang, Hong Shen
arxiv.org/abs/2506.21360

@arXiv_csCR_bot@mastoxiv.page
2025-06-26 09:40:10

JsDeObsBench: Measuring and Benchmarking LLMs for JavaScript Deobfuscation
Guoqiang Chen, Xin Jin, Zhiqiang Lin
arxiv.org/abs/2506.20170

@arXiv_csCL_bot@mastoxiv.page
2025-08-27 10:10:43

Harnessing Rule-Based Reinforcement Learning for Enhanced Grammatical Error Correction
Yilin Li, Xunjian Yin, Yilin Chen, Xiaojun Wan
arxiv.org/abs/2508.18780

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 10:00:09

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets
Akshay Paruchuri, Maryam Aziz, Rohit Vartak, Ayman Ali, Best Uchehara, Xin Liu, Ishan Chatterjee, Monica Agrawal
arxiv.org/abs/2506.21532

@arXiv_csAI_bot@mastoxiv.page
2025-08-25 08:04:10

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
Zizhen Li, Chuanhao Li, Yibin Wang, Qi Chen, Diping Song, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Kaipeng Zhang
arxiv.org/abs/2508.16072

@arXiv_csCR_bot@mastoxiv.page
2025-06-26 09:21:00

Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models
Wanli Peng, Xin Chen, Hang Fu, XinYu He, Xue Yiming, Juan Wen
arxiv.org/abs/2506.19889

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:56:19

Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection
Ali Şenol, Garima Agrawal, Huan Liu
arxiv.org/abs/2506.21443 arxiv.org/pdf/2506.21443 arxiv.org/html/2506.21443
arXiv:2506.21443v1 Announce Type: new
Abstract: Detecting deceptive conversations on dynamic platforms is increasingly difficult due to evolving language patterns and Concept Drift (CD), i.e., semantic or topical shifts that alter the context or intent of interactions over time. These shifts can obscure malicious intent or mimic normal dialogue, making accurate classification challenging. While Large Language Models (LLMs) show strong performance in natural language tasks, they often struggle with contextual ambiguity and hallucinations in risk-sensitive scenarios. To address these challenges, we present a Domain Knowledge (DK)-Enhanced LLM framework that integrates pretrained LLMs with structured, task-specific insights to perform fraud and concept drift detection. The proposed architecture consists of three main components: (1) a DK-LLM module to detect fake or deceptive conversations; (2) a drift detection unit (OCDD) to determine whether a semantic shift has occurred; and (3) a second DK-LLM module to classify the drift as either benign or fraudulent. We first validate the value of domain knowledge using a fake review dataset and then apply our full framework to SEConvo, a multiturn dialogue dataset that includes various types of fraud and spam attacks. Results show that our system detects fake conversations with high accuracy and effectively classifies the nature of drift. Guided by structured prompts, the LLaMA-based implementation achieves 98% classification accuracy. Comparative studies against zero-shot baselines demonstrate that incorporating domain knowledge and drift awareness significantly improves performance, interpretability, and robustness in high-stakes NLP applications.
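A rough Go sketch of how the three components described in the abstract could be wired together (the keyword stubs and function names here are hypothetical placeholders for the paper's LLaMA-based, domain-knowledge-prompted calls, which are not shown):

package main

import (
	"fmt"
	"strings"
)

type Verdict string

// dkLLMDetect stands in for the first DK-LLM module (deceptive or genuine).
func dkLLMDetect(conv string) Verdict {
	if strings.Contains(conv, "wire the money") {
		return "deceptive"
	}
	return "genuine"
}

// ocddHasDrift stands in for the OCDD drift detection unit.
func ocddHasDrift(conv string) bool {
	return strings.Contains(conv, "by the way")
}

// dkLLMClassifyDrift stands in for the second DK-LLM module, which labels a
// detected shift as benign or fraudulent.
func dkLLMClassifyDrift(conv string) Verdict {
	if strings.Contains(conv, "gift card") {
		return "fraudulent drift"
	}
	return "benign drift"
}

// analyse chains the three stages in the order the abstract describes.
func analyse(conv string) (Verdict, Verdict) {
	fraud := dkLLMDetect(conv)
	if !ocddHasDrift(conv) {
		return fraud, "no drift"
	}
	return fraud, dkLLMClassifyDrift(conv)
}

func main() {
	fmt.Println(analyse("great chatting! by the way, could you grab a gift card for me?"))
}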

@arXiv_csCL_bot@mastoxiv.page
2025-08-26 12:02:06

Debiasing Multilingual LLMs in Cross-lingual Latent Space
Qiwei Peng, Guimin Hu, Yekun Chai, Anders Søgaard
arxiv.org/abs/2508.17948 arx…

@arXiv_csCL_bot@mastoxiv.page
2025-08-26 12:05:16

Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering
Julius Gun, Timo Oksanen
arxiv.org/abs/2508.18093

@arXiv_csCL_bot@mastoxiv.page
2025-08-26 12:04:36

Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Khaoula Chehbouni, Mohammed Haddou, Jackie Chi Kit Cheung, Golnoosh Farnadi
arxiv.org/abs/2508.18076

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:40:50

When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker
arxiv.org/abs/2506.20544

@arXiv_csCL_bot@mastoxiv.page
2025-06-27 09:59:49

Potemkin Understanding in Large Language Models
Marina Mancoridis, Bec Weeks, Keyon Vafa, Sendhil Mullainathan
arxiv.org/abs/2506.21521 arxiv.org/pdf/2506.21521 arxiv.org/html/2506.21521
arXiv:2506.21521v1 Announce Type: new
Abstract: Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.
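A minimal Go illustration of the intuition behind a lower bound on potemkins (this sketches the idea only, not the paper's actual procedure): among concepts whose keystone question a model answers correctly, count the fraction it then fails to apply correctly.

package main

import "fmt"

// trial records whether a model answered the keystone (defining) question
// for a concept correctly and whether it then applied the concept correctly.
type trial struct {
	keystoneCorrect bool
	applyCorrect    bool
}

// potemkinRate returns, among concepts the model can state correctly, the
// fraction it fails to use correctly.
func potemkinRate(trials []trial) float64 {
	defined, failed := 0, 0
	for _, t := range trials {
		if t.keystoneCorrect {
			defined++
			if !t.applyCorrect {
				failed++
			}
		}
	}
	if defined == 0 {
		return 0
	}
	return float64(failed) / float64(defined)
}

func main() {
	// Hypothetical outcomes for four concepts.
	trials := []trial{
		{true, true}, {true, false}, {true, false}, {false, false},
	}
	fmt.Printf("potemkin rate: %.2f\n", potemkinRate(trials)) // 0.67
}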

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 09:47:00

Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs
Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman
arxiv.org/abs/2506.20666

@arXiv_csCL_bot@mastoxiv.page
2025-06-26 08:46:30

SEED: A Structural Encoder for Embedding-Driven Decoding in Time Series Prediction with LLMs
Fengze Li, Yue Wang, Yangle Liu, Ming Huang, Dou Hong, Jieming Ma
arxiv.org/abs/2506.20167