
2025-06-04 15:32:47
This https://arxiv.org/abs/2505.07453 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
Purbesh Mitra, Sennur Ulukus
https://arxiv.org/abs/2507.02851 https://a…
Early Signs of Steganographic Capabilities in Frontier LLMs
Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner
https://arxiv.org/abs/2507.02737
This https://arxiv.org/abs/2505.20730 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csIR_…
CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking
Neeva Oza, Ishaan Govil, Parul Gupta, Dinesh Khandelwal, Dinesh Garg, Parag Singla
https://arxiv.org/abs/2506.04019
LLMs are starving for knowledge graphs. Raphael Troncy was pointing out that many LLM company crawlers are constantly visiting their KGs. Some crawlers even perform explicit SPARQL queries on the KGs.
#knowledgegraphs #eswc2025
This https://arxiv.org/abs/2506.02965 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
CLONE: Customizing LLMs for Efficient Latency-Aware Inference at the Edge
Chunlin Tian, Xinpeng Qin, Kahou Tam, Li Li, Zijian Wang, Yuanzhe Zhao, Minglei Zhang, Chengzhong Xu
https://arxiv.org/abs/2506.02847
This https://arxiv.org/abs/2501.04901 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csDB_…
This https://arxiv.org/abs/2503.16456 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csHC_…
NestedFP: High-Performance, Memory-Efficient Dual-Precision Floating Point Support for LLMs
Haeun Lee, Omin Kwon, Yeonhong Park, Jae W. Lee
https://arxiv.org/abs/2506.02024
#Python Friday #277: Access Local #LLMs Through LM Studio
https://pythonfriday.dev/2025/05/277-a
I’ve written about automating away some of the boring parts of parenthood with LLMs and AppleScript.
#apple
'Failure Imminent': When LLMs In a Long-Running Vending Business Simulation Went Berserk
https://slashdot.org/story/25/05/31/2112240/failure-imminent-when-llms-in-a-long-running-vending-business-simulati…
This is such a perfect analogy.
My go-to is "asbestos": a super useful invention that bit us in the ass afterwards.
https://xoxo.zone/@annika/114614639082253074
Does anyone else have the impression that virtually all LLMs use a sort of "hyper-allistic" language?
As if we had a spectrum for allism disorder and LLMs were an extreme case of it.
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
Jiaer Xia, Bingkui Tong, Yuhang Zang, Rui Shao, Kaiyang Zhou
https://arxiv.org/abs/2507.02859
Am I the only one who foresees the future #AI business model as enshittified ad-infused LLMs? Once LLMs are ingrained in every class and board room, you’ll suddenly have to pay big bucks while the free plans will be riddled with ads. It’ll be like that #BlackMirror episode where the teacher spews comme…
On the Convergence of Large Language Model Optimizer for Black-Box Network Management
Hoon Lee, Wentao Zhou, Merouane Debbah, Inkyu Lee
https://arxiv.org/abs/2507.02689
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs
Ken Tsui
https://arxiv.org/abs/2507.02778 https://
This https://arxiv.org/abs/2501.07071 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
A post from the archive 📫:
If LLMs Can Code, Why Are We Building More IDEs?
https://www.poppastring.com/blog/if-llms-can-code-why-are-we-building-more-ides
This https://arxiv.org/abs/2505.19145 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_sta…
A former employee says fewer than 10,000 people use Ola Krutrim's LLM chatbot, which supports 10 Indian languages, and that over 60% of them are random testers (Swathi Moorthy/The Economic Times)
https://
This https://arxiv.org/abs/2405.08965 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csPL_…
This https://arxiv.org/abs/2407.04503 has been replaced.
initial toot: https://mastoxiv.page/@arX…
This https://arxiv.org/abs/2505.20573 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
If LLMs were so good at writing code, they wouldn’t need a new thought leader yelling about them every day.
They might be. At this point, I do not care. Lots of people (including, most recently, Ptacek, Yegge, etc.) are trying to sell me something and I have no interest in listening.
If your thing is good, show, don’t tell.
But it’s not, is it?
These articles… you’re not trying to convince me, you’re trying to convince yourselves.
So please: keep them to yoursel…
This https://arxiv.org/abs/2506.00095 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCY_…
When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search
William A. Ingram, Bipasha Banerjee, Edward A. Fox
https://arxiv.org/abs/2507.02139 …
This https://arxiv.org/abs/2505.24298 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage
Krishna Kanth Nakka, Xue Jiang, Xuebing Zhou
https://arxiv.org/abs/2507.02332
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
https://arxiv.org/abs/2506.03930
"LLMs are okay at coding, but at scale they build jumbled messes. I’ve scaled back my use of AI when coding and gone back to using my brain and pen and paper."
https://albertofortin.com/writing/coding-with-ai
Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs
Mohammad Ali Alomrani, Yingxue Zhang, Derek Li, Qianyi Sun, Soumyasundar Pal, Zhanguang Zhang, Yaochen Hu, Rohan Deepak Ajwani, Antonios Valkanas, Raika Karimi, Peng Cheng, Yunzhou Wang, Pengyi Liao, Hanrui Huang, Bin Wang, Jianye Hao, Mark Coates
https://
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
Zhijian Xu, Yilun Zhao, Manasi Patwardhan, Lovekesh Vig, Arman Cohan
https://arxiv.org/abs/2507.02694
Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning
Xunzhu Tang, Jacques Klein, Tegawendé F. Bissyandé
https://arxiv.org/abs/2506.03921
MGC: A Compiler Framework Exploiting Compositional Blindness in Aligned LLMs for Malware Generation
Lu Yan, Zhuo Zhang, Xiangzhe Xu, Shengwei An, Guangyu Shen, Zhou Xuan, Xuan Chen, Xiangyu Zhang
https://arxiv.org/abs/2507.02057
ReTern: Exploiting Natural Redundancy and Sign Transformations for Enhanced Fault Tolerance in Compute-in-Memory based Ternary LLMs
Akul Malhotra, Sumeet Kumar Gupta
https://arxiv.org/abs/2506.01140
Sampling Preferences Yields Simple Trustworthiness Scores
Sean Steinle
https://arxiv.org/abs/2506.03399 https://arxiv.org/pdf/2506.03…
This https://arxiv.org/abs/2506.01538 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csRO_…
Data Diversification Methods In Alignment Enhance Math Performance In LLMs
Berkan Dokmeci, Qingyang Wu, Ben Athiwaratkun, Ce Zhang, Shuaiwen Leon Song, James Zou
https://arxiv.org/abs/2507.02173
This https://arxiv.org/abs/2505.19433 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
GORACS: Group-level Optimal Transport-guided Coreset Selection for LLM-based Recommender Systems
Tiehua Mei, Hengrui Chen, Peng Yu, Jiaqing Liang, Deqing Yang
https://arxiv.org/abs/2506.04015
Fault Localisation and Repair for DL Systems: An Empirical Study with LLMs
Jinhan Kim, Nargiz Humbatova, Gunel Jahangirova, Shin Yoo, Paolo Tonella
https://arxiv.org/abs/2506.03396
The Thin Line Between Comprehension and Persuasion in LLMs
Adrian de Wynter, Tangming Yuan
https://arxiv.org/abs/2507.01936 https://a…
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage
Kalyan Nakka, Nitesh Saxena
https://arxiv.org/abs/2506.02479
This https://arxiv.org/abs/2406.13945 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2503.18792 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csHC_…
The World As Large Language Models See It: Exploring the reliability of LLMs in representing geographical features
Omid Reza Abbasi, Franz Welscher, Georg Weinberger, Johannes Scholz
https://arxiv.org/abs/2506.00203
This https://arxiv.org/abs/2506.00486 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
ProRank: Prompt Warmup via Reinforcement Learning for Small Language Models Reranking
Xianming Li, Aamir Shakir, Rui Huang, Julius Lipp, Jing Li
https://arxiv.org/abs/2506.03487
This https://arxiv.org/abs/2401.16310 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2506.02658 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
Comparative analysis of privacy-preserving open-source LLMs regarding extraction of diagnostic information from clinical CMR imaging reports
Sina Amirrajab, Volker Vehof, Michael Bietenbeck, Ali Yilmaz
https://arxiv.org/abs/2506.00060
Misaligned from Within: Large Language Models Reproduce Our Double-Loop Learning Blindness
Tim Rogers, Ben Teehankee
https://arxiv.org/abs/2507.02283 https…
This https://arxiv.org/abs/2505.03793 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csLG_…
Benford's Curse: Tracing Digit Bias to Numerical Hallucination in LLMs
Jiandong Shao, Yao Lu, Jianfei Yang
https://arxiv.org/abs/2506.01734 https://
This https://arxiv.org/abs/2404.16873 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
This https://arxiv.org/abs/2412.13147 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
Computational Thinking Reasoning in Large Language Models
Kechi Zhang, Ge Li, Jia Li, Huangzhao Zhang, Jingjing Xu, Hao Zhu, Lecheng Wang, Jia Li, Yihong Dong, Jing Mai, Bin Gu, Zhi Jin
https://arxiv.org/abs/2506.02658
Evaluating Prompt Engineering Techniques for Accuracy and Confidence Elicitation in Medical LLMs
Nariman Naderi, Zahra Atf, Peter R Lewis, Aref Mahjoub far, Seyed Amir Ahmad Safavi-Naini, Ali Soroush
https://arxiv.org/abs/2506.00072
Multimodal Mathematical Reasoning with Diverse Solving Perspective
Wenhao Shi, Zhiqiang Hu, Yi Bin, Yang Yang, See-Kiong Ng, Heng Tao Shen
https://arxiv.org/abs/2507.02804
This https://arxiv.org/abs/2412.11934 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2501.18626 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
Evaluation of LLMs for mathematical problem solving
Ruonan Wang, Runxi Wang, Yunwen Shen, Chengfeng Wu, Qinglin Zhou, Rohitash Chandra
https://arxiv.org/abs/2506.00309
SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism
Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen
https://arxiv.org/abs/2507.01513
LLMREI: Automating Requirements Elicitation Interviews with LLMs
Alexander Korn, Samuel Gorsch, Andreas Vogelsang
https://arxiv.org/abs/2507.02564 https://…
Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning
Weiyang Guo, Zesheng Shi, Zhuo Li, Yequan Wang, Xuebo Liu, Wenya Wang, Fangming Liu, Min Zhang, Jing Li
https://arxiv.org/abs/2506.00782
LLMs for Legal Subsumption in German Employment Contracts
Oliver Wardas, Florian Matthes
https://arxiv.org/abs/2507.01734 https://arx…
Control at Stake: Evaluating the Security Landscape of LLM-Driven Email Agents
Jiangrong Wu, Yuhong Nan, Jianliang Wu, Zitong Yao, Zibin Zheng
https://arxiv.org/abs/2507.02699
DrKGC: Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion across General and Biomedical Domains
Yongkang Xiao, Sinian Zhang, Yi Dai, Huixue Zhou, Jue Hou, Jie Ding, Rui Zhang
https://arxiv.org/abs/2506.00708
This https://arxiv.org/abs/2412.15289 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
This https://arxiv.org/abs/2501.07849 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
This https://arxiv.org/abs/2505.19165 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2505.18889 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
Reuse or Generate? Accelerating Code Editing via Edit-Oriented Speculative Decoding
Peiding Wang, Li Zhang, Fang Liu, Yinghao Zhu, Wang Xu, Lin Shi, Xiaoli Lian, Minxiao Li, Bo Shen, An Fu
https://arxiv.org/abs/2506.02780
This https://arxiv.org/abs/2406.13948 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2502.11191 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
This https://arxiv.org/abs/2504.11711 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles
Chen Xiong, Pin-Yu Chen, Tsung-Yi Ho
https://arxiv.org/abs/2506.00781 ht…
ATAG: AI-Agent Application Threat Assessment with Attack Graphs
Parth Atulbhai Gandhi, Akansha Shukla, David Tayouri, Beni Ifland, Yuval Elovici, Rami Puzis, Asaf Shabtai
https://arxiv.org/abs/2506.02859
This https://arxiv.org/abs/2505.23387 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
This https://arxiv.org/abs/2506.02139 has been replaced.
link: https://scholar.google.com/scholar?q=a
This https://arxiv.org/abs/2505.08459 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2409.14644 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
Evaluating Language Models For Threat Detection in IoT Security Logs
Jorge J. Tejero-Fernández, Alfonso Sánchez-Macián
https://arxiv.org/abs/2507.02390
This https://arxiv.org/abs/2503.20197 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csSE_…
This https://arxiv.org/abs/2505.16978 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csAI_…
This https://arxiv.org/abs/2408.16028 has been replaced.
initial toot: https://mastoxiv.page/@arXiv_csCR_…
Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab
Haonan Duan, Stephen Zhewen Lu, Caitlin Fiona Harrigan, Nishkrit Desai, Jiarui Lu, Michał Koziarski, Leonardo Cotta, Chris J. Maddison
https://arxiv.org/abs/2507.02083
Flow2Code: Evaluating Large Language Models for Flowchart-based Code Generation Capability
Mengliang He, Jiayi Zeng, Yankai Jiang, Wei Zhang, Zeming Liu, Xiaoming Shi, Aimin Zhou
https://arxiv.org/abs/2506.02073
Empirical Evaluation of Generalizable Automated Program Repair with Large Language Models
Viola Campos, Ridwan Shariffdeen, Adrian Ulges, Yannic Noller
https://arxiv.org/abs/2506.03283
Meta-Fair: AI-Assisted Fairness Testing of Large Language Models
Miguel Romero-Arjona, José A. Parejo, Juan C. Alonso, Ana B. Sánchez, Aitor Arrieta, Sergio Segura
https://arxiv.org/abs/2507.02533
From Theory to Practice: Real-World Use Cases on Trustworthy LLM-Driven Process Modeling, Prediction and Automation
Peter Pfeiffer, Alexander Rombach, Maxim Majlatow, Nijat Mehdiyev
https://arxiv.org/abs/2506.03801