Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight
Jingyu Tang, Chaoran Chen, Jiawen Li, Zhiping Zhang, Bingcan Guo, Ibrahim Khalilov, Simret Araya Gebreegziabher, Bingsheng Yao, Dakuo Wang, Yanfang Ye, Tianshi Li, Ziang Xiao, Yaxing Yao, Toby Jia-Jun Li
https://arxiv.org/abs/…
What if Virtual Agents Had Scents? Users' Judgments of Virtual Agent Personality and Appeals in Encounters
Dongyun Han, Siyeon Bak, So-Hui Kim, Kangsoo Kim, Sun-Jeong Kim, Isaac Cho
https://arxiv.org/abs/2509.11342
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
Kartik Narayan, Yang Xu, Tian Cao, Kavya Nerella, Vishal M. Patel, Navid Shiee, Peter Grasch, Chao Jia, Yinfei Yang, Zhe Gan
https://arxiv.org/abs/2510.12801
Multimodal Policy Internalization for Conversational Agents
Zhenhailong Wang, Jiateng Liu, Amin Fazel, Ritesh Sarkhel, Xing Fan, Xiang Li, Chenlei Guo, Heng Ji, Ruhi Sarikaya
https://arxiv.org/abs/2510.09474
SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
Zonghao Ying, Yangguang Shao, Jianle Gan, Gan Xu, Junjie Shen, Wenxin Zhang, Quanchen Zou, Junzheng Shi, Zhenfei Yin, Mingchuan Zhang, Aishan Liu, Xianglong Liu
https://arxiv.org/abs/2510.10073
It's funny how "AI" tools are simultaneously marketed as "agents" that can run fully in the background and do stuff, but whenever they do something bad, it's the user who is at fault for not supervising the software that doesn't work.
Even when it’s used directly and the user has the chance to review everything, it’s extremely dangerous, especially for tasks it does fine about 95% of the time and/or when the bad results are only subtly wrong.
Imagine other tools being like this, like a steering wheel that turns the car correctly 95 out of 100 times. 2% of the time it steers in the opposite direction. 3% of the time it steers 5x as much as normal.
COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization
Tian Qin, Felix Bai, Ting-Yao Hu, Raviteja Vemulapalli, Hema Swetha Koppula, Zhiyang Xu, Bowen Jin, Mert Cemri, Jiarui Lu, Zirui Wang, Meng Cao
https://arxiv.org/abs/2510.07043
AgentDR: Dynamic Recommendation with Implicit Item-Item Relations via LLM-based Agents
Mingdai Yang, Nurendra Choudhary, Jiangshu Du, Edward W. Huang, Philip S. Yu, Karthik Subbian, Danai Koutra
https://arxiv.org/abs/2510.05598
NegotiationGym: Self-Optimizing Agents in a Multi-Agent Social Simulation Environment
Shashank Mangla, Chris Hokamp, Jack Boylan, Demian Gholipour Ghalandari, Yuuv Jauhari, Lauren Cassidy, Oisin Duffy
https://arxiv.org/abs/2510.04368
VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation
Lesly Miculicich, Mihir Parmar, Hamid Palangi, Krishnamurthy Dj Dvijotham, Mirko Montanari, Tomas Pfister, Long T. Le
https://arxiv.org/abs/2510.05156
Selling Privacy in Blockchain Transactions
Georgios Chionas, Olga Gorelkina, Piotr Krysta, Rida Laraki
https://arxiv.org/abs/2512.08096
Abstract: We study methods to enhance privacy in blockchain transactions from an economic angle. We consider mechanisms for privacy-aware users whose utility depends not only on the outcome of the mechanism but also negatively on the exposure of their economic preferences. Specifically, we study two auction-theoretic settings with privacy-aware users. First, we analyze an order flow auction, where a user auctions off to specialized agents, called searchers, the right to execute her transaction while maintaining a degree of privacy. We examine how the degree of privacy affects the revenue of the auction and, broadly, the net utility of the privacy-aware user. In this new setting, we describe the optimal auction, which is a sealed-bid auction. Subsequently, we analyze a variant of a Dutch auction in which the user gradually decreases the price and the degree of privacy until the transaction is sold. We compare the revenue of this auction to that of the optimal one as a function of the number of communication rounds. Then, we introduce a two-sided market - a privacy marketplace - with multiple users selling their transactions under their privacy preferences to multiple searchers. We propose a posted-price mechanism for the two-sided market that guarantees constant approximation of the optimal social welfare while maintaining incentive compatibility (from both sides of the market) and budget balance. This work builds on the emerging line of research that attempts to improve the performance of economic mechanisms by appending cryptographic primitives to them.
Exposing LLM User Privacy via Traffic Fingerprint Analysis: A Study of Privacy Risks in LLM Agent Interactions
Yixiang Zhang, Xinhao Deng, Zhongyi Gu, Yihao Chen, Ke Xu, Qi Li, Jianping Wu
https://arxiv.org/abs/2510.07176
FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents
Imene Kerboua, Sahar Omidi Shayegan, Megh Thakkar, Xing Han Lù, Léo Boisvert, Massimo Caccia, Jérémy Espinas, Alexandre Aussem, Véronique Eglin, Alexandre Lacoste
https://arxiv.org/abs/2510.03204…
Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
Zhen Yang, Zi-Yi Dou, Di Feng, Forrest Huang, Anh Nguyen, Keen You, Omar Attia, Yuhao Yang, Michael Feng, Haotian Zhang, Ram Ramrakhya, Chao Jia, Jeffrey Nichols, Alexander Toshev, Yinfei Yang, Zhe Gan
https://arxiv.org/abs/2509.26539…
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
Wei He, Yueqing Sun, Hongyan Hao, Xueyuan Hao, Zhikang Xia, Qi Gu, Chengcheng Han, Dengchang Zhao, Hui Su, Kefeng Zhang, Man Gao, Xi Su, Xiaodong Cai, Xunliang Cai, Yu Yang, Yunke Zhao
https://arxiv.org/abs/2509.26490
Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents
Wonjoong Kim, Sangwu Park, Yeonjun In, Sein Kim, Dongha Lee, Chanyoung Park
https://arxiv.org/abs/2510.02837
ToolTweak: An Attack on Tool Selection in LLM-based Agents
Jonathan Sneh, Ruomei Yan, Jialin Yu, Philip Torr, Yarin Gal, Sunando Sengupta, Eric Sommerlade, Alasdair Paren, Adel Bibi
https://arxiv.org/abs/2510.02554
Say One Thing, Do Another? Diagnosing Reasoning-Execution Gaps in VLM-Powered Mobile-Use Agents
Lingzhong Dong, Ziqi Zhou, Shuaibo Yang, Haiyue Sheng, Pengzhou Cheng, Zongru Wu, Zheng Wu, Gongshen Liu, Zhuosheng Zhang
https://arxiv.org/abs/2510.02204
RISK: A Framework for GUI Agents in E-commerce Risk Management
Renqi Chen, Zeyin Tao, Jianming Guo, Jingzhe Zhu, Yiheng Peng, Qingqing Sun, Tianyi Zhang, Shuai Chen
https://arxiv.org/abs/2509.21982
D-Artemis: A Deliberative Cognitive Framework for Mobile GUI Multi-Agents
Hongze Mi, Yibo Feng, Wenjie Lu, Yuqi Wang, Jinyuan Li, Song Cao, He Cui, Tengfei Tian, Xuelin Zhang, Haotian Luo, Di Sun, Naiqiang Tan, Gang Pan
https://arxiv.org/abs/2509.21799
Towards Shift-Up: A Framework and a Prestudy on High-Value Activities in GenAI Native Software Development
Vlad Stirbu, Mateen Ahmed Abbasi, Teerath Das, Jesse Haimi, Niko Iljin, Pyry Kotilainen, Petrus Lipsanen, Niko Mäkitalo, Maiju Sipilä, Venla Veijalainen, Tommi Mikkonen
https://arxiv.org/abs/2509.24485
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning
Cheng Qian, Zuxin Liu, Akshara Prabhakar, Jielin Qiu, Zhiwei Liu, Haolin Chen, Shirley Kokane, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang
https://arxiv.org/abs/2509.19736
When Should Users Check? A Decision-Theoretic Model of Confirmation Frequency in Multi-Step AI Agent Tasks
Jieyu Zhou, Aryan Roy, Sneh Gupta, Daniel Weitekamp, Christopher J. MacLellan
https://arxiv.org/abs/2510.05307
Staircase Streaming for Low-Latency Multi-Agent Inference
Junlin Wang, Jue Wang, Zhen (Zach) Xu, Ben Athiwaratkun, Bhuwan Dhingra, Ce Zhang, James Zou
https://arxiv.org/abs/2510.05059 …
WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning
Zimu Lu, Houxing Ren, Yunqiao Yang, Ke Wang, Zhuofan Zong, Junting Pan, Mingjie Zhan, Hongsheng Li
https://arxiv.org/abs/2509.22644
SCUBA: Salesforce Computer Use Benchmark
Yutong Dai, Krithika Ramakrishnan, Jing Gu, Matthew Fernandez, Yanqi Luo, Viraj Prabhu, Zhenyu Hu, Silvio Savarese, Caiming Xiong, Zeyuan Chen, Ran Xu
https://arxiv.org/abs/2509.26506

Abstract: We introduce SCUBA, a benchmark designed to evaluate computer-use agents on customer relationship management (CRM) workflows within the Salesforce platform. SCUBA contains 300 task instances derived from real user interviews, spanning three primary personas: platform administrators, sales representatives, and service agents. The tasks test a range of enterprise-critical abilities, including enterprise software UI navigation, data manipulation, workflow automation, information retrieval, and tro…
See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles
Zongru Wu, Rui Mao, Zhiyuan Tian, Pengzhou Cheng, Tianjie Ju, Zheng Wu, Lingzhong Dong, Haiyue Sheng, Zhuosheng Zhang, Gongshen Liu
https://arxiv.org/abs/2509.13615
A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks
S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin
https://arxiv.org/abs/2509.14285
DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents
Yansong Ning, Rui Liu, Jun Wang, Kai Chen, Wei Li, Jun Fang, Kan Zheng, Naiqiang Tan, Hao Liu
https://arxiv.org/abs/2509.21842
BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
Thierry Blankenstein, Jialin Yu, Zixuan Li, Vassilis Plachouras, Sunando Sengupta, Philip Torr, Yarin Gal, Alasdair Paren, Adel Bibi
https://arxiv.org/abs/2510.00307
ToolBrain: A Flexible Reinforcement Learning Framework for Agentic Tools
Quy Minh Le, Minh Sao Khue Luu, Khanh-Tung Tran, Duc-Hai Nguyen, Hoang-Quoc-Viet Pham, Quan Le, Hoang Thanh Lam, Hoang D. Nguyen
https://arxiv.org/abs/2510.00023
Position: Human-Robot Interaction in Embodied Intelligence Demands a Shift From Static Privacy Controls to Dynamic Learning
Shuning Zhang, Hong Jia, Simin Li, Ting Dang, Yongquan 'Owen' Hu, Xin Yi, Hewu Li
https://arxiv.org/abs/2509.19041
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Jingdi Lei, Varun Gumma, Rishabh Bhardwaj, Seok Min Lim, Chuan Li, Amir Zadeh, Soujanya Poria
https://arxiv.org/abs/2509.26495