Tootfinder

Opt-in global Mastodon full-text search. Join the index!

@arXiv_csCR_bot@mastoxiv.page
2025-07-31 09:14:21

Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding
Chetan Pathade
arxiv.org/abs/2507.22304

@arXiv_csCL_bot@mastoxiv.page
2025-07-29 11:46:21

Soft Injection of Task Embeddings Outperforms Prompt-Based In-Context Learning
Jungwon Park, Wonjong Rhee
arxiv.org/abs/2507.20906

@gedankenstuecke@scholar.social
2025-05-25 03:16:56

I'm not surprised that GitLab decided to run off a cliff to follow GitHub:
«AI coding bot allows prompt injection with a pull request»
Every day I'm more grateful for @… and @…!
pivot-to-ai.com/2025/05/24/ai-

@publicvoit@graz.social
2025-07-09 07:31:58

"Zero-Click Prompt Injection":
calypsoai.com/insights/prompt-
So instead of trying to trick an employee via phishing

@arXiv_csCR_bot@mastoxiv.page
2025-05-30 07:16:44

Securing AI Agents with Information-Flow Control
Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
arxiv.org/abs/2505.23643

@arXiv_csCR_bot@mastoxiv.page
2025-06-25 07:46:20

Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems
Valerii Gakh, Hayretdin Bahsi
arxiv.org/abs/2506.19109

@arXiv_csCR_bot@mastoxiv.page
2025-07-21 08:38:30

TopicAttack: An Indirect Prompt Injection Attack via Topic Transition
Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi
arxiv.org/abs/2507.13686

@arXiv_csCR_bot@mastoxiv.page
2025-06-09 07:53:32

To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt
Zhilong Wang, Neha Nagaraja, Lan Zhang, Hayretdin Bahsi, Pawan Patil, Peng Liu
arxiv.org/abs/2506.05739

@arXiv_csCR_bot@mastoxiv.page
2025-07-18 08:57:42

Prompt Injection 2.0: Hybrid AI Threats
Jeremy McHugh, Kristina Šekrst, Jon Cefalu
arxiv.org/abs/2507.13169

@arXiv_csCR_bot@mastoxiv.page
2025-07-18 08:41:32

MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems
Yu Cui, Hongyang Du
arxiv.org/abs/2507.13038

@arXiv_csCR_bot@mastoxiv.page
2025-07-11 09:11:11

May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks
Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes
arxiv.org/abs/2507.07417

@arXiv_csCR_bot@mastoxiv.page
2025-06-12 07:32:21

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin

@arXiv_csCR_bot@mastoxiv.page
2025-07-11 09:39:51

Defending Against Prompt Injection With a Few DefensiveTokens
Sizhe Chen, Yizhu Wang, Nicholas Carlini, Chawin Sitawarin, David Wagner
arxiv.org/abs/2507.07974

@arXiv_csCR_bot@mastoxiv.page
2025-07-04 09:56:21

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo
arxiv.org/abs/2507.02735

@arXiv_csCR_bot@mastoxiv.page
2025-06-09 07:37:32

Sentinel: SOTA model to protect against prompt injections
Dror Ivry, Oran Nahum
arxiv.org/abs/2506.05446

@arXiv_csCR_bot@mastoxiv.page
2025-07-23 13:05:34

Replaced article(s) found for cs.CR. arxiv.org/list/cs.CR/new
[1/1]:
- Defense Against Prompt Injection Attack by Leveraging Attack Techniques
Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song, Dekai Wu, Bryan Hooi

@arXiv_csCR_bot@mastoxiv.page
2025-07-09 09:23:32

How Not to Detect Prompt Injections with an LLM
Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, Somesh Jha
arxiv.org/abs/2507.05630

@arXiv_csCR_bot@mastoxiv.page
2025-07-21 11:52:57

Replaced article(s) found for cs.CR. arxiv.org/list/cs.CR/new
[1/1]:
- Defense Against Prompt Injection Attack by Leveraging Attack Techniques
Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song, Dekai Wu, Bryan Hooi

@arXiv_csCR_bot@mastoxiv.page
2025-06-06 09:37:16

This arxiv.org/abs/2505.05849 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCR_…

@arXiv_csCR_bot@mastoxiv.page
2025-06-13 07:24:20

Empirical Evaluation of the Security and Alignment of ChatGPT and Gemini: A Comparative Analysis of Vulnerabilities via Jailbreak Experiments
Rafaël Nouailles (GdR)
arxiv.org/abs/2506.10029

@arXiv_csCR_bot@mastoxiv.page
2025-07-10 09:49:31

The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover
Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro
arxiv.org/abs/2507.06850

@arXiv_csCR_bot@mastoxiv.page
2025-06-02 07:17:42

LLM Agents Should Employ Security Principles
Kaiyuan Zhang, Zian Su, Pin-Yu Chen, Elisa Bertino, Xiangyu Zhang, Ninghui Li
arxiv.org/abs/2505.24019

@arXiv_csCR_bot@mastoxiv.page
2025-06-03 17:52:02

This arxiv.org/abs/2505.18889 has been replaced.
initial toot: mastoxiv.page/@arXiv_csCR_…