Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems
Abstract
The proliferation of agentic AI coding assistants, including Claude Code, GitHub Copilot, Cursor, and emerging skill-based architectures, has fundamentally transformed software development workflows. These systems leverage Large Language Models (LLMs) integrated with external tools, file systems, and shell access through protocols like the Model Context Protocol (MCP). However, this expanded capability surface introduces critical security vulnerabilities. In this Systematization of Knowledge (SoK) paper, we present a comprehensive analysis of prompt injection attacks targeting agentic coding assistants. We propose a novel three-dimensional taxonomy categorizing attacks across delivery vectors, attack modalities, and propagation behaviors. Our meta-analysis synthesizes findings from 78 recent studies (2021–2026), consolidating evidence that attack success rates against state-of-the-art defenses exceed 85% when adaptive attack strategies are employed. We systematically catalog 42 distinct attack techniques spanning input manipulation, tool poisoning, protocol exploitation, multimodal injection, and crossorigin context poisoning. Through critical analysis of 18 defense mechanisms reported in prior work, we identify that most achieve less than 50% mitigation against sophisticated adaptive attacks. We contribute: (1) a unified taxonomy bridging disparate attack classifications, (2) the first systematic analysis of skill-ased architecture vulnerabilities with concrete exploit chains, and (3) a defense-in-depth framework grounded in the limitations we identify. Our findings indicate that the security community must treat prompt injection as a first-class vulnerability class requiring architectural-level mitigations rather than ad-hoc filtering approaches.
Full Text:
PDFReferences
Anthropic, “Claude Code: Agentic coding tool,” 2025.
GitHub, “GitHub Copilot documentation,” 2025.
Cursor Inc., “Cursor: The AI-first code editor,” 2025.
OpenAI, “Codex CLI,” 2025.
Anthropic, “Model Context Protocol specification,” 2025.
Anthropic, “Claude Code skills documentation,” 2025.
Anthropic, “Claude 3.7 system card,” 2025.
National Institute of Standards and Technology, “Artificial intelligence
risk management framework (AI RMF 1.0),” NIST AI 100-1, 2025.
OWASP, “LLM01:2025 Prompt injection,” 2025.
K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M.
Fritz, “Not what you’ve signed up for: Compromising real-world LLMintegrated applications with indirect prompt injection,” in Proc. AISec
Workshop, 2023.
F. Perez and I. Ribeiro, “Ignore this title and HackAPrompt,” 2022.
A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson,
“Universal and transferable adversarial attacks on aligned language
models,” arXiv preprint arXiv:2307.15043, 2023.
A. Wei, N. Haghtalab, and J. Steinhardt, “Jailbroken: How does LLM
safety training fail?” in Proc. NeurIPS, 2023.
N. Carlini, F. Tram`er, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, A. Oprea, and C. Raffel, “Extracting training data from large language models,” in Proc. USENIX Security, 2021.
N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tram`er, and C. Zhang, “Quantifying memorization across neural language models,” in Proc. ICLR, 2023.
R. Schuster, C. Song, E. Tromer, and V. Shmatikov, “You autocomplete me: Poisoning vulnerabilities in neural code completion,” in Proc. USENIX Security, 2021.
P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, “Jailbreaking black box large language models in twenty queries,” arXiv preprint arXiv:2310.08419, 2024.
A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box LLMs automatically,” in Proc. NeurIPS, 2024.
X. Liu, N. Xu, M. Chen, and C. Xiao, “AutoDAN: Generating stealthy jailbreak prompts on aligned large language models,” in Proc. ICLR, 2024.
X. Liu et al., “AutoDAN-Turbo: A lifelong agent for strategy selfexploration to jailbreak LLMs,” arXiv preprint arXiv:2410.05295, 2024.
A. Marzouk, “IDEsaster: Security vulnerabilities in AI-powered integrated development environments,” Technical Report, 2025.
Y. Liu, Y. Zhao, Y. Lyu, T. Zhang, H. Wang, and D. Lo, “Your AI, my shell: Demystifying prompt injection attacks on agentic AI coding editors,” arXiv preprint arXiv:2509.22040, 2025.
Pillar Security, “Rules file backdoor vulnerability,” 2025.
A. Storek, M. Gupta, N. Bhatt, A. Gupta, J. Kim, P. Srivastava, and S.
Jana, “XOXO: Stealthy cross-origin context poisoning attacks against AI coding assistants,” arXiv preprint arXiv:2503.14281, 2025.
S. Gaire, S. Gyawali, S. Mishra, S. Niroula, D. Thakur, and U. Yadav, “Systematization of knowledge: Security and safety in the Model Context Protocol ecosystem,” arXiv preprint arXiv:2512.08290, 2025.
M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, and M. Debbah, “From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows,” arXiv preprint arXiv:2506.23260, 2025.
Invariant Labs, “MCP tool poisoning attacks,” 2025.
M. Bhatt, V. S. Narajala, and I. Habler, “ETDI: Mitigating tool squatting and rug pull attacks in Model Context Protocol,” arXiv preprint arXiv:2506.01333, 2025.
S. Jamshidi, K. W. Nafi, A. M. Dakhel, N. Shahabi, F. Khomh, and N. Ezzati-Jivan, “Securing the Model Context Protocol: Defending LLMs against tool poisoning and adversarial attacks,” arXiv preprint arXiv:2512.06556, 2025.
X. Hou et al., “Model Context Protocol (MCP): Landscape, security
threats, and future research directions,” arXiv preprint arXiv:2503.23278, 2025.
Unit 42, “New prompt injection attack vectors through MCP sampling,” Palo Alto Networks, 2025.
E. Bagdasaryan, T. Hsieh, B. Nassi, and V. Shmatikov, “(Ab)using images and sounds for indirect instruction injection in multi-modal LLMs,” arXiv preprint arXiv:2307.10490, 2023.
Y. Gong, D. Ran, J. Liu, C. Wang, T. Cong, A. Wang, S. Duan, and X. Wang, “FigStep: Jailbreaking large vision-language models via typographic visual prompts,” arXiv preprint arXiv:2311.05608, 2023.
W. Zou, R. Geng, B. Wang, and J. Jia, “PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language
models,” in Proc. USENIX Security, 2025.
C. Clop and Y. Teglia, “Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models,” arXiv preprint arXiv:2410.14479, 2024.
P. Cheng, Y. Ding, T. Ju, Z. Wu, W. Du, P. Yi, Z. Zhang, and G. Liu, “TrojanRAG: Retrieval-augmented generation can be backdoor driver in large language models,” arXiv preprint arXiv:2405.13401, 2024.
S. Yan et al., “An LLM-assisted easy-to-trigger backdoor attack on code completion models: Injecting disguised vulnerabilities against strong
detection,” in Proc. USENIX Security, 2024.
M. Bhatt et al., “Purple Llama CyberSecEval: A secure coding benchmark for language models,” arXiv preprint arXiv:2312.04724, 2023.
H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions,” in Proc. IEEE S&P, 2022.
J. Spracklen, R. Wijewickrama, A. H. M. N. Sakib, A. Maiti, B. Viswanath, and M. Jadliwala, “We have a package for you! A comprehensive analysis of package hallucinations by code generating LLMs,” arXiv preprint arXiv:2406.10279, 2024.
M. Yang et al., “Inserting and activating backdoor attacks in LLM
agents,” in Proc. ACL, 2024.
E. Hubinger et al., “Sleeper agents: Training deceptive LLMs that persist through safety training,” arXiv preprint arXiv:2401.05566, 2024.
Y. Yang, D. Wu, and Y. Chen, “MCPSecBench: A systematic security benchmark and playground for testing Model Context Protocols,” arXiv preprint arXiv:2508.13220, 2025.
E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer,
and F. Tram`er, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in Proc. NeurIPS, 2024.
M. Andriushchenko et al., “AgentHarm: A benchmark for measuring harmfulness of LLM agents,” in Proc. ICLR, 2025.
H. Zhang, J. Huang, K. Mei, Y. Yao, Z. Wang, C. Zhan, H. Wang, and Y. Zhang, “Agent Security Bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents,” in Proc. ICLR, 2025.
S. Toyer et al., “Tensor Trust: Interpretable prompt injection attacks from an online game,” in Proc. NeurIPS, 2023.
S. Schulhoff et al., “Ignore this title and HackAPrompt: Exposing systemic vulnerabilities of LLMs through a global prompt hacking competition,” in Proc. EMNLP, 2023.
T. Yuan et al., “R-Judge: Benchmarking safety risk awareness for LLM
agents,” in Findings of EMNLP, 2024.
Y. Ruan, H. Dong, A. Wang, S. Pitis, Y. Zhou, J. Ba, Y. Dubois, C. J. Maddison, and T. Hashimoto, “Identifying the risks of LM agents with an LM-emulated sandbox,” in Proc. ICLR, 2024.
Q. Zhan, R. Fang, R. Bindu, A. Gupta, T. Hashimoto, and D. Kang,
“InjecAgent: Benchmarking indirect prompt injections in tool-integrated
LLM agents,” in Findings of ACL, 2024.
M. Mazeika et al., “HarmBench: A standardized evaluation framework for automated red teaming and robust refusal,” arXiv preprint
arXiv:2402.04249, 2024.
P. Chao et al., “JailbreakBench: An open robustness benchmark for jailbreaking language models,” in Proc. NeurIPS, 2024.
Y. Liu, Y. Deng, R. Xu, Y. Wang, and Y. Liu, “Formalizing and benchmarking prompt injection attacks and defenses,” in Proc. USENIX
Security, 2024.
J. Yi, Y. Xie, B. Zhu, E. Kiciman, G. Sun, X. Xie, and F. Wu, “Benchmarking and defending against indirect prompt injection attacks
on large language models,” arXiv preprint arXiv:2312.14197, 2023.
E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel,
“The instruction hierarchy: Training LLMs to prioritize privileged instructions,” arXiv preprint arXiv:2404.13208, 2024.
S. M. A. Hossain, R. K. Shayoni, M. R. Ameen, A. Islam, M. F. Mridha, and J. Shin, “A multi-agent LLM defense pipeline against prompt injection attacks,” arXiv preprint arXiv:2509.14285, 2025.
T. Shi, K. Zhu, Z. Wang, Y. Jia, W. Cai, W. Liang, H. Wang, H. Alzahrani, J. Lu, K. Kawaguchi, B. Alomair, X. Zhao, W. Y. Wang, N. Gong, W. Guo, and D. Song, “PromptArmor: Simple yet effective prompt injection defenses,” arXiv preprint arXiv:2507.15219, 2025.
K. Hines et al., “Defending against indirect prompt injection attacks
with spotlighting,” Microsoft Research, 2024.
E. Debenedetti et al., “Defeating prompt injections by design,” arXiv preprint arXiv:2503.18813, 2025.
Y. Wu, F. Roesner, T. Kohno, N. Zhang, and U. Iqbal, “IsolateGPT: An execution isolation architecture for LLM-based agentic systems,” in Proc. NDSS, 2025.
S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “StruQ: Defending against prompt injection with structured queries,” in Proc. USENIX Security, 2025.
S. Chen, A. Zharmagambetov, S. Mahloujifar, K. Chaudhuri, D. Wagner, and C. Guo, “SecAlign: Defending against prompt injection with
preference optimization,” in Proc. CCS, 2025.
Meta AI, “Agents Rule of Two: A practical approach to AI agent security,” 2025.
T. Shi, J. He, Z. Wang, L. Wu, H. Li, W. Guo, and D. Song, “Progent:
Programmable privilege control for LLM agents,” arXiv preprint arXiv:2504.11703, 2025.
Z. Wang et al., “MELON: Provable defense against indirect prompt
injection attacks in AI agents,” in Proc. ICML, 2025.
H. Inan et al., “Llama Guard: LLM-based input-output safeguard for
human-AI conversations,” arXiv preprint arXiv:2312.06674, 2023.
T. Rebedea, R. Dinu, M. N. Sreedhar, C. Parisien, and J. Cohen, “NeMo Guardrails: A toolkit for controllable and safe LLM applications with
programmable rails,” in Proc. EMNLP Demo, 2023.
T. Markov et al., “A holistic approach to undesired content detection in the real world,” in Proc. AAAI, 2023.
M. Nasr, N. Carlini, C. Sitawarin, S. Schulhoff, J. Hayes, et al., “The attacker moves second: Stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections,” arXiv preprint arXiv:2510.09023, 2025.
F. Tram`er, N. Carlini, W. Brendel, S. Madry, and A. Kurakin, “On
adaptive attacks to adversarial example defenses,” in Proc. NeurIPS, 2020.
Y. Liu, G. Deng, Y. Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y. Liu,
H. Wang, Y. Zheng, and Y. Liu, “Identifying the risks of LM agents with an LM-emulated sandbox,” in Proc. ICLR, 2024.
S. Datta, S. K. Nahin, A. Chhabra, and P. Mohapatra, “Agentic AI security: Threats, defenses, evaluation, and open challenges,” arXiv
preprint arXiv:2510.23883, 2025.
Y. Liu et al., “Prompt injection attack against LLM-integrated applications,” arXiv preprint arXiv:2306.05499, 2023.
Y. Yao et al., “A survey on large language model (LLM) security
and privacy: The good, the bad, and the ugly,” in High-Confidence Computing, 2024.
Q. Feng, S. R. Kasa, S. K. Kasa, H. Yun, C. H. Teo, and S. B. Bodapati, “Exposing privacy gaps: Membership inference attack on preference data for LLM alignment,” arXiv preprint arXiv:2407.06443, 2024.
W. Fu et al., “Membership inference attacks against fine-tuned large
language models via self-prompt calibration,” in Proc. NeurIPS, 2024.
M. Ohm, H. Plate, A. Sykosch, and M. Meier, “Backstabber’s knife collection: A review of open source software supply chain attacks,” in Proc. DIMVA, 2020.
P. Ladisa et al., “SoK: Taxonomy of attacks on open-source software
supply chains,” in Proc. IEEE S&P, 2023.
Refbacks
- There are currently no refbacks.
Abava Кибербезопасность ИТ конгресс СНЭ
ISSN: 2307-8162