Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems

Narek Maloyan, Dmitry Namiot

Abstract


The proliferation of agentic AI coding assistants, including Claude Code, GitHub  Copilot, Cursor, and emerging skill-based architectures, has fundamentally transformed software development workflows. These systems leverage Large Language Models (LLMs) integrated with external tools, file systems, and shell access through protocols like the Model Context Protocol (MCP). However, this expanded capability surface introduces critical security vulnerabilities. In this Systematization of Knowledge (SoK) paper, we present a comprehensive  analysis of prompt injection attacks targeting agentic coding assistants. We propose a novel three-dimensional taxonomy categorizing attacks across delivery vectors, attack modalities, and propagation behaviors. Our meta-analysis synthesizes findings from 78 recent studies (2021–2026), consolidating evidence that attack success rates against state-of-the-art defenses exceed 85% when adaptive attack strategies are employed. We systematically catalog  42 distinct attack techniques spanning input manipulation, tool poisoning,  protocol exploitation, multimodal injection, and crossorigin context poisoning.  Through critical analysis of 18 defense mechanisms reported in prior work, we  identify that most achieve less than 50% mitigation against sophisticated  adaptive attacks. We contribute: (1) a unified taxonomy bridging disparate  attack classifications, (2) the first  systematic analysis of skill-ased architecture  vulnerabilities with concrete  exploit chains, and (3) a defense-in-depth  framework grounded in the limitations we identify. Our findings indicate that the security community must treat prompt injection as a first-class vulnerability  class requiring architectural-level mitigations rather than ad-hoc filtering approaches.


Full Text:

PDF

References


Anthropic, “Claude Code: Agentic coding tool,” 2025.

GitHub, “GitHub Copilot documentation,” 2025.

Cursor Inc., “Cursor: The AI-first code editor,” 2025.

OpenAI, “Codex CLI,” 2025.

Anthropic, “Model Context Protocol specification,” 2025.

Anthropic, “Claude Code skills documentation,” 2025.

Anthropic, “Claude 3.7 system card,” 2025.

National Institute of Standards and Technology, “Artificial intelligence

risk management framework (AI RMF 1.0),” NIST AI 100-1, 2025.

OWASP, “LLM01:2025 Prompt injection,” 2025.

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M.

Fritz, “Not what you’ve signed up for: Compromising real-world LLMintegrated applications with indirect prompt injection,” in Proc. AISec

Workshop, 2023.

F. Perez and I. Ribeiro, “Ignore this title and HackAPrompt,” 2022.

A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson,

“Universal and transferable adversarial attacks on aligned language

models,” arXiv preprint arXiv:2307.15043, 2023.

A. Wei, N. Haghtalab, and J. Steinhardt, “Jailbroken: How does LLM

safety training fail?” in Proc. NeurIPS, 2023.

N. Carlini, F. Tram`er, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, A. Oprea, and C. Raffel, “Extracting training data from large language models,” in Proc. USENIX Security, 2021.

N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tram`er, and C. Zhang, “Quantifying memorization across neural language models,” in Proc. ICLR, 2023.

R. Schuster, C. Song, E. Tromer, and V. Shmatikov, “You autocomplete me: Poisoning vulnerabilities in neural code completion,” in Proc. USENIX Security, 2021.

P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, and E. Wong, “Jailbreaking black box large language models in twenty queries,” arXiv preprint arXiv:2310.08419, 2024.

A. Mehrotra, M. Zampetakis, P. Kassianik, B. Nelson, H. Anderson, Y. Singer, and A. Karbasi, “Tree of attacks: Jailbreaking black-box LLMs automatically,” in Proc. NeurIPS, 2024.

X. Liu, N. Xu, M. Chen, and C. Xiao, “AutoDAN: Generating stealthy jailbreak prompts on aligned large language models,” in Proc. ICLR, 2024.

X. Liu et al., “AutoDAN-Turbo: A lifelong agent for strategy selfexploration to jailbreak LLMs,” arXiv preprint arXiv:2410.05295, 2024.

A. Marzouk, “IDEsaster: Security vulnerabilities in AI-powered integrated development environments,” Technical Report, 2025.

Y. Liu, Y. Zhao, Y. Lyu, T. Zhang, H. Wang, and D. Lo, “Your AI, my shell: Demystifying prompt injection attacks on agentic AI coding editors,” arXiv preprint arXiv:2509.22040, 2025.

Pillar Security, “Rules file backdoor vulnerability,” 2025.

A. Storek, M. Gupta, N. Bhatt, A. Gupta, J. Kim, P. Srivastava, and S.

Jana, “XOXO: Stealthy cross-origin context poisoning attacks against AI coding assistants,” arXiv preprint arXiv:2503.14281, 2025.

S. Gaire, S. Gyawali, S. Mishra, S. Niroula, D. Thakur, and U. Yadav, “Systematization of knowledge: Security and safety in the Model Context Protocol ecosystem,” arXiv preprint arXiv:2512.08290, 2025.

M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, and M. Debbah, “From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows,” arXiv preprint arXiv:2506.23260, 2025.

Invariant Labs, “MCP tool poisoning attacks,” 2025.

M. Bhatt, V. S. Narajala, and I. Habler, “ETDI: Mitigating tool squatting and rug pull attacks in Model Context Protocol,” arXiv preprint arXiv:2506.01333, 2025.

S. Jamshidi, K. W. Nafi, A. M. Dakhel, N. Shahabi, F. Khomh, and N. Ezzati-Jivan, “Securing the Model Context Protocol: Defending LLMs against tool poisoning and adversarial attacks,” arXiv preprint arXiv:2512.06556, 2025.

X. Hou et al., “Model Context Protocol (MCP): Landscape, security

threats, and future research directions,” arXiv preprint arXiv:2503.23278, 2025.

Unit 42, “New prompt injection attack vectors through MCP sampling,” Palo Alto Networks, 2025.

E. Bagdasaryan, T. Hsieh, B. Nassi, and V. Shmatikov, “(Ab)using images and sounds for indirect instruction injection in multi-modal LLMs,” arXiv preprint arXiv:2307.10490, 2023.

Y. Gong, D. Ran, J. Liu, C. Wang, T. Cong, A. Wang, S. Duan, and X. Wang, “FigStep: Jailbreaking large vision-language models via typographic visual prompts,” arXiv preprint arXiv:2311.05608, 2023.

W. Zou, R. Geng, B. Wang, and J. Jia, “PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language

models,” in Proc. USENIX Security, 2025.

C. Clop and Y. Teglia, “Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models,” arXiv preprint arXiv:2410.14479, 2024.

P. Cheng, Y. Ding, T. Ju, Z. Wu, W. Du, P. Yi, Z. Zhang, and G. Liu, “TrojanRAG: Retrieval-augmented generation can be backdoor driver in large language models,” arXiv preprint arXiv:2405.13401, 2024.

S. Yan et al., “An LLM-assisted easy-to-trigger backdoor attack on code completion models: Injecting disguised vulnerabilities against strong

detection,” in Proc. USENIX Security, 2024.

M. Bhatt et al., “Purple Llama CyberSecEval: A secure coding benchmark for language models,” arXiv preprint arXiv:2312.04724, 2023.

H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions,” in Proc. IEEE S&P, 2022.

J. Spracklen, R. Wijewickrama, A. H. M. N. Sakib, A. Maiti, B. Viswanath, and M. Jadliwala, “We have a package for you! A comprehensive analysis of package hallucinations by code generating LLMs,” arXiv preprint arXiv:2406.10279, 2024.

M. Yang et al., “Inserting and activating backdoor attacks in LLM

agents,” in Proc. ACL, 2024.

E. Hubinger et al., “Sleeper agents: Training deceptive LLMs that persist through safety training,” arXiv preprint arXiv:2401.05566, 2024.

Y. Yang, D. Wu, and Y. Chen, “MCPSecBench: A systematic security benchmark and playground for testing Model Context Protocols,” arXiv preprint arXiv:2508.13220, 2025.

E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer,

and F. Tram`er, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in Proc. NeurIPS, 2024.

M. Andriushchenko et al., “AgentHarm: A benchmark for measuring harmfulness of LLM agents,” in Proc. ICLR, 2025.

H. Zhang, J. Huang, K. Mei, Y. Yao, Z. Wang, C. Zhan, H. Wang, and Y. Zhang, “Agent Security Bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents,” in Proc. ICLR, 2025.

S. Toyer et al., “Tensor Trust: Interpretable prompt injection attacks from an online game,” in Proc. NeurIPS, 2023.

S. Schulhoff et al., “Ignore this title and HackAPrompt: Exposing systemic vulnerabilities of LLMs through a global prompt hacking competition,” in Proc. EMNLP, 2023.

T. Yuan et al., “R-Judge: Benchmarking safety risk awareness for LLM

agents,” in Findings of EMNLP, 2024.

Y. Ruan, H. Dong, A. Wang, S. Pitis, Y. Zhou, J. Ba, Y. Dubois, C. J. Maddison, and T. Hashimoto, “Identifying the risks of LM agents with an LM-emulated sandbox,” in Proc. ICLR, 2024.

Q. Zhan, R. Fang, R. Bindu, A. Gupta, T. Hashimoto, and D. Kang,

“InjecAgent: Benchmarking indirect prompt injections in tool-integrated

LLM agents,” in Findings of ACL, 2024.

M. Mazeika et al., “HarmBench: A standardized evaluation framework for automated red teaming and robust refusal,” arXiv preprint

arXiv:2402.04249, 2024.

P. Chao et al., “JailbreakBench: An open robustness benchmark for jailbreaking language models,” in Proc. NeurIPS, 2024.

Y. Liu, Y. Deng, R. Xu, Y. Wang, and Y. Liu, “Formalizing and benchmarking prompt injection attacks and defenses,” in Proc. USENIX

Security, 2024.

J. Yi, Y. Xie, B. Zhu, E. Kiciman, G. Sun, X. Xie, and F. Wu, “Benchmarking and defending against indirect prompt injection attacks

on large language models,” arXiv preprint arXiv:2312.14197, 2023.

E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel,

“The instruction hierarchy: Training LLMs to prioritize privileged instructions,” arXiv preprint arXiv:2404.13208, 2024.

S. M. A. Hossain, R. K. Shayoni, M. R. Ameen, A. Islam, M. F. Mridha, and J. Shin, “A multi-agent LLM defense pipeline against prompt injection attacks,” arXiv preprint arXiv:2509.14285, 2025.

T. Shi, K. Zhu, Z. Wang, Y. Jia, W. Cai, W. Liang, H. Wang, H. Alzahrani, J. Lu, K. Kawaguchi, B. Alomair, X. Zhao, W. Y. Wang, N. Gong, W. Guo, and D. Song, “PromptArmor: Simple yet effective prompt injection defenses,” arXiv preprint arXiv:2507.15219, 2025.

K. Hines et al., “Defending against indirect prompt injection attacks

with spotlighting,” Microsoft Research, 2024.

E. Debenedetti et al., “Defeating prompt injections by design,” arXiv preprint arXiv:2503.18813, 2025.

Y. Wu, F. Roesner, T. Kohno, N. Zhang, and U. Iqbal, “IsolateGPT: An execution isolation architecture for LLM-based agentic systems,” in Proc. NDSS, 2025.

S. Chen, J. Piet, C. Sitawarin, and D. Wagner, “StruQ: Defending against prompt injection with structured queries,” in Proc. USENIX Security, 2025.

S. Chen, A. Zharmagambetov, S. Mahloujifar, K. Chaudhuri, D. Wagner, and C. Guo, “SecAlign: Defending against prompt injection with

preference optimization,” in Proc. CCS, 2025.

Meta AI, “Agents Rule of Two: A practical approach to AI agent security,” 2025.

T. Shi, J. He, Z. Wang, L. Wu, H. Li, W. Guo, and D. Song, “Progent:

Programmable privilege control for LLM agents,” arXiv preprint arXiv:2504.11703, 2025.

Z. Wang et al., “MELON: Provable defense against indirect prompt

injection attacks in AI agents,” in Proc. ICML, 2025.

H. Inan et al., “Llama Guard: LLM-based input-output safeguard for

human-AI conversations,” arXiv preprint arXiv:2312.06674, 2023.

T. Rebedea, R. Dinu, M. N. Sreedhar, C. Parisien, and J. Cohen, “NeMo Guardrails: A toolkit for controllable and safe LLM applications with

programmable rails,” in Proc. EMNLP Demo, 2023.

T. Markov et al., “A holistic approach to undesired content detection in the real world,” in Proc. AAAI, 2023.

M. Nasr, N. Carlini, C. Sitawarin, S. Schulhoff, J. Hayes, et al., “The attacker moves second: Stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections,” arXiv preprint arXiv:2510.09023, 2025.

F. Tram`er, N. Carlini, W. Brendel, S. Madry, and A. Kurakin, “On

adaptive attacks to adversarial example defenses,” in Proc. NeurIPS, 2020.

Y. Liu, G. Deng, Y. Li, K. Wang, Z. Wang, X. Wang, T. Zhang, Y. Liu,

H. Wang, Y. Zheng, and Y. Liu, “Identifying the risks of LM agents with an LM-emulated sandbox,” in Proc. ICLR, 2024.

S. Datta, S. K. Nahin, A. Chhabra, and P. Mohapatra, “Agentic AI security: Threats, defenses, evaluation, and open challenges,” arXiv

preprint arXiv:2510.23883, 2025.

Y. Liu et al., “Prompt injection attack against LLM-integrated applications,” arXiv preprint arXiv:2306.05499, 2023.

Y. Yao et al., “A survey on large language model (LLM) security

and privacy: The good, the bad, and the ugly,” in High-Confidence Computing, 2024.

Q. Feng, S. R. Kasa, S. K. Kasa, H. Yun, C. H. Teo, and S. B. Bodapati, “Exposing privacy gaps: Membership inference attack on preference data for LLM alignment,” arXiv preprint arXiv:2407.06443, 2024.

W. Fu et al., “Membership inference attacks against fine-tuned large

language models via self-prompt calibration,” in Proc. NeurIPS, 2024.

M. Ohm, H. Plate, A. Sykosch, and M. Meier, “Backstabber’s knife collection: A review of open source software supply chain attacks,” in Proc. DIMVA, 2020.

P. Ladisa et al., “SoK: Taxonomy of attacks on open-source software

supply chains,” in Proc. IEEE S&P, 2023.


Refbacks

  • There are currently no refbacks.


Abava  Кибербезопасность ИТ конгресс СНЭ

ISSN: 2307-8162