Attack Methods and Defenses in LLM-Based Agentic Systems
Abstract
LLM-based agentic systems represent a new class of autonomous software capable of planning and executing multi-step tasks using external tools, long-term memory, and inter-agent communication. The transition from chatbots to autonomous agents introduces fundamentally novel attack surfaces that classical threat models do not capture. This survey systematizes the current state of LLM-agent security research. We propose an extended threat taxonomy comprising seven classes: (1) prompt injection attacks; (2) memory attacks; (3) tool and protocol attacks; (4) multi-agent attacks; (5) multi-modal attacks; (6) tool-chain and supply-chain attacks; and (7) temporal attacks. Defense methods are systematized by intervention level: textual (injection filtering and detection), model-level (analysis of internal representations and activations), tool-level (privilege control and call policies), protocol-level (MCP security extensions), firewall-level (agentic firewalls), and systemic (formal policy verification and cryptographic approaches). Existing defenses against indirect prompt injection remain vulnerable to adaptive attacks, underscoring the need to make evaluation against adaptive adversaries a standard practice. Analysis of real-world security incidents, from orchestration-platform takeovers to data exfiltration via inter-agent trust, validates the practical significance of theoretical threat models. Particularly acute are attacks with cross-session persistence, in which compromise survives the boundaries of an agent session.
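To illustrate the tool-level intervention category described above (privilege control and call policies, in the spirit of Progent-style programmable policies), the following is a minimal, hypothetical sketch: a deny-by-default allowlist combined with argument-pattern vetoes applied before each tool call. All names and patterns here are illustrative assumptions, not an API from any surveyed system.

```python
import re

class ToolPolicy:
    """Deny-by-default tool allowlist with argument-pattern vetoes."""

    def __init__(self, allowed_tools, blocked_arg_patterns):
        self.allowed_tools = set(allowed_tools)
        self.blocked = [re.compile(p) for p in blocked_arg_patterns]

    def check(self, tool_name, arguments):
        # Unknown tools are never invoked (deny-by-default).
        if tool_name not in self.allowed_tools:
            return False, f"tool '{tool_name}' not in allowlist"
        # Veto arguments matching known exfiltration shapes,
        # e.g. URLs carrying data in query parameters.
        for value in arguments.values():
            for pattern in self.blocked:
                if pattern.search(str(value)):
                    return False, f"argument matched blocked pattern {pattern.pattern}"
        return True, "ok"


policy = ToolPolicy(
    allowed_tools={"read_file", "search_docs"},
    blocked_arg_patterns=[r"https?://[^\s]*\?(?:data|token)="],
)

policy.check("read_file", {"path": "notes.txt"})            # allowed
policy.check("send_email", {"to": "attacker@example.com"})  # denied: not allowlisted
```

Such static policies are exactly the kind of defense the surveyed adaptive attacks target: an adversary who can shape tool arguments will probe for encodings that evade the fixed patterns, which is why the survey treats evaluation against adaptive adversaries as essential.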
Full Text: PDF (Russian)

References
Injection (CVE-2025-53773),” 2025.
CSO Online, “Critical RCE Flaw Allows Full Takeover of n8n AI Workflow Platform (CVE-2026-21858),” 2026.
Wang, X. and others, “The Landscape of Prompt Injection Threats in LLM Agents,” arXiv:2602.10453, 2026.
Shi, Y. and others, “SoK: Trust-Authorization Mismatch in LLM Agent Interactions,” arXiv:2512.06914, 2025.
Jones, E. and others, “A Systematization of Security Vulnerabilities in Computer Use Agents,” arXiv:2507.05445, 2025.
Meta AI, “LlamaFirewall: Open Source Guardrail System for Secure AI Agents,” arXiv:2505.03574, 2025.
Ji, Z. and others, “Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks,” arXiv:2511.15203, 2025.
Zhan, Q. and others, “Adaptive Attacks Break Defenses Against Indirect Prompt Injection,” NAACL 2025 Findings, 2025. arXiv:2503.00061.
Jiang, H. and others, “SoK: Agentic Skills — Beyond Tool Use in LLM Agents,” arXiv:2602.20867, 2026.
Betser, N. and others, “AgenTRIM: Tool Risk Mitigation for Agentic AI,” arXiv:2601.12449, 2026.
Shi, L. and others, “Progent: Programmable Privilege Control for LLM Agents,” arXiv:2504.11703, 2025.
Chang, Y. and others, “Overcoming the Retrieval Barrier: Indirect Prompt Injection in the Wild,” arXiv:2601.07072, 2026.
Srivastava, A. and others, “MemoryGraft: Persistent Compromise via Poisoned Experience Retrieval,” arXiv:2512.16962, 2025.
Sunil, R. and others, “Memory Poisoning Attack and Defense on Memory Based LLM-Agents,” arXiv:2601.05504, 2026.
Maloyan, A. and Namiot, D., “Breaking the Protocol: Security Analysis of the Model Context Protocol,” arXiv:2601.17549, 2026.
Hou, Y. and others, “SMCP: Secure Model Context Protocol,” arXiv:2602.01129, 2026.
Naik, A. and others, “OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage,” arXiv:2602.13477, 2026.
Lan, M. and others, “Silent Egress: Implicit Prompt Injection Makes LLM Agents Leak,” arXiv:2602.22450, 2026.
Wang, L. and others, “AdapTools: Adaptive Tool-based Indirect Prompt Injection Attacks,” arXiv:2602.20720, 2026.
Yang, K. and others, “Zombie Agents: Persistent Control via Self-Reinforcing Injections,” arXiv:2602.15654, 2026.
Shi, J. and others, “ToolHijacker: Prompt Injection Attack to Tool Selection,” arXiv:2504.19793, 2025.
“Log-To-Leak: Prompt Injection via MCP,” OpenReview:UVgbFuXPaO, 2025.
He, J. and others, “Agent-in-the-Middle: Red-Teaming Multi-Agent Systems via Communication Attacks,” arXiv:2502.14847, 2025.
Triedman, H. and Jha, R. and Shmatikov, V., “Multi-Agent Systems Execute Arbitrary Malicious Code,” arXiv:2503.12188, 2025.
Cui, Y. and Du, H., “MAD-Spear: Conformity-Driven Prompt Injection on Multi-Agent Debate,” arXiv:2507.13038, 2025.
Lupinacci, L. and others, “The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover,” arXiv:2507.06850, 2025.
Cheng, X. and others, “CrossInject: Cross-Modal Prompt Injection,” ACM Multimedia 2025, 2025. arXiv:2504.14348.
Wu, C. H. and others, “Dissecting Adversarial Robustness of Multimodal LM Agents,” ICLR 2025, 2025. arXiv:2406.12814.
Cartagena, A. and Teixeira, A., “Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety,” arXiv:2602.16943, 2026.
Amazon Science, “STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents,” arXiv:2509.25624, 2025.
Narajala, V. S. and others, “Tool Squatting: Zero Trust Registry-Based Approach,” arXiv:2504.19951, 2025.
Jiaxiaojun and others, “SkillJect: Automated Skill-Based Prompt Injection for Coding Agents,” arXiv:2602.14211, 2026.
“Agentic AI as a Cybersecurity Attack Surface: Runtime Supply Chains,” arXiv:2602.19555, 2026.
Lilienthal, D. and Hong, S., “Mind the Gap: TOCTOU Vulnerabilities in LLM-Enabled Agents,” arXiv:2508.17155, 2025.
Shi, T. and others, “PromptArmor: Simple yet Effective Prompt Injection Defense,” arXiv:2507.15219, 2025.
Abdelnabi, S. and others, “Firewalls to Secure Dynamic LLM Agentic Networks,” arXiv:2502.01822, 2025.
Palumbo, N. and others, “PCAS: Policy Compiler for Secure Agentic Systems,” arXiv:2602.16708, 2026.
Wang, R. and others, “ICON: Indirect Prompt Injection Defense via Inference-Time Correction,” arXiv:2602.20708, 2026.
Zhang, T. and others, “AgentSentry: Mitigating Indirect Prompt Injection via Temporal Causal Diagnostics,” arXiv:2602.22724, 2026.
“ARGUS: Defending Against Multimodal IPI via Activation Steering,” arXiv:2512.05745, 2025.
Wang, Y. and others, “AgentArmor: Program Analysis on Agent Runtime Trace,” arXiv:2508.01249, 2025.
“MindGuard: Decision Inspection Against Metadata Poisoning,” arXiv:2508.20412, 2025.
Lin, J. and others, “VIGIL: Defending LLM Agents Against Tool Stream Injection,” arXiv:2601.05755, 2026.
Jha, R. and others, “Breaking and Fixing Defenses Against Control-Flow Hijacking in MAS,” arXiv:2510.17276, 2025.
NeuralTrust, “Generative Application Firewall (GAF),” arXiv:2601.15824, 2026.
Rajagopalan, M. and Rao, V., “Authenticated Workflows: A Systems Approach,” arXiv:2602.10465, 2026.
“AgentBound: Access Control for MCP Servers,” arXiv:2510.21236, 2025.
Debenedetti, E. and others, “AgentDojo: Dynamic Environment for Prompt Injection Evaluation,” arXiv:2406.13352, 2024.
Zhang, H. and others, “Agent Security Bench (ASB),” ICLR 2025, 2025. arXiv:2410.02644.
Bazinska, J. and others, “Breaking Agent Backbones (b3 Benchmark),” ICLR 2026, 2026. arXiv:2510.22620.
“AgentLAB: Long-Horizon Attack Benchmark,” arXiv:2602.16901, 2026.
Wang, Y. and Gao and others, “MCPTox: Benchmark for Tool Poisoning on Real-World MCP Servers,” arXiv:2508.14925, 2025.
Dong, J. and others, “SafeSearch: Red-Teaming LLM Search Agents,” arXiv:2509.23694, 2025.
Cyata, “LangChain ‘LangGrinch’ (CVE-2025-68664),” 2025.
CVE Reports, “Langflow CSV Agent RCE (CVE-2026-27966),” 2026.
Sombra Inc., “ServiceNow Now Assist Privilege Escalation via Second-Order Prompt Injection,” 2026.
ISSN: 2307-8162