What LLM Knows About Cybersecurity

Dmitry Namiot

Abstract

This article examines the testing of large language models (LLMs), with cybersecurity knowledge as the subject of testing. It provides an overview of test datasets (benchmarks) that can be used to assess LLM knowledge of cybersecurity. In practice, these benchmarks amount to tens of thousands of questions covering a wide range of areas: monitoring computer networks and planning their topology; network analysis and reporting; quickly finding and eliminating network faults to ensure network stability; managing network devices; testing network equipment (such as switches, routers, and firewalls); troubleshooting network problems; optimizing network performance; network security; backup and recovery; identity and access management; IoT security; cryptography; wireless network security; cloud security; penetration testing and auditing; and vulnerabilities in software code. The construction of such tests is also considered.

Full Text:

PDF (Russian)



ISSN: 2307-8162