About AI Red Team

Dmitry Namiot, Elena Zubareva

Abstract


The proliferation of machine learning applications based on large language models (ChatGPT, etc.) has brought attention to a well-known problem in machine learning systems: adversarial attacks. Such attacks are special modifications of data at different stages of the standard machine learning pipeline (training, testing, use), which are designed to either prevent the operation of machine learning systems or achieve the special behavior of such systems required by the attacker. In the latter case, the attacker usually wants to ensure that the trained model reacts in a special way (needed by the attacker) to input data prepared in a certain way. There are also classes of attacks on machine learning models that specifically interrogate running models in order to obtain hidden information used in training the model. All of the above attacks can be implemented quite simply for large language models, which opened the eyes of the business community to a real problem - the cybersecurity of machine learning (artificial intelligence) systems themselves. The answer was the accelerated creation of corporate cybersecurity units that should test artificial intelligence systems - AI Red Teams. The principles of the construction and operation of such teams are discussed in this article.

Full Text:

PDF (Russian)

References


Google's AI Red Team: the ethical hackers making AI safer https://blog.google/technology/safety-security/googles-ai-red-team-the-ethical-hackers-making-ai-safer/ Retrieved: 07.09.2023.

Microsoft AI Red Team building future of safer AI https://www.microsoft.com/en-us/security/blog/2023/08/07/microsoft-ai-red-team-building-future-of-safer-ai/ Retrieved: 07.09.2023.

OpenAI’s red team: the experts hired to ‘break’ ChatGPT https://archive.is/xu0wS#selection-1437.0-1437.55 Retrieved: 07.09.2023.

Ge, Yingqiang, et al. "Openagi: When llm meets domain experts." arXiv preprint arXiv:2304.04370 (2023).

Ilyushin, Eugene, Dmitry Namiot, and Ivan Chizhov. "Attacks on machine learning systems-common problems and methods." International Journal of Open Information Technologies 10.3 (2022): 17-22. (in Russian)

Namiot, Dmitry. "Schemes of attacks on machine learning models." International Journal of Open Information Technologies 11.5 (2023): 68-86. (in Russian)

Namiot, Dmitry, and Eugene Ilyushin. "On the robustness and security of Artificial Intelligence systems." International Journal of Open Information Technologies 10.9 (2022): 126-134. (in Russian)

Namiot, Dmitry. "Introduction to Data Poison Attacks on Machine Learning Models." International Journal of Open Information Technologies 11.3 (2023): 58-68. (in Russian)

Kostyumov, Vasily. "A survey and systematization of evasion attacks in computer vision." International Journal of Open Information Technologies 10.10 (2022): 11-20.(in Russian)

Mozes, Maximilian, et al. "Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities." arXiv preprint arXiv:2308.12833 (2023).

Democratic inputs to AI https://openai.com/blog/democratic-inputs-to-ai Retrieved: 08.09.2023

Securing AI The Next Platform Opportunity in Cybersecurity https://greylock.com/greymatter/securing-ai/ Retrieved: 08.09.2023

Namiot, Dmitry, et al. "Information robots in enterprise management systems." International Journal of Open Information Technologies 5.4 (2017): 12-21. (in Russian)

Compromised PyTorch-nightly dependency chain between December 25th and December 30th, 2022 https://pytorch.org/blog/compromised-nightly-dependency/ Retrieved: 08.09.2023

OpenAI Reveals Redis Bug Behind ChatGPT User Data Exposure https://thehackernews.com/2023/03/openai-reveals-redis-bug-behind-chatgpt.html Incident Retrieved: 08.09.2023

Bidzhiev, Temirlan, and Dmitry Namiot. "Research of existing approaches to embedding malicious software in artificial neural networks." International Journal of Open Information Technologies 10.9 (2022): 21-31. (in Russian)

Gao, Yansong, et al. "Backdoor attacks and countermeasures on deep learning: A comprehensive review." arXiv preprint arXiv:2007.10760 (2020).

Kalin, Josh, David Noever, and Matthew Ciolino. "Color Teams for Machine Learning Development." arXiv preprint arXiv:2110.10601 (2021).

MLSecOps https://mlsecops.com/ Retrieved: 08.09.2023

Namiot, Dmitry, and Eugene Ilyushin. "Data shift monitoring in machine learning models." International Journal of Open Information Technologies 10.12 (2022): 84-93. (in Russian)

Namiot, Dmitry, Eugene Ilyushin, and Oleg Pilipenko. "On Trusted AI Platforms." International Journal of Open Information Technologies 10.7 (2022): 119-127. (in Russian)

Liu, Yi, et al. "Prompt Injection attack against LLM-integrated Applications." arXiv preprint arXiv:2306.05499 (2023).

Wong, Sheng, et al. "MLGuard: Defend Your Machine Learning Model!." arXiv preprint arXiv:2309.01379 (2023).

Song, Junzhe, and Dmitry Namiot. "A Survey of the Implementations of Model Inversion Attacks." International Conference on Distributed Computer and Communication Networks. Cham: Springer Nature Switzerland, 2022.

Zou, Andy, et al. "Universal and transferable adversarial attacks on aligned language models." arXiv preprint arXiv:2307.15043 (2023).

Compromising LLMs using Indirect Prompt Injection https://github.com/greshake/llm-security Retrieved: 08.09.2023

Introducing Google’s Secure AI Framework https://blog.google/technology/safety-security/introducing-googles-secure-ai-framework/ Retrieved: 08.09.2023

Secure AI Framework Approach. A quick guide to implementing the Secure AI Framework (SAIF) https://services.google.com/fh/files/blogs/google_secure_ai_framework_approach.pdf Retrieved: 11.09.2023

Securing AI Pipeline https://www.mandiant.com/resources/blog/securing-ai-pipeline Retrieved: 11.09.2023

Atlas MITRE https://atlas.mitre.org/ Retrieved: 11.09.2023

ATLAS mitigations https://atlas.mitre.org/mitigations/ Retrieved: 11.09.2023

Introduction to red teaming large language models (LLMs) https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/red-teaming Retrieved: 11.09.2023

OpenAI’s red team: the experts hired to ‘break’ ChatGPT https://www.ft.com/content/0876687a-f8b7-4b39-b513-5fee942831e8 Retrieved: 11.09.2023

Microsoft AI Red Team building future of safer AI https://www.microsoft.com/en-us/security/blog/2023/08/07/microsoft-ai-red-team-building-future-of-safer-ai/ Retrieved: 13.09.2023

Bug-bar https://learn.microsoft.com/ru-ru/security/engineering/bug-bar-aiml Retrieved: 13.09.2023

AI-Security-Risk-Assessment https://github.com/Azure/AI-Security-Risk-Assessment/blob/main/AI_Risk_Assessment_v4.1.4.pdf Retrieved: 13.09.2023

AI threat modeling https://learn.microsoft.com/ru-ru/security/engineering/threat-modeling-aiml Retrieved: 13.09.2023

Responsible https://www.microsoft.com/en-us/ai/responsible-ai AI Retrieved: 13.09.2023

Failure modes taxonomy https://learn.microsoft.com/ru-ru/security/engineering/failure-modes-in-machine-learning Retrieved: 13.09.2023

NVIDIA AI Red Team: An Introduction https://developer.nvidia.com/blog/nvidia-ai-red-team-an-introduction/ Retrieved: 13.09.2023

Microsoft Counterfit https://github.com/Azure/counterfit/ Retrieved: 13.09.2023

GARD project https://www.gardproject.org/ Retrieved: 13.09.2023

Master program Cybersecurity https://cyber.cs.msu.ru/ Retrieved: 13.09.2023


Refbacks

  • There are currently no refbacks.


Abava  Кибербезопасность MoNeTec 2024

ISSN: 2307-8162