Methods for Determining Confidential Information in Unstructured Data

Georgy Garbuzov, Sergey Dvoryankin

Abstract


This article examines methods for identifying (classifying, recognizing) confidential information in unstructured data, that is, in files transmitted over communication channels or stored on file resources. The focus on unstructured data has two reasons. First, it is the data of greatest interest to an attacker, because it holds a commercial enterprise's confidential information (trade secrets, know-how). Second, it is difficult to analyze with the signature algorithms and regular-expression rules used in modern tools for protecting against information leaks.

Full Text:

PDF (Russian)





ISSN: 2307-8162