Applying Deep Learning to Identify and Classify DGA Domains

E. Diuldin, K.S. Zaytsev

Abstract


This work studies methods for detecting malicious domains produced by domain generation algorithms (DGAs). To solve this problem, a deep learning architecture is proposed that is built on the TensorFlow framework, with layers implemented using the Keras library. The architecture was trained, tested, and validated on domains generated by 25 known DGAs and on legitimate domains taken from the Alexa Top list. Using this data, the proposed neural network architecture was compared with known machine learning architectures for DGA domain classification. Classification quality was compared using the F-measure with the parameter β, which sets the weight of precision in the metric, equal to 0.4; this made it possible to select the model with the highest prediction precision. The results confirmed the effectiveness of the proposed solution. The outcome of the work is an effective machine learning architecture for classifying malicious DGA domains.
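The effect of choosing β = 0.4 can be illustrated with a short sketch. It assumes the standard F-beta definition, F_β = (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall; the abstract does not give the formula, so this definition is an assumption. The function names `precision_recall` and `f_beta` and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for binary labels (1 = DGA domain, 0 = legitimate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 0.4) -> float:
    """Standard F-beta score; beta < 1 gives precision more weight than recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denom


if __name__ == "__main__":
    # Two hypothetical models with mirrored precision and recall.
    precise_model = f_beta(precision=0.9, recall=0.6)   # about 0.842
    sensitive_model = f_beta(precision=0.6, recall=0.9)  # about 0.629
    print(f"high-precision model: {precise_model:.3f}")
    print(f"high-recall model:    {sensitive_model:.3f}")
```

With β = 0.4 the high-precision model scores noticeably higher than the high-recall one, even though both would receive the same F1 score. This is consistent with the stated goal of selecting the model that raises the fewest false alarms on legitimate domains.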








ISSN: 2307-8162