Results of automatic mining of individual fields of the register of personal data operators

P. Yu. Pushkin, A.M. Rusakov


The work presents the results of mining the records contained in two fields of the register of personal data operators, "list of actions with personal data" and "period or condition of termination of personal data processing", and an assessment of their compliance with the requirements of the legislation on personal data. Higher educational institutions were chosen as the operator community under study, which makes it possible to take into account the shared features of their personal data processing when forming expert assessments and performing data mining. For the study, a text corpus was assembled that can also be used to evaluate data mining methods on information processing and protection topics. The following libraries were used for text mining: Scikit-learn, Gensim, PyMystem3, FuzzyWuzzy. Search queries took into account synonym dictionaries and the fuzzy position of words. To find stable keyword combinations, the TF-IDF weighting function was calculated. Word lemmatization methods were compared for the purposes of the study. The results confirm the validity of expert assessments of how the fields of the register are filled in: the largest cluster identified by the mining analysis corresponds to the expert template. The results of automatic mining still require verification by an expert in personal data processing and protection. Data mining methods can significantly increase the efficiency of experts working with the large volumes of information contained in the register of personal data operators. The work is aimed at forming separate sections of recommendations for developing a sectoral (higher education and science) code of conduct in the field of protecting the rights of personal data subjects, in order to increase the level of security of such information.
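As an illustration of the TF-IDF weighting mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the sample entries are invented, the texts are assumed to be already lemmatized, and the names `ENTRIES`, `bigrams`, `tf_idf` and `WEIGHTS` are hypothetical. It uses the textbook unsmoothed formula, tf × ln(N / df), over two-word combinations; Scikit-learn's `TfidfVectorizer` applies a smoothed variant by default, so its numbers would differ.

```python
import math
from collections import Counter

# Hypothetical, already-lemmatized field entries (not taken from the register).
ENTRIES = [
    "collection recording storage destruction of personal data",
    "collection recording storage transfer of personal data",
    "liquidation of the organization",
]


def bigrams(text):
    """Return adjacent word pairs of a whitespace-tokenized text."""
    words = text.split()
    return [" ".join(pair) for pair in zip(words, words[1:])]


def tf_idf(docs):
    """Return one {bigram: weight} dict per document.

    tf  = occurrences of the bigram in the document / bigrams in the document
    idf = ln(N / df), df being the number of documents containing the bigram
    """
    tokenized = [bigrams(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each bigram once per document.
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return weights


WEIGHTS = tf_idf(ENTRIES)

# Print the bigrams of the first entry, highest weight first.
for term, weight in sorted(WEIGHTS[0].items(), key=lambda item: -item[1]):
    print(f"{weight:.4f}  {term}")
```

Under this formula a combination that occurs in every entry receives zero weight (ln 1 = 0), while combinations specific to a few entries are weighted higher, so the ranking separates template wording from entry-specific wording.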






ISSN: 2307-8162