System for collecting and analyzing information from various sources in Big Data conditions

D. V. Smirnov, A. A. Grusho, M. I. Zabezhailo, E. E. Timonina

Abstract


This paper investigates the problem of constructing an architecture and methods for finding signs of insider activity in near-real-time conditions. The problem is solved under the following conditions. The source of "raw" data is Big Data, from which data relevant to signs of insider activity is selected in accordance with the current list of threats. The search covers a large number of users. Under these conditions, an algorithm is built that overcomes the complexity barrier in finding relevant data. An important complication of the task is the "openness" of the data: the data is constantly updated, and the signs of hostile insider activity also change, so the search conditions themselves can change dynamically. The resulting architecture has two levels. The first level contains data collected from various "raw" databases and relevant to the current list of threats. The second level makes the data organized at the first level as accessible as possible for analysis with the participation of experts (operational staff). A scientific justification is given for the correctness and efficiency of the mathematical models and Big Data mining algorithms used in implementing this software system. The resulting solutions have proved workable in an industrial implementation.
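The two-level layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Record`, `FirstLevel` and `second_level`, the keyword-based matching, and the in-memory storage are all assumptions made purely for illustration. The first level keeps only records that match the current list of threat signs and re-selects them when that list changes; the second level groups the selected records by user for expert review.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One raw event; the fields are hypothetical."""
    user: str
    text: str


@dataclass
class FirstLevel:
    """First level: raw records filtered by the current threat list."""
    threat_signs: set
    raw: list = field(default_factory=list)
    relevant: list = field(default_factory=list)

    def _matched(self, record):
        # Assumption: a "sign" is a single lowercase keyword in the text.
        words = set(record.text.lower().split())
        return sorted(self.threat_signs & words)

    def ingest(self, record):
        """Accept a new raw record ("openness": data keeps arriving)."""
        self.raw.append(record)
        signs = self._matched(record)
        if signs:
            self.relevant.append((record, signs))

    def update_threats(self, signs):
        """Replace the threat list and re-select relevant records.

        A full rescan is a simplification; a real system would avoid it.
        """
        self.threat_signs = {s.lower() for s in signs}
        self.relevant = []
        for record in self.raw:
            matched = self._matched(record)
            if matched:
                self.relevant.append((record, matched))


def second_level(first):
    """Second level: group relevant records by user for analysts."""
    by_user = defaultdict(list)
    for record, signs in first.relevant:
        by_user[record.user].append((record.text, signs))
    return dict(by_user)


first = FirstLevel(threat_signs={"export"})
first.ingest(Record("alice", "bulk export of client table"))
first.ingest(Record("bob", "read public wiki"))
first.ingest(Record("bob", "disabled audit logging"))
before = second_level(first)

# The list of threat signs changes; the selection follows it.
first.update_threats({"export", "audit"})
after = second_level(first)
```

In this sketch `before` contains only one user, while `after` also includes the second user once "audit" is added to the threat list, which mirrors the dynamically changing search conditions the abstract mentions.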







ISSN: 2307-8162