Using topic modeling for communities clusterization in the VKontakte social network

Sergey Gorshkov, Eugene Ilyushin, Anastasia Chernysheva, Viacheslav Goiko, Dmitry Namiot

Abstract


Topic modeling is one of the most widely used methods in text analysis. It can be used both to extract the topics present in a corpus and to estimate the distribution of topics in each document. In this article, we present a method for clustering communities in VKontakte, the most popular Russian social network, using topic modeling. The sample consists of about 7,000 communities to which several students of Tomsk State University are subscribed. The article describes how the text corpus was formed and how it was modeled with two popular classical methods, LDA and ARTM. It gives a detailed description of these models, the quality assessment criteria, and the main practical techniques the authors used in training the models. The aggregated results of clustering communities by topic are also presented, along with a method for expert evaluation of community topics based on visualization of the words that make up the lexical core of each topic.
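
As an illustration of how a topic model can yield clusters, the sketch below groups communities by their dominant topic. It is a minimal sketch under stated assumptions, not the authors' procedure: it assumes a per-community topic distribution has already been produced by some topic model, and the community names, the weights, and the function name `cluster_by_dominant_topic` are all hypothetical.

```python
from collections import defaultdict


def cluster_by_dominant_topic(doc_topic):
    """Group community ids by the index of their highest-weight topic.

    doc_topic maps a community id to a list of topic weights
    (one row of a document-topic matrix).
    """
    clusters = defaultdict(list)
    for community_id, weights in doc_topic.items():
        # Index of the largest weight; ties resolve to the lowest index.
        dominant = max(range(len(weights)), key=weights.__getitem__)
        clusters[dominant].append(community_id)
    return dict(clusters)


# Hypothetical topic distributions for four communities over three topics.
theta = {
    "club_music": [0.80, 0.15, 0.05],
    "club_sport": [0.10, 0.70, 0.20],
    "club_bands": [0.60, 0.30, 0.10],
    "club_news": [0.20, 0.25, 0.55],
}

clusters = cluster_by_dominant_topic(theta)
for topic, members in sorted(clusters.items()):
    print(topic, members)
```

Assigning each community to a single dominant topic is the simplest possible choice; a soft assignment that keeps the full distribution would preserve communities that mix several topics.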





ISSN: 2307-8162