Comparative Study of Word Associations in Social Networks Corpora by means of Distributional Semantics Models for Russian

A. A. Antipenko, O. A. Mitrofanova

Abstract


The paper discusses the results of an experiment on the automatic extraction of associative relations from corpora of Russian texts from the Facebook and Pikabu social networks by means of distributional semantic models. The choice of social network texts as linguistic data is motivated by the specific nature of polylogic internet discourse, which combines traits of written and colloquial speech. We put forward the hypothesis that the associative test technique can be reproduced in experiments with distributional semantic models. We extracted associations for lexemes expressing key concepts of the Russian-specific world view, using the Word2Vec neural network architectures (CBOW and Skip-gram). We carried out a linguistic analysis of the output and compared it with the associations described in the Russian Associative Dictionary, the Russian regional association database (Siberia and the Far East), and the Russian Distributional Thesaurus. We also developed and implemented a procedure for the quantitative evaluation of data extracted from different sources. We found evidence of the specialization of lexicographic resources and distributional semantic models with regard to paradigmatic and syntagmatic relations. The experimental results allow us to draw conclusions about the dynamics of the Russian-specific language consciousness of contemporary social network users and to reveal tendencies in the development of their world view.

References


Ufimtseva N.V. Jazykovoje soznanije: dinamika i variativnost. M.: Institut jazykoznanija RAN, 2011.

Koltsov S.N., Koltsova O.Ju., Mitrofanova O.A., Shimorina A.S. Interpretatsija semanticheskih svyazej v tekstah russkojazychnogo segmenta Zhivogo Zhurnala na osnove tematicheskoj modeli LDA // Tehnologii informatsionnogo obschestva v nauke, obrazovanii i kulture: sbornik nauchnyh statej. Materialy XVII Vserossijskoj objedinennoj konferencii «Internet i sovremennoje obschestvo» IMS–2014, Sankt-Peterburg, 19–20 nojabrya 2014 g. SPb, 2014. S. 135–142.

Slovar assotsiativnyh norm russkogo jazyka / A.A. Leontjev [i dr.]. M.: Izdatelstvo Moskovskogo universiteta, 1977.

Russkij assotsiativnyj slovar: v 4 t. / Ju.N. Karaulov [i dr.]. M., 1994–1996. T. 1.

Chastotnyj slovar russkogo jazyka / pod red. L.N. Zasorinoj. M.: Russkij jazyk, 1977.

Russkij semanticheskij slovar: Opyt avtomaticheskogo postrojenija tezaurusa: ot ponyatija k slovu / Ju.N. Karaulov [i dr.]. M.: Nauka, 1983.

Panchenko A. et al. Human and Machine Judgements for Russian Semantic Relatedness // D. Ignatov et al. (eds.) Analysis of Images, Social Networks and Texts: AIST–2016. Communications in Computer and Information Science. Vol. 661. Springer, Cham, 2017.

Baroni M., Lenci A. Distributional Memory: A General Framework for Corpus-Based Semantics // Computational Linguistics. Vol. 36(4). 2010. P. 673–721.

Rohde D., Gonnerman L., Plaut D. An Improved Model of Semantic Similarity Based on Lexical Co-occurrence // Communications of the ACM. № 8. 2006. P. 627–633.

Jurafsky D., Martin J.H. Speech and Language Processing (Third Edition Draft). 2017. URL: https://web.stanford.edu/~jurafsky/slp3/

Sahlgren M. The Distributional Hypothesis. From Context to Meaning // Distributional Models of the Lexicon in Linguistics and Cognitive Science (Special Issue of the Italian Journal of Linguistics). Rivista di Linguistica. 2008. Vol. 20(1). P. 33–53.

Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space // Proceedings of Workshop at ICLR. 2013.

Kutuzov A., Kuzmenko E. WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models // D. Ignatov et al. (eds.) Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science. Vol. 661. Springer, Cham. 2017.

Panicheva P., Erofeeva A., Ledovaya Ja. Semantic Feature Aggregation for Gender Identification in Russian Facebook // Artificial Intelligence and Natural Language 6th Conference, AINL 2017, St. Petersburg, Russia, September 20–23, 2017, Revised Selected Papers. Communications in Computer and Information Science. Vol. 789. Springer, 2017. P. 3–15.

Panicheva P., Protopopova E., Bukia G., Mitrofanova O. Evaluating Distributional Semantic Models with Russian Noun-Adjective Compositions // Communications in Computer and Information Science (CCIS). Vol. 661. Analysis of Images, Social Networks and Texts: 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7–9, 2016, Revised Selected Papers. Springer, Cham, 2017. P. 236–247.

Mitrofanova O.A. Verojatnostnoje modelirovanije tematiki russkojazychnyh korpusov tekstov s ispolzovanijem kompjuternogo instrumenta GenSim // Trudy mezhdunarodnoj konferencii «Korpusnaja lingvistika–2015». SPb.: Izdatelstvo Sankt-Peterburgskogo universiteta, 2015. S. 332–343.

Ufimtseva N.V. Obraz mira russkih: sistemnost i soderzhanije // Jazyk i kultura. M., 2009. S. 98–111.

Lyashevskaja O.N., Sharov S.A. Chastotnyj slovar sovremennogo russkogo jazyka (na materialah Natsionalnogo korpusa russkogo jazyka). M.: Azbukovnik, 2009.


ISSN: 2307-8162