Semantic Classification of Russian Prepositional Phrases with Transformer Embeddings

A. V. Belyi, D. V. Boitsova, E. A. Botvineva, V. V. Vybornaya, A. M. Goncharova, O. A. Mitrofanova, A. A. Rodina

Abstract


The article describes the frequency characteristics of the relationship between prepositions and their meanings in a database of Russian prepositions, and considers the task of building an effective semantic classifier of prepositional phrases that is trained and tested on this dataset. The database was created within the framework of the project ‘Quantitative Grammar of Russian Prepositional Constructions’, developed at the Department of Mathematical Linguistics of Saint Petersburg State University. The study also draws on a corpus of 200 syntactically ambiguous sentences described in D.A. Chernova’s doctoral research “The Process of Processing Syntactically Ambiguous Sentences: A Psycholinguistic Study”. The present work proposes a novel tree-based classifier architecture consisting of a main multiclass classifier and a supportive binary classifier. This architecture significantly improves performance compared to previous work, both overall and on classes that were previously highly confused with one another. Experiments were conducted with different types of classifiers and with various Russian-language embedding models used to encode the dataset. The best solution achieves an F1-score of 0.76 using SVM classifiers and the DeepPavlov/rubert-base-cased model.
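The abstract does not say how the supportive binary classifier interacts with the main multiclass one, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the binary classifier is trained on a single pair of frequently confused classes and re-decides every item that the main classifier assigns to either class of that pair. To stay self-contained, a nearest-centroid classifier stands in for the SVMs, toy 2-D vectors stand in for DeepPavlov/rubert-base-cased embeddings, and the class labels and the names `CentroidClassifier` and `TwoStageClassifier` are hypothetical.

```python
import math
from collections import defaultdict


class CentroidClassifier:
    """Stand-in for an SVM: predicts the label of the nearest class centroid."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for vec, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
        self.centroids = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids, key=lambda lab: math.dist(vec, self.centroids[lab]))
            for vec in X
        ]


class TwoStageClassifier:
    """Main multiclass model plus a binary model for one confused class pair.

    Assumption: the binary model overrides the main model only when the
    main model predicts one of the two classes in `confused_pair`.
    """

    def __init__(self, main, support, confused_pair):
        self.main = main
        self.support = support
        self.confused_pair = set(confused_pair)

    def fit(self, X, y):
        self.main.fit(X, y)
        # The supportive model sees only examples of the two confused classes.
        pair_rows = [(v, lab) for v, lab in zip(X, y) if lab in self.confused_pair]
        self.support.fit([v for v, _ in pair_rows], [lab for _, lab in pair_rows])
        return self

    def predict(self, X):
        result = []
        for vec, label in zip(X, self.main.predict(X)):
            if label in self.confused_pair:
                label = self.support.predict([vec])[0]
            result.append(label)
        return result


# Toy vectors standing in for phrase embeddings; labels are hypothetical.
X_train = [[0, 0], [0, 1], [5, 5], [5, 6], [9, 0], [9, 1]]
y_train = ["temporal", "temporal", "locative", "locative", "directive", "directive"]

model = TwoStageClassifier(
    main=CentroidClassifier(),
    support=CentroidClassifier(),
    confused_pair=("locative", "directive"),
)
model.fit(X_train, y_train)
predictions = model.predict([[0.2, 0.5], [5.1, 5.4], [8.8, 0.3]])
print(predictions)
```

Because both stages only need `fit` and `predict`, the stand-in could be swapped for any classifier exposing the same two methods, such as an SVM trained on real sentence embeddings.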

Full Text:

PDF (Russian)

References


I.V. Azarova, V.P. Zakharov & A.D. Moskvina, “Semantic Structure of Russian Prepositional Constructions,” Computernaja lingvistika i vychislitelnye ontologii – Internet i sovremennoe obshhestvo: trudy XXI Mezhdunarodnoj obedinennoj konferencii (Sankt-Peterburg, 30 maja – 2 iunja 2018 g., vypusk 2). SPb. № 2. P. 9–16. 2018.

G.A. Zolotova, “Syntactic dictionary: Repertory of elementary units of Russian Syntax,” Moscow: Nauka, 1988. (In Russian)

Quantitative grammar of Russian prepositional constructions / V.P. Zakharov et al. URL: https://vintagentleman.github.io/qt_prep_gram/, (last accessed 24.11.2024).

Quantitative Ontology and Russian Preposition Database / V.P. Zakharov et al., Russian Foundation for Basic Research Journal. Humanities and social sciences. № 109. P. 17–26. 2022.

O.A. Mitrofanova & A.D. Moskvina, “On the Role of Prepositional Statistics for Genre Identification of Russian texts,” International Journal of Open Information Technologies. Vol. 8. № 11. P. 91–96. 2020.

D.V. Sichinava, “Ob odnom lingvisticheskom parametre tipologii tekstov: koe`fficient «pod/nad»,” Nauchno-texnicheskaya informaciya. Serija 2. № 10. P. 27–35. 2003.

D.A. Chernova, “Process obrabotki sintaksicheski neodnoznachnyh predlozhenij: psicholingvisticheskoe issledovanie,” Avtoref. na soick. uchenoj step. kand. filolog. nauk: 10.02.19 – teorija jazyka. SPb., 2016.

ai-forever/sbert_large_mt_nlu_ru | HuggingFace, URL: https://huggingface.co/ai-forever/sbert_large_mt_nlu_ru, (last accessed 24.11.2024).

I. Azarova, M. Khokhlova, V. Zakharov & V. Petkevič, “Ontological Description of Russian Prepositions,” Proceedings of the III International Conference on Language Engineering and Applied Linguistics, Saint Petersburg, Russia, November 27, 2019. P. 245–257. 2019.

P. Bojanowski, E. Grave, A. Joulin & T. Mikolov, “Enriching Word Vectors with Subword Information,” Transactions of the Association for Computational Linguistics. Vol. 5. P. 135–146. 2016.

F. Feng, Y. Yang, D.M. Cer, N. Arivazhagan & W. Wang, “Language-agnostic BERT Sentence Embedding,” Annual Meeting of the Association for Computational Linguistics. 2020.

H. Gong, J. Mu, S. Bhat & P. Viswanath, “Preposition sense disambiguation and representation,” Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP, Brussels, Belgium. P. 1510–1521. 2018.

V. Gudkov, A. Golovina, O. Mitrofanova & V. Zakharov, “Russian Prepositional Phrase Semantic Labelling with Word Embedding-based Classifier,” A. Ronzhin, T. Noskova, A. Karpov (eds.) R. Piotrowski's Readings in Language Engineering and Applied Linguistics. PRLEAL-2019. CEUR Workshop Proceedings. Vol. 2552. P. 272–284. 2019.

M. V. Khokhlova & V. I. Rubiner, “On quantitative analysis of Russian prepositional constructions based on legislative texts,” Proceedings of the International Conference Corpus Linguistics-2019. Saint Petersburg, Russia. P. 149–154. 2019.

Y. Kuratov & M. Arkhipov, “Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language,” ArXiv preprint. URL: https://arxiv.org/abs/1905.07213. (last accessed 24.11.2024). 2019.

LaBSE-en-ru | HuggingFace, URL: https://huggingface.co/cointegrated/LaBSE-en-ru, (last accessed 24.11.2024).

K.C. Litkowski & O. Hargraves, “SemEval-2007 Task 06: Word-Sense Disambiguation of Prepositions,” Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), Prague, Czech Republic. P. 24–29. 2007.

V. Mikhailov, T. Shamardina, M. Ryabinin, A. Pestova, I. Smurov & E. Artemova, “RuCoLA: Russian Corpus of Linguistic Acceptability,” Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Abu Dhabi, United Arab Emirates. P. 5207–5227. 2023.

T. Mikolov, K. Chen, G.S. Corrado & J. Dean, “Efficient Estimation of Word Representations in Vector Space,” International Conference on Learning Representations. 2013.

S. Pawar, S. Thombre, A. Mittal, G. Ponkiya & P. Bhattacharyya, “Tapping BERT for Preposition Sense Disambiguation,” ArXiv preprint. URL: https://arxiv.org/pdf/2111.13972 (last accessed 24.11.2024). 2021.

SentenceTransformers, URL: https://www.sbert.net/docs/training/overview.html, (last accessed 24.11.2024).

M. Straka & J. Straková, “Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe,” Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Vancouver, Canada. P. 88–99. 2017.

V. Zakharov & I. Azarova, “Grammatical Parallelism of Russian Prepositional Localization and Temporal Constructions,” Text, Speech, and Dialogue: 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8–11, 2020, Proceedings. P. 122–134. Springer-Verlag, 2020.

V. Zakharov, A. Golovina, E. Alexeeva & V. Gudkov, “Russian Secondary Prepositions: Methodology of Analysis,” CEUR Workshop Proceedings. Vol. 2780. P. 187–201. 2020.

D. Zmitrovich, A. Abramov, A. Kalmykov, M. Tikhonova, E. Taktasheva, D. Astafurov, M. Baushenko, A. Snegirev, T. Shavrina, S. Markov, V. Mikhailov & A. Fenogenova, “A Family of Pretrained Transformer Language Models for Russian,” ArXiv preprint, URL: https://arxiv.org/html/2309.10931v4. (last accessed 24.11.2024). 2024.





ISSN: 2307-8162