Creating a corpus of Russian texts with markup of syntactic group systems

Dmitriy V. Demidov, Igor V. Evchenko

Abstract


The paper analyzes ways of representing the syntactic structure of a sentence. It highlights the shortcomings of existing formalisms and the main problems that arise when constructing a sentence's syntactic structure. It then describes a technology for building a syntactically annotated corpus of Russian texts in which sentence structure is represented by A. V. Gladky's labeled systems of syntactic groups (SSGs). SynTagRus serves as the source material for the corpus. The paper describes a method for representing SSGs in the CoNLL-U format and gives examples of rules for identifying syntactic groups from dependency trees. A software tool for transforming dependency trees into SSGs and a tool for visualizing SSGs are presented. The results make it possible to create a syntactic parser that builds SSGs using machine learning methods, without ruling out the traditional approach.
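As a rough illustration of what a rule for identifying syntactic groups from a dependency tree might look like, the sketch below reads one sentence in the standard 10-column CoNLL-U layout and groups each noun with its direct adjectival modifiers. The rule itself, the sample sentence, and the function names `parse_conllu` and `noun_groups` are assumptions made for this example; they are not the rules or the SSG encoding used by the authors.

```python
# Hypothetical illustration only: the grouping rule below is an assumption,
# not one of the conversion rules described in the paper.

# One sentence in CoNLL-U: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = """\
# text = Маленький мальчик читает интересную книгу
1\tМаленький\tмаленький\tADJ\t_\t_\t2\tamod\t_\t_
2\tмальчик\tмальчик\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tчитает\tчитать\tVERB\t_\t_\t0\troot\t_\t_
4\tинтересную\tинтересный\tADJ\t_\t_\t5\tamod\t_\t_
5\tкнигу\tкнига\tNOUN\t_\t_\t3\tobj\t_\t_
"""


def parse_conllu(text):
    """Parse a single CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "upos": cols[3],
            "head": int(cols[6]) if cols[6].isdigit() else None,
            "deprel": cols[7],
        })
    return tokens


def noun_groups(tokens):
    """Toy rule: a NOUN together with its direct 'amod' dependents forms one group."""
    groups = []
    for tok in tokens:
        if tok["upos"] != "NOUN":
            continue
        modifiers = [t["id"] for t in tokens
                     if t["head"] == tok["id"] and t["deprel"] == "amod"]
        if modifiers:
            groups.append(sorted(modifiers + [tok["id"]]))
    return groups


tokens = parse_conllu(SAMPLE)
groups = noun_groups(tokens)
forms = {t["id"]: t["form"] for t in tokens}
for group in groups:
    print(group, " ".join(forms[i] for i in group))
```

A full converter would need many such rules, a way to nest groups inside one another, and a convention for storing group labels in the CoNLL-U file; none of those details are specified here.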


Full Text:

PDF (Russian)

References


Gladky A.V. Syntactic structures of natural language. 3rd ed., stereotyped. M.: LENAND. 2018. 152 p.

Droganova K., Zeman D. Conversion of SynTagRus (the Russian dependency treebank) to Universal Dependencies // Technical report. Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, 2016.

Inshakova E., Iomdin L., Mityushin L., Sizov V., Frolova T., Tsinman L. SynTagRus today // Proceedings of the V.V. Vinogradov Russian Language Institute. 2019.

Vlasova N.A., Trofimov I.V., Serdyuk Yu.P., Suleymanova E.A., Vozdvizhensky I.N. PaRuS - syntactically annotated corpus of the Russian language // Software systems: theory and applications. 2019. No. 4 (43).

Kibrik, Andrej A., Dobrov, Grigory B., & Korotaev, Nikolay A. Modeling natural communication and a multichannel resource: The deceleration effect // V. Solovyev, N. Loukachevitch, O. Lyashevskaya (Eds.) Proceedings of the Linguistic Forum 2020: Language and Artificial Intelligence. Moscow, Russia, November 12-14, 2020. 2021.

Demidov D.V. Representation of syntactic structures with coordinating constructions // Artificial intelligence and decision making. No. 2. 2022. pp. 36-50.

Korotaev N.A. Syntactic groups of A.V. Gladky: analysis of constructions with coordination // Bulletin of the Russian State University for the Humanities. Series: Literary Studies. Linguistics. Culturology. 2013. No. 8 (109).

Marcus M.P., Marcinkiewicz M.A., Santorini B. Building a large annotated corpus of English: the Penn Treebank // Computational Linguistics. 1993. pp. 313–330.



ISSN: 2307-8162