Comparison of neural network architectures for recognizing Russian speech with a foreign accent

A.A. Volkova; E.V. Druzhinskaya

Abstract

Automatic speech recognition systems are advancing rapidly thanks to the widespread adoption of deep learning methods. However, most algorithms lose accuracy on Russian speech produced by native speakers of other languages. A foreign accent alters sound durations, articulatory transitions, and prosodic characteristics, which makes it harder to extract acoustic features correctly and then transcribe them. The literature describes many approaches to speech processing, including deep, convolutional, and recurrent models, as well as hybrid architectures that use attention mechanisms. Each responds differently to pronunciation variability and accent strength. With this in mind, this work studies how various neural network architectures behave when recognizing Russian speech with a foreign accent and analyzes their results on a corpus of the authors' own recordings. The evaluation uses the WER (word error rate), CER (character error rate), and Accuracy metrics, which makes it possible to identify the models that are most robust to accent-induced distortions and can work with limited, heterogeneous data.
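
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein (edit) distance between the reference and the recognized text, divided by the reference length, over words for WER and over characters for CER. The function names, the whitespace tokenization, and the choice to count spaces as characters are assumptions made for illustration; the abstract does not describe the authors' exact computation or text normalization, and Accuracy is omitted because its definition is not given here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (hypothetical helper)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Assumption: spaces count as characters; some toolkits strip them.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    reference = "мама мыла раму"
    hypothesis = "мама мыла рану"  # one substituted character in one word
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

A single wrong character makes the whole word count as an error for WER but only one character for CER, which is why the two metrics can rank models differently on accented speech.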

ISSN: 2307-8162

International Journal of Open Information Technologies