Data Mining in the Text Corpus on Corpus and Computational Linguistics
Full Text: PDF (Russian)
References
O. A. Mitrofanova, and V. P. Zakharov, “Automatic Analysis of Terminology in the Russian Text Corpus on Corpus Linguistics,” in Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue 2009" (Bekasovo, May 27–31, 2009), issue 8(15), Moscow, RSUH, pp. 321–328, 2009, URL: https://www.dialog-21.ru/digests/dialog2009/materials/pdf/49.pdf (accessed date: 25.11.2024).
N. V. Vinogradova, and O. A. Mitrofanova, “Formal Ontology as a Tool for Systematizing Data in the Russian Text Corpus on Corpus Linguistics,” in Proceedings of the International Conference "Corpus Linguistics - 2008", St. Petersburg, 2008, URL: https://project.phil.spbu.ru/corpora2011/Works2008/MitrofanovaVinogradova_113_121.pdf (accessed date: 25.11.2024).
N. V. Vinogradova, O. A. Mitrofanova, and P. V. Panicheva, “Automatic Classification of Terms in the Russian Text Corpus on Corpus Linguistics,” in Proceedings of the Ninth All-Russian Scientific Conference "Electronic Libraries: Advanced Methods and Technologies, Electronic Collections" (RCDL-2007), Pereslavl-Zalessky, 2007, URL: http://rcdl.ru/doc/2007/paper_31_v1.pdf (accessed date: 25.11.2024).
V. P. Zakharov, and S. Yu. Bogdanova, “Corpus Linguistics,” St. Petersburg, 2020.
E. V. Tikhonova, and M. A. Kosycheva, “Effective Keyword(s): Formulation Strategies,” Health, Food & Biotechnology, issue 3(4), pp. 7–15, 2022, URL: https://elibrary.ru/item.asp?id=49446588 (accessed date: 25.11.2024).
O. Kamshilova, L. Beliaeva, and L. Geikhman, “Author’s Choice for Keyword List: Research Aspect,” in R. Piotrowski's Readings in Language Engineering and Applied Linguistics, Proceedings of the III International Conference on Language Engineering and Applied Linguistics (PRLEAL–2019), CEUR Workshop Proceedings, Saint Petersburg, Russia, November 27, 2019, pp. 47–59, 2020, URL: https://elibrary.ru/item.asp?id=42584043 (accessed date: 25.11.2024).
O. A. Mitrofanova, and D. A. Gavrilik, “Experiments on Automatic Extraction of Key Expressions in Stylistically Diverse Corpora of Russian Texts,” Terra Linguistica, issue 13(4), pp. 22–40, 2022, URL: https://elib.spbstu.ru/dl/2/j23-158.pdf/en/info (accessed date: 25.11.2024).
D. D. Guseva, and O. A. Mitrofanova, “Key Expressions in Russian Popular Science Texts: Comparison of Oral and Written Speech Perception with the Results of Automatic Analysis,” Terra Linguistica, issue 15(1), pp. 20–35, 2024.
A. Moskvina, E. Sokolova, and O. Mitrofanova, “KeyPhrase Extraction from the Russian Corpus on Linguistics by Means of KEA and RAKE Algorithm,” in Data Analytics and Management in Data Intensive Domains: XX International Conference DAMDID/RCDL’2018, October 9–12, 2018, Moscow, Russia, Conference Proceedings, ed. by L. Kalinichenko, Y. Manolopoulos, S. Stupnikov, N. Skvortsov, and V. Sukhomlin, FRC CSC RAS, pp. 369–372, 2018, URL: https://elibrary.ru/item.asp?id=41112843 (accessed date: 25.11.2024).
D. A. Morozov, et al., “Generation of Keywords for Abstracts of Russian Scientific Articles,” Morozov D.A., Glazkova A.V., Tyutulnikov M.A., Iomdin B.L., Bulletin of NSU. Series: Linguistics and Intercultural Communication, no. 1, 2023.
A. Aries, D. Zegour, and H. Walid, “Automatic Text Summarization: What has been done and what has to be done,” arXiv:1904.00688, pp. 1–34, 2019, URL: https://arxiv.org/abs/1904.00688 (accessed date: 25.11.2024).
A. Nenkova, and K. McKeown, “Automatic Summarization,” Foundations and Trends in Information Retrieval, vol. 5(2-3), pp. 103–233, 2011, URL: https://core.ac.uk/download/pdf/76383212.pdf (accessed date: 25.11.2024).
M. Allahyari, et al., “Text Summarization Techniques: a Brief Survey,” Allahyari M., Pouriyeh S., Assefi M., Safaei S., Trippe E.D., Gutierrez J.B., and Kochut K., arXiv preprint, 2017, URL: https://arxiv.org/abs/1707.02268 (accessed date: 25.11.2024).
M. Athugodage, O. Mitrofanova, and V. Gudkov, “Transfer Learning for Russian Legal Text Simplification,” in Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024, pp. 59–69, 2024, URL: https://aclanthology.org/2024.readi-1.6/ (accessed date: 25.11.2024).
V. Gudkov, O. Mitrofanova, and E. Filippskikh, “Automatically Ranked Russian Paraphrase Corpus for Text Generation,” in Proceedings of the Fourth Workshop on Neural Generation and Translation, Association for Computational Linguistics, pp. 54–59, 2020, URL: https://aclanthology.org/2020.ngt-1.6/ (accessed date: 25.11.2024).
J. Pilault, et al., “On Extractive and Abstractive Neural Document Summarization with Transformer Language Models,” Pilault J., Li R., Subramanian S., and Pal C., in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, pp. 9308–9319, 2020, URL: https://aclanthology.org/2020.emnlp-main.748/ (accessed date: 25.11.2024).
Automatic Text Summarizer, URL: https://pypi.org/project/sumy/ (accessed date: 25.11.2024).
RuT5SumGazeta, URL: https://huggingface.co/IlyaGusev/rut5_base_sum_gazeta (accessed date: 25.11.2024).
M. M. Tikhomirov, N. V. Loukachevitch, and B. V. Dobrov, “Recognizing Named Entities in Specific Domain,” Lobachevskii Journal of Mathematics, vol. 41(8), pp. 1591–1602, 2020, doi: 10.1134/S199508022008020X.
D. M. Kostyuk, and N. K. Shirokov, “Methods for Identifying Named Entities in the Tasks of Processing the Flow of Scientific News,” in Management of University Libraries, Minsk, pp. 50–54, 2021, URL: https://elibrary.ru/item.asp?id=49171334 (accessed date: 25.11.2024).
A. A. Navrotsky, and E. V. Krivaltsevich, “Comparative Analysis of Systems for Extracting Named Entities from Unstructured Journalistic Texts,” in BIG DATA and Advanced Analytics = BIG DATA and high-level analysis, Minsk, pp. 12–18, 2020, URL: https://elibrary.ru/item.asp?id=43934323 (accessed date: 25.11.2024).
V. Yadav, and S. Bethard, “A Survey on Recent Advances in Named Entity Recognition from Deep Learning Models,” in Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, Association for Computational Linguistics, pp. 2145–2158, 2018, URL: https://arxiv.org/abs/1910.11470 (accessed date: 25.11.2024).
Natasha, GitHub Repository, URL: https://github.com/natasha/natasha (accessed date: 02.02.2024).
Yargy, GitHub Repository, URL: https://github.com/natasha/yargy (accessed date: 25.11.2024).
Named Entity Recognition (NER), DeepPavlov, URL: https://docs.deeppavlov.ai/en/master/features/models/NER.html (accessed date: 25.11.2024).
NEREL, GitHub Repository, URL: https://github.com/nerel-ds/NEREL (accessed date: 25.11.2024).
Stanford NER, URL: https://www.davidsbatista.net/blog/2018/01/23/StanfordNER/ (accessed date: 25.11.2024).
K. V. Vorontsov, “Probabilistic Topic Modeling: ARTM Regularization Theory and the BigARTM Open Source Library,” URSS, 2023.
A. Moskvina, E. Sokolova, and O. Mitrofanova, “KeyPhrase Extraction from the Russian Corpus on Linguistics by Means of KEA and RAKE Algorithm,” in Data Analytics and Management in Data Intensive Domains: XX International Conference DAMDID/RCDL’2018, October 9–12, 2018, Moscow, Russia, FRC CSC RAS, pp. 369–372.
D. Mimno, H. Wallach, E. Talley, M. Leenders, and A. McCallum, “Optimizing Semantic Coherence in Topic Models,” in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 262–272, 2011.
G. Heinrich, “Parameter Estimation for Text Analysis,” Technical Report, pp. 1–32, 2005.
S. Koltcov, “Application of Rényi and Tsallis Entropies to Topic Modeling Optimization,” Physica A: Statistical Mechanics and its Applications, no. 512, pp. 1192–1204, 2018.
A. Erofeeva, and O. Mitrofanova, “Automatic Assignment of Labels in Topic Modeling for Russian Corpora,” Structural and Applied Linguistics, vol. 12, pp. 122–147, 2019.
A. Kriukova, A. Erofeeva, O. Mitrofanova, and K. Sukharev, “Explicit Semantic Analysis as a Means for Topic Labeling,” in Artificial Intelligence and Natural Language Processing: 7th International Conference, AINL 2018, St. Petersburg, Russia, October 17–19, 2018, Proceedings. Springer, Cham, pp. 167–177, 2018.
O. Mitrofanova, A. Kriukova, V. Shulginov, and V. Shulginov, “E-hypertext Media Topic Model with Automatic Label Assignment,” in Recent Trends in Analysis of Images, Social Networks and Texts: 9th International Conference, AIST 2020, Revised Supplementary Proceedings, Communications in Computer and Information Science, vol. 1357, Springer, pp. 102–114, 2021.
O. A. Mitrofanova, M. M. Athugodage, and L. V. Ten, “Topic Label Generation in the Popular Science Corpus,” in 26th International Conference «Internet and Modern Society» (IMS–2023), International Workshop «Computational Linguistics» (CompLing 2023), Proceedings, Springer Nature, 2023.
T. Sherstinova, O. Mitrofanova, T. Skrebtsova, E. Zamiraylova, and M. Kirina, “Topic Modelling with NMF vs Expert Topic Annotation: The Case Study of Russian Fiction,” in Advances in Computational Intelligence: 19th Mexican International Conference on Artificial Intelligence, MICAI 2020, vol. 12469, pt. 2, pp. 134–152, 2020.
D. Kuang, J. Choo, and H. Park, “Nonnegative Matrix Factorization for Interactive Topic Modeling and Document Clustering,” Partitional clustering algorithms, pp. 215–243, 2015.
Scikit-Learn, URL: https://scikit-learn.org/ (accessed date: 25.11.2024).
T. K. Landauer, P. W. Foltz, and D. Laham, “Introduction to Latent Semantic Analysis,” Discourse Processes, issue 25, pp. 259–284, 1998.
A. V. Chizhik, “Using Topic Modeling Methods to Assess the Degree of Media Influence on Public Mood,” in Computational Linguistics and Computational Ontologies, issue 5, Proceedings of the XXIV International United Scientific Conference "Internet and Modern Society", IMS-2021, St. Petersburg, June 24–26, 2021, SPb., ITMO University, pp. 70–78, 2021.
M. A. Kirina, “Comparison of Topic Models Based on LDA, STM, and NMF for Qualitative Analysis of Russian Short Fiction,” Bulletin of the Novosibirsk State University. Series: Linguistics and Intercultural Communication, no. 20(2), pp. 93–109, 2022.
D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
T. Hofmann, “Probabilistic Latent Semantic Indexing,” ACM SIGIR Forum, vol. 51(2), pp. 211–218, 2017.
Gensim, URL: https://radimrehurek.com/gensim/ (accessed date: 25.11.2024).
X. Yan, J. Guo, Y. Lan, and X. Cheng, “A Biterm Topic Model for Short Texts,” in WWW 2013. Proceedings of the 22nd International Conference on World Wide Web, pp. 1445–1456, 2013.
Biterm, URL: https://pypi.org/project/biterm/ (accessed date: 25.11.2024).
Google, URL: https://www.google.ru/ (accessed date: 25.11.2024).
T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” URL: https://arxiv.org/abs/1301.3781 (accessed date: 25.11.2024).
O. A. Mitrofanova, “Search and Ranking of Texts in a Special Corpus Based on Topic Modeling,” in Proceedings of the International Conference "Corpus Linguistics - 2023" (SPb Corpora 2023), June 21–23, 2023, St. Petersburg, SPb., 2024.
ISSN: 2307-8162