Building a text corpus for automatic biographical facts extraction from Russian texts
References
Meyers A. Corpus Linguistics for NLP, New York University, URL: https://cs.nyu.edu/courses/spring18/CSCI-UA.0480-009/lecture7-corpus.pdf. Date of access: 14.06.2018.
Khokhlova M. A survey of Large Russian Corpora // Computer linguistics and computing ontologies. Proceedings of the XIX International Joint Scientific Conference. – Saint-Petersburg, 2016. – P. 74-77.
Khokhlova M. Large Corpora and Frequency Nouns // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2016”. – Moscow, 2016. – P. 224-238.
Shang J. et al. Automated phrase mining from massive text corpora // IEEE Transactions on Knowledge and Data Engineering. – 2018.
Roll U., Correia R. A., Berger-Tal O. Using machine learning to disentangle homonyms in large text corpora // Conservation Biology. – 2018. – Vol. 32. – №. 3. – P. 716-724.
Campillos L., Deléger L., Grouin C., Hamon T., Ligozat A.-L., Névéol A. A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annotated Text corpus (MERLOT) // Language Resources and Evaluation. – 2018. – Vol. 52(2). – P. 571-601.
Uhrig P., Evert S., Proisl T. Collocation Candidate Extraction from Dependency-Annotated Corpora: Exploring Differences across Parsers and Dependency Annotation Schemes // Lexical Collocation Analysis. – Springer, Cham, 2018. – P. 111-140.
Jia C. et al. Concept decompositions for short text clustering by identifying word communities // Pattern Recognition. – 2018. – Vol. 76. – P. 691-703.
Sameen S. et al. Measuring Short Text Reuse for the Urdu Language // IEEE Access. – 2018. – Vol. 6. – P. 7412-7421.
Sojka P., Líška M., Růžička M. Building Corpora of Technical Texts: Approaches and Tools // Fifth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN. – Brno, 2011. – P. 71-82.
Litvinova T., Zagorovskaya O., Litvinova O. Russian text corpora for deception detection studies // International Journal of Open Information Technologies. – 2017. – Vol. 5, № 11. – P. 58-63.
Zevakhina N., Dzhakupova S. Russian metalinguistic comparatives: a functional perspective // Working papers by NRU HSE. Series WP BRP "Linguistics". – 2015. – № 39.
Open Corpora, URL: opencorpora.org. Date of access: 14.06.2018.
Rubtsova Yu. Constructing a corpus for sentiment classification training // Software & Systems. – 2014. – Vol. 1. – P. 7-78.
Rezanova Z. Linguistic corpus "Tomsk regional text": concept and structure // Tomsk State University Journal of Philology. – 2015. – Vol. 1(33). – P. 38-50.
Rezanova Z., Vesnina G. Meta-data and annotation design of the Russian-speaking bilinguals speech subcorpus in the structure of the Tomsk Regional Corpus // Voprosy Leksikografii Russian Journal of Lexicography. – 2016. – Vol. 1(9). – P. 29-39. DOI: 10.17223/22274200/9/3.
Dracheva Yu. Electronic body of dialective texts in the aspect of studying the dynamics of cultural concepts (on the example of the multimedia case of Vologda texts) // Contemporary Russian lexicology, lexicography and linguogeography. – 2014. – P. 114-121.
Medvedeva E. Classification biographies as one of the biographics research methods in the context of library branch // Tomsk State University Journal of Cultural Studies and Art History. – 2016. – Vol. 2(22). – P. 198-205.
Wikipedia, URL: ru.wikipedia.org. Date of access: 17.03.2018.
da Costa dias Soares S.-F. Extraction of Biographical Information from Wikipedia Texts. – Lisbon, 2011.
Python 3.6.0, URL: https://www.python.org/downloads/release/python-360/. Date of access: 14.06.2018.
Wikipedia 1.4.0, URL: https://pypi.org/project/wikipedia/. Date of access: 14.06.2018.
.NET, URL: https://www.microsoft.com/net4. Date of access: 14.06.2018.
Zakharov V. Evaluation of Internet corpora of Russian // Proceedings of the International Conference “Corpus linguistics-2015”. – St. Petersburg, 2015. – P. 219-229.
Corpus of biographical texts, URL: https://sites.google.com/site/utcorpus/. Date of access: 01.07.2018.
ISSN: 2307-8162