Measuring Similarity of Fiction Texts Based on Distributional Semantic Models (Case Study of the Russian Original Text and English Translations of M. Bulgakov's Novel "The Master and Margarita")
References
K. Church, P. Hanks, “Word Association Norms, Mutual Information, and Lexicography”. Computational Linguistics, vol. 16, issue 1, 1990, pp. 22–29.
F. Smadja, “Retrieving collocations from text: Xtract”. Computational Linguistics - Special issue on using large corpora, vol. 19, issue 1, 1993, pp. 143–177.
D. Lin, “Using collocation statistics in information extraction”. Proceedings of the Seventh Message Understanding Conference (MUC-7), 1998.
St. Evert, “Corpora and Collocations”. Corpus Linguistics. An International Handbook / A. Lüdeling, M. Kytö (eds.), 2008, article 58, pp. 1212–1248.
V. Seretan, Syntax-Based Collocation Extraction. Text, Speech and Language Technology series, vol. 44, 2011.
L. Wanner, B. Bohnet, M. Giereth, “Making sense of collocations”. Computer Speech and Language, vol. 20, issue 4, 2006, pp. 609–624.
J. Kupiec, “An algorithm for finding noun phrase correspondences in bilingual corpora”. Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics (ACL 1993), 1993, pp. 17–22.
M. Haruno, S. Ikehara, T. Yamazaki, “Learning bilingual collocations by word-level sorting”. Proceedings of the 16th Conference on Computational Linguistics, vol. 1, 1996, pp. 525–530.
Ch.-Ch. Wu, J.S. Chang, “Bilingual collocation extraction based on syntactic and statistical analyses”. Proceedings of the 15th Conference on Computational Linguistics and Speech Processing. Association for Computational Linguistics and Chinese Language Processing, 2003, pp. 1–20.
P. Fung, “A statistical view on bilingual lexicon extraction: From parallel corpora to non-parallel corpora”. Proceedings of the 3rd Conference of the Association for Machine Translation in the Americas. Machine Translation and the Information Soup (AMTA 1998), 1998, pp. 1–17.
G. Bukia, E. Protopopova, O. Mitrofanova, “A corpus-driven estimation of association strength in lexical constructions”. Proceedings of the AINL-ISMW FRUCT, FRUCT Oy, Finland, 2015, pp. 147–152.
T. Mikolov, K. Chen, G. Corrado, J. Dean, “Efficient estimation of word representations in vector space”. Workshop Proceedings of the International Conference on Learning Representations (ICLR), 2013.
Yu. Morozova, E. Kozerenko, M. Sharnin, “Method for extracting single-word translation correspondences from parallel texts using distributional semantics models”. Systems and Means of Informatics, vol. 24, issue 2, 2014, pp. 131–142. (In Rus.) = Yu. Morozova, E. Kozerenko, M. Sharnin, “Metodika izvlechenija poslovnyh perevodnyh sootvetstvij iz parallelnyh tekstov s primenenijem modelej distributivnoj semantiki”. Sistemy i sredstva informatiki, tom 24, vyp. 2, 2014, pp. 131–142.
O. Vācietis, Ieejam Bulgakova galaktikā. Jaunās grāmatas, № 11, 1979. (In Lat.) = O. Vācietis, Vhodim v galaktiku Bulgakova. Novyje knigy, № 11, 1979.
T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, “Distributed representations of words and phrases and their compositionality”. NIPS’13 Proceedings of the 26th International Conference of Neural Information Processing Systems, 2013.
Y. Bengio, “A Neural Probabilistic Language Model”. Journal of Machine Learning Research 3, 2003, pp. 1137–1155.
I. Melchuk, The experience of the theory of linguistic models “Meaning <=> Text”. M., 1999. (In Rus.) = I. Melchuk, Opyt teorii lingvisticheskih modeley “Smysl <=> Tekst”. M., 1999.
V. Komissarov, Theory of translation. M., 1990. (In Rus.) = V. Komissarov, Teorija perevoda. M., 1990.
V. Komissarov, Modern translation science. M., 2004. (In Rus.) = V. Komissarov, Sovremennoje perevodovedenije. M., 2004.
DocSim: the code for the cosine measure. URL: https://github.com/v1shwa/document-similarity
Google News Corpus. URL: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit.
LF Aligner. URL: https://sourceforge.net/projects/aligner/
Microsoft Research Paraphrase Corpus. URL: https://www.microsoft.com/enus/download/details.aspx?id=52398&from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fdownloads%2F607d14d9-20cd-47e3-85bc-a2f65cd28042%2Fdefault.aspx
ParaPhraser. URL: http://paraphraser.ru/
ParaPlag. URL: https://plagevalrus.github.io/content/corpora/paraplag.html
ISSN: 2307-8162