Sketch Engine and TermoStat tools for automatic term extraction

A. A. Novikova

Abstract


The paper compares the Sketch Engine and TermoStat tools for automatic term extraction from a small English-language corpus of technical texts in the subject domain of water supply. Many specialized software tools for term extraction are now available, and they can also process large volumes of text data. Methods of corpus linguistics are widely used in natural language processing. A text corpus is a collection of texts of a particular genre. It can serve as a so-called “language model” of the language as a whole and is often used as a basis for language research. Many software tools exist for building corpora of different kinds. Text corpora make it possible to solve a range of specific linguistic problems, notably term extraction, which is an important task for linguists, lexicographers and terminologists. Because new terms appear constantly, terminological standards, glossaries, dictionaries and similar resources need to be corrected and updated regularly. Term extraction also supports the stable operation of online dictionaries and machine translation systems, and it assists translators. The paper discusses the results of applying term extraction tools to a small English-language corpus of the water supply subject domain.
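For illustration only, one common way corpus-based tools rank term candidates is to compare how often a word occurs in a specialized (focus) corpus with how often it occurs in a general reference corpus. The sketch below implements the "simple maths" keyness score that Sketch Engine's documentation describes, (fpm_focus + n) / (fpm_ref + n), where fpm is frequency per million tokens and n is a smoothing constant (1 by default). It is a simplified sketch under stated assumptions: single-word candidates only, whitespace tokenization, no lemmatization or part-of-speech term patterns, and toy token lists that are hypothetical and not data from the paper. It does not reproduce TermoStat's own scoring, and the function name `simple_maths_keyness` is hypothetical.

```python
from collections import Counter


def simple_maths_keyness(focus_tokens, reference_tokens, n=1.0):
    """Rank words of a focus corpus by (fpm_focus + n) / (fpm_ref + n).

    fpm = frequency per million tokens; n smooths words that are rare
    or absent in the reference corpus. Hypothetical, simplified sketch:
    single tokens only, no lemmatization or POS-based term patterns.
    """
    focus_counts = Counter(focus_tokens)
    ref_counts = Counter(reference_tokens)
    focus_total = len(focus_tokens)
    ref_total = len(reference_tokens)

    scores = {}
    for word, count in focus_counts.items():
        fpm_focus = count * 1_000_000 / focus_total
        # Counter returns 0 for words missing from the reference corpus.
        fpm_ref = ref_counts[word] * 1_000_000 / ref_total
        scores[word] = (fpm_focus + n) / (fpm_ref + n)

    # Highest keyness first: candidates most typical of the focus domain.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy corpora, for illustration only (not data from the paper).
focus = "the water main supplies potable water to the reservoir the pipe".split()
reference = "the cat went to the mat and the dog ate the food near the water".split()

for word, score in simple_maths_keyness(focus, reference):
    print(f"{word:10s} {score:12.2f}")
```

In this toy run the five content words that never occur in the reference text rank first, "water" scores about 2.7, "to" about 1.4 and "the" about 0.8. The small n lets words absent from the reference corpus score very highly, which is why real tools combine such a score with term grammars and stop-word filtering.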

Full text: PDF (in Russian)

DOI: 10.25559/INJOIT.2307-8162.08.202011.73-79





ISSN: 2307-8162