Towards a part-of-speech tagger for Sranan Tongo

Nicolás Cortegoso Vissio, Viktor Zakharov

Abstract


This paper continues work submitted to the International Conference Corpus Linguistics 2021 [1]. That work introduced a hybrid rule-based and stochastic part-of-speech (POS) tagger for Sranan Tongo, a Creole language of South America with around half a million speakers. Because Sranan Tongo has no written corpus and text annotation is expensive and time-consuming, the first step was to train a POS tagger on only 550 sentences hand-annotated with part-of-speech tags.

In this new contribution, the development of the POS tagger for Sranan Tongo goes a step further by adding more training data. To this end, the tagger was used to annotate 2,406 sentences; its output was hand-corrected and used to retrain the model. The performance of the POS tagger on three texts is compared before and after the inclusion of the new training data.
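The annotate, hand-correct, retrain and compare cycle described above can be illustrated with a minimal sketch. It is not the authors' hybrid tagger: a most-frequent-tag baseline stands in for it, and the tokens (`x1`, `x2`, ...), the tag names and the tiny data sets are hypothetical placeholders chosen only to make the example runnable.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Learn the most frequent tag per word, plus a global fallback tag.

    Stand-in model only; the paper's tagger is a rule-based/stochastic hybrid.
    """
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            per_word[word][tag_label] += 1
            overall[tag_label] += 1
    fallback = overall.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lexicon, fallback


def tag(model, words):
    """Tag each word with its learned tag, or the fallback if unseen."""
    lexicon, fallback = model
    return [(w, lexicon.get(w, fallback)) for w in words]


def accuracy(model, gold_sentences):
    """Token-level accuracy of the model against gold-tagged sentences."""
    correct = total = 0
    for sentence in gold_sentences:
        predicted = tag(model, [w for w, _ in sentence])
        correct += sum(p == g for (_, p), (_, g) in zip(predicted, sentence))
        total += len(sentence)
    return correct / total


# Hypothetical seed set (plays the role of the initial hand-annotated sentences).
seed = [
    [("x1", "PRON"), ("x2", "VERB")],
    [("x1", "PRON"), ("x5", "VERB"), ("x1", "PRON")],
]
# Hypothetical new sentences after hand-correction of the tagger's draft output.
corrected = [
    [("x3", "NOUN"), ("x2", "VERB")],
    [("x4", "VERB"), ("x1", "PRON")],
]
# Hypothetical evaluation text, kept apart from both training sets.
held_out = [[("x1", "PRON"), ("x4", "VERB"), ("x3", "NOUN")]]

model_before = train(seed)
# Draft annotation that a human annotator would then correct.
draft = [tag(model_before, [w for w, _ in s]) for s in corrected]
model_after = train(seed + corrected)

acc_before = accuracy(model_before, held_out)
acc_after = accuracy(model_after, held_out)
print(f"before: {acc_before:.2f}, after: {acc_after:.2f}")
```

On this toy data the retrained model scores higher simply because the corrected sentences cover words that were unseen in the seed set; the numbers say nothing about the results reported in the paper.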



References


Cortegoso Vissio N., Zakharov V. A rule-stochastic hybrid POS-tagger for Sranan Tongo with minimal lexicon and training dataset. In: Proceedings of the International Conference «Corpus Linguistics-2021». Saint-Petersburg, Sofia-Press. 2021 (in print).

Radke H. Niederländisch und Sranantongo in Surinamischer Onlinekommunikation // Taal en Tongval. University Press, Amsterdam, 2017. Vol. 69. P. 113-136.

Sebba M. Contact languages: pidgins and creoles. Palgrave Macmillan, 1997.

Wortubuku fu Sranan Tongo. SIL International. URL: https://www.sil.org/resources/archives/13426 (accessed: 10.10.2021).

Yakpo K., Bruyn A. Transatlantic patterns: The relexification of locative constructions in Sranan // Surviving the Middle Passage: The West Africa-Surinam Sprachbund / Pieter Muysken, Norval Smith (Eds.). De Gruyter Mouton, Berlin, 2015. P. 135–175.

Wilner J. Wortubuku fu Sranan Tongo. Sranan Tongo-English Dictionary / John Wilner (ed.), Ronald Pinas, Lucien Donk, Hertoch Linger, Arnie Lo-Ning-Hing, Tieneke MacBean, Celita Zebeda-Bendt, Chiquita Pawironadi-Nunez, Dorothy Wong Loi Sing. SIL International, 2007. 5th ed.

Wilner J. Wortubuku fu Sranan Tongo. Sranan Tongo-Nederlands Woordenboek / John Wilner (ed.), Ronald Pinas, Lucien Donk, Hertoch Linger, Arnie Lo-Ning-Hing, Tieneke MacBean, Celita Zebeda-Bendt, Chiquita Pawironadi-Nunez, Dorothy Wong Loi Sing. SIL International, 2007. 5th ed.

Nickel M., Wilner J. Papers on Sranan Tongo. Summer Institute of Linguistics, 1984. URL: https://archive.org/details/rosettaproject_srn_morsyn-1 (accessed: 05.04.2021).

Winford D., Plag I. Sranan structure dataset // Atlas of Pidgin and Creole Language Structures Online / Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, Magnus Huber (Eds.). Leipzig: Max Planck Institute for Evolutionary Anthropology, 2013. URL: http://apics-online.info/contributions/2 (accessed: 03.10.2021).

Jurafsky D. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition / Daniel Jurafsky, James H. Martin. Prentice Hall, New Jersey, 2008. 2nd edition.

Rijksoverheid. Skowtu hori yu na ini a tori fu wan ordru fu den bakrakondre, nanga den tyari yu na skowt’oso noso wan tra presi pe den o yere yu. URL: https://www.rijksoverheid.nl/documenten/brochures/2014/07/01/u-bent-aangehouden-in-verband-met-een-europees-aanhoudingsbevel-en-meegenomen-naar-het-politiebureau-of-andere-verhoorlocatie.-wat-zijn-uw-rechten-sranan-tongo (accessed: 10.10.2021).

MacBean G. A gridi frow fu fisman Albert. Instituut voor Taalwetenschap (SIL). 1993. URL: http://suriname-languages.sil.org/Sranan/English/SrananEngLLIndex.html (accessed: 10.10.2021).

Pinas E. San pesa ini Kaneri // Nieuwe Surinaamse verhalen / M. van Kempen (comp.). Uitgeverij De Volksboekwinkel, Paramaribo, 1986.

Cortegoso Vissio N. A part of speech tagger for Sranan Tongo based on a Trigram Hidden Markov Model // GitHub repository. URL: https://github.com/nicolascortegoso/HMM-for-sranantongo (accessed: 10.10.2021).





ISSN: 2307-8162