Algorithm of fuzzy text search in online social networks

Yulia Davydova

Abstract


When monitoring online social networks, keyword search is complicated by misspellings, typos, and slang in users' posts. To make search less sensitive to misspellings and to improve the recall (completeness) of search results, the article proposes fuzzy search with filtration. The algorithm consists of two stages: scanning and verification. In the scanning stage, the text is filtered to exclude posts that definitely do not contain the keywords. The remaining posts are checked in the verification stage. Integrating linguistic rules and misspelling statistics into the text search preserves its accuracy. The article evaluates the effectiveness of the fuzzy search algorithm as a whole and, in particular, of the classifier it uses. Testing was done on a sample of posts from the General Internet Corpus of Russian.
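The two-stage idea (scanning, then verification) can be illustrated with a minimal sketch. The abstract does not say which filter or verifier the article uses, so the code below substitutes two textbook techniques as assumptions: a q-gram count filter for scanning and a Sellers-style edit-distance check for verification. The function names, the threshold k, and q = 2 are all hypothetical, and the linguistic rules and misspelling statistics mentioned in the abstract are not modelled.

```python
from collections import Counter


def qgrams(s, q=2):
    """Return the multiset of overlapping q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def may_contain(post, keyword, k=1, q=2):
    """Scanning stage (assumed q-gram count filter, not the article's filter).

    One edit destroys at most q of the keyword's q-grams, so a post that
    contains the keyword with at most k edits must share at least
    len(keyword) - q + 1 - k * q q-grams with it. Posts below that bound
    cannot contain a match and are discarded.
    """
    threshold = len(keyword) - q + 1 - k * q
    if threshold <= 0:
        return True  # the filter has no power here; defer to verification
    shared = sum((qgrams(post, q) & qgrams(keyword, q)).values())
    return shared >= threshold


def min_substring_distance(post, keyword):
    """Verification stage: smallest edit distance between the keyword and
    any substring of the post (Sellers' dynamic programming)."""
    m = len(keyword)
    prev = list(range(m + 1))  # column for the empty text prefix
    best = prev[m]
    for ch in post:
        cur = [0]  # a match may start at any position in the post
        for i in range(1, m + 1):
            cost = 0 if keyword[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in the post
                           cur[i - 1] + 1))     # keyword character missing
        best = min(best, cur[m])
        prev = cur
    return best


def fuzzy_search(posts, keyword, k=1):
    """Return the posts that contain the keyword with at most k edits."""
    kw = keyword.lower()
    hits = []
    for post in posts:
        text = post.lower()
        if may_contain(text, kw, k) and min_substring_distance(text, kw) <= k:
            hits.append(post)
    return hits


posts = ["I love my new iphne", "nothing relevant here", "iPhone is great"]
print(fuzzy_search(posts, "iphone", k=1))
```

In this sketch the filter never rejects a true match, so it affects only speed, not completeness; the expensive dynamic-programming check runs only on posts that survive scanning.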

Full Text:

PDF (Russian)



ISSN: 2307-8162