Modification of the graph method for automatic abstracting tasks taking synonymy into account

Irina Polyakova, Igor Zaitsev

Abstract


The article reviews existing approaches to automatic text abstracting. Producing an abstract requires understanding the text at the level of pragmatics. However, reliable automatic semantic analysis is not yet available, and analysis of pragmatics even less so. Widely used methods based on neural networks can capture some semantics through vector representations of words, whereas the remaining automatic abstracting methods rely on morphology and syntax. Part of the semantics is nevertheless available to those methods: there are thesauri and semantic networks, as well as algorithms that establish semantic relations between individual words, such as semantic similarity and, in particular, synonymy.

A method is proposed that takes the semantic similarity of words into account: a modification of the graph method that accounts for synonymy and produces better abstracts. Both the basic and the modified versions of the graph method are implemented in software for automatic abstracting tasks. Their quality is compared on Russian and English texts using automatic abstract evaluation metrics. The evaluation results show that the modified graph method improves on the basic one, especially on Russian-language texts.
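The idea of a synonymy-aware graph method can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `SYNONYMS` table, the Jaccard sentence similarity, and the function names `tokens`, `rank_sentences`, and `summarize` are all assumptions made for illustration. Sentences are graph nodes, edge weights are word-overlap similarities, and a PageRank-style iteration scores the sentences. With `use_synonyms=True`, words are first mapped to a shared synonym-group label, so sentences that use different synonyms still become connected.

```python
import re
from itertools import combinations

# Hypothetical toy thesaurus: each word maps to a label for its synonym group.
# A real system would take these groups from a thesaurus or semantic network.
SYNONYMS = {"car": "automobile", "auto": "automobile",
            "quick": "fast", "rapid": "fast"}


def tokens(sentence, use_synonyms):
    """Return the set of lowercase words, optionally merged by synonym group."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if use_synonyms:
        words = [SYNONYMS.get(w, w) for w in words]
    return set(words)


def similarity(a, b):
    """Jaccard overlap of two word sets (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_sentences(sentences, use_synonyms=True, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank-style power iteration."""
    n = len(sentences)
    sets = [tokens(s, use_synonyms) for s in sentences]
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sets[i], sets[j])
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Isolated sentences (out_sum == 0) pass no score to their neighbours.
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1, use_synonyms=True):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences, use_synonyms)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    text = ["The car is quick.", "An auto can be rapid.", "Birds sing in trees."]
    print(rank_sentences(text, use_synonyms=False))
    print(rank_sentences(text, use_synonyms=True))
```

In this toy input the first two sentences share no surface words, so the basic variant leaves every sentence isolated and scores them equally. The synonymy-aware variant links the first two sentences and ranks them above the third.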


Full Text:

PDF (Russian)





ISSN: 2307-8162